Agent Memory Systems

The dominant production pattern combines three parts: structured memory blocks that are always in context (like RAM), a vector-searchable archival store, and an offline consolidation pipeline. The leaked Claude Code source suggests Anthropic uses the same design.

The Problem

Without persistent memory, AI agents start every session from scratch — “a brilliant employee with amnesia.” The challenge is balancing:

  • Relevance: Only load what matters for this task
  • Cost: Don’t burn tokens on irrelevant history
  • Freshness: Resolve contradictions between old and new information
  • Privacy: Control what persists and what’s forgotten

Claude Code’s 7-Layer Memory (Production Reference)

From the source leak:

| Layer | Name | Persistence | What It Stores |
| --- | --- | --- | --- |
| L1 | In-context | Session only | Current messages array |
| L2 | Working Memory | Project-level | CLAUDE.md + pinned context (injected at session start) |
| L3 | Episodic | Log-level | Append-only session logs (KAIROS daemon) |
| L4 | Semantic | Core knowledge | Solidified facts from autoDream consolidation |
| L5 | Procedural | Skill-level | Reusable workflows in skills/ directory |
| L6 | Contact | Relationship | Known people and roles |
| L7 | Team | Cross-user | Shared remote state with delta sync |

Key insight: MEMORY.md acts as a lightweight pointer index (~150 chars/line) that’s always loaded. Actual knowledge is distributed across topic files, fetched on-demand. This is the pointer-based architecture — cheap persistent index + deep retrieval.
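A minimal sketch of the pointer pattern, assuming a plain directory layout. The function names (`write_topic`, `load_index`, `fetch_topic`), the `topics/` directory, and the pointer line format are hypothetical illustrations, not taken from the leaked source; only the MEMORY.md name and the ~150 character limit come from the description above.

```python
from pathlib import Path

MAX_POINTER_CHARS = 150  # mirrors the ~150 chars/line figure quoted above


def write_topic(root: Path, topic: str, body: str, summary: str) -> None:
    """Store the full note in its own file and append a one-line pointer to the index."""
    topics = root / "topics"  # hypothetical layout
    topics.mkdir(parents=True, exist_ok=True)
    pointer = f"- {topic}: {summary} -> topics/{topic}.md"
    if len(pointer) > MAX_POINTER_CHARS:
        raise ValueError(f"pointer for {topic!r} exceeds {MAX_POINTER_CHARS} chars")
    (topics / f"{topic}.md").write_text(body, encoding="utf-8")
    with (root / "MEMORY.md").open("a", encoding="utf-8") as index:
        index.write(pointer + "\n")


def load_index(root: Path) -> str:
    """The index is small, so it can be injected into every session."""
    return (root / "MEMORY.md").read_text(encoding="utf-8")


def fetch_topic(root: Path, topic: str) -> str:
    """Topic bodies are read only when the current task needs them."""
    return (root / "topics" / f"{topic}.md").read_text(encoding="utf-8")
```

The index costs a bounded number of tokens per entry regardless of how long each topic file grows. This sketch does not deduplicate pointers; writing the same topic twice would add a second index line.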

autoDream: Idle-Time Consolidation

When the user has been idle for 5+ minutes, KAIROS spawns a background subagent that runs four phases:

  1. Scan: Extract observations from daily log
  2. Merge: Combine similar observations, find patterns
  3. Refine: Remove contradictions against existing semantic memory
  4. Commit: Convert vague observations → absolute facts

Because consolidation runs as a batch in a separate subagent during idle time, it is more efficient than consolidating in real time on every turn, and it doesn’t pollute the main context.
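The four phases can be sketched as a small batch pipeline. This is an illustration under stated assumptions, not the leaked implementation: the `key: value` log format, the function names, and the `min_support` rule (an observation must recur before it can overturn an existing fact) are all hypothetical stand-ins for what would be LLM-driven steps in practice.

```python
from collections import Counter


def scan(log_lines):
    """Phase 1: pull (key, value) observations out of 'key: value' log lines."""
    observations = []
    for line in log_lines:
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            observations.append((key.strip().lower(), value.strip()))
    return observations


def merge(observations):
    """Phase 2: collapse repeated observations into support counts."""
    return Counter(observations)


def refine(support, facts, min_support=2):
    """Phase 3: keep one candidate per key; drop weakly supported contradictions."""
    best = {}
    for (key, value), count in support.items():
        if key not in best or count > best[key][1]:
            best[key] = (value, count)
    accepted = {}
    for key, (value, count) in best.items():
        contradicts = key in facts and facts[key] != value
        if contradicts and count < min_support:
            continue  # assumption: one sighting cannot overturn an existing fact
        accepted[key] = value
    return accepted


def commit(accepted, facts):
    """Phase 4: return a new fact store with the accepted observations applied."""
    return {**facts, **accepted}


def consolidate(log_lines, facts):
    """Run all four phases over one day's log."""
    return commit(refine(merge(scan(log_lines)), facts), facts)
```

Each phase is a pure function over the previous phase's output, so the whole run can happen in a background process and only the committed fact store needs to reach the main context.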

Major Memory Frameworks

Letta (formerly MemGPT)

Philosophy: LLM-as-OS — the agent manages its own memory via explicit function calls, like an operating system managing RAM, disk, and swap.

Three-tier architecture:

  • Core Memory (always in-context, like RAM): Goals, user persona, current task state. ~2-4K tokens.
  • Archival Memory (vector store, like disk): Long-term knowledge queried via explicit archival_memory_search tool calls.
  • Recall Memory (conversation history, searchable): Past interactions indexed for retrieval.

Key innovation: The agent decides what to promote from archival to core, and what to archive from core. Memory management is a first-class tool, not a background process.
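A minimal sketch of agent-managed tiers in plain Python. It does not use Letta's API: the class `TieredMemory`, its methods `promote`, `archive`, and `search`, and the character-based budget are hypothetical, and word overlap stands in for real vector search. In an actual agent these methods would be exposed to the model as tools.

```python
from dataclasses import dataclass, field


@dataclass
class TieredMemory:
    core_budget: int = 200                        # max characters kept in context (hypothetical)
    core: dict = field(default_factory=dict)      # label -> text, always in the prompt
    archival: list = field(default_factory=list)  # long-term store, searched on demand

    def core_size(self) -> int:
        return sum(len(text) for text in self.core.values())

    def promote(self, label: str, text: str) -> None:
        """Tool: place text in core memory, refusing if the budget would overflow."""
        new_size = self.core_size() - len(self.core.get(label, "")) + len(text)
        if new_size > self.core_budget:
            raise ValueError("core memory full: archive a block first")
        self.core[label] = text

    def archive(self, label: str) -> None:
        """Tool: move a core block out of context into archival storage."""
        self.archival.append(f"{label}: {self.core.pop(label)}")

    def search(self, query: str, k: int = 3) -> list:
        """Tool: rank archival entries by word overlap (stand-in for vector search)."""
        words = set(query.lower().split())
        scored = [(len(words & set(entry.lower().split())), entry)
                  for entry in self.archival]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [entry for _, entry in scored[:k]]

    def render_core(self) -> str:
        """Text that would be injected into every prompt."""
        return "\n".join(f"[{label}] {text}" for label, text in self.core.items())
```

The budget check is what forces a decision: when `promote` fails, the agent has to choose which block to `archive`, which is the self-management behavior described above.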

Reported results: 18% accuracy gains and a 2.5x lower cost per query compared with keeping the full history in context. Letta has raised $10M in funding and is production-ready.

Mem0 (Managed Memory Layer)

More opinionated than Letta: Mem0 automatically extracts and retrieves memories from conversations. It is less flexible but easier to integrate. The Mem0 paper (arXiv:2504.19413) formalizes the architecture.

Best for: Teams that want memory without building the infrastructure. Drop-in integration with existing agent systems.

Zep

Focuses specifically on conversation memory with automatic fact extraction, entity tracking, and temporal awareness.

Best for: Customer-facing agents where conversation history is the primary memory source.
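Temporal awareness can be illustrated with a store that keeps every observed value and resolves conflicts by timestamp. This is a generic sketch, not Zep's API or data model: `Fact`, `TemporalFactStore`, and the `as_of` parameter are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Fact:
    entity: str
    attribute: str
    value: str
    observed_at: datetime


class TemporalFactStore:
    def __init__(self):
        self._facts = []

    def add(self, fact: Fact) -> None:
        """Old values are never overwritten, only superseded by newer ones."""
        self._facts.append(fact)

    def history(self, entity: str, attribute: str) -> list:
        """All recorded values for one attribute, oldest first."""
        matches = [f for f in self._facts
                   if f.entity == entity and f.attribute == attribute]
        return sorted(matches, key=lambda f: f.observed_at)

    def current(self, entity: str, attribute: str,
                as_of: Optional[datetime] = None) -> Optional[str]:
        """Latest value observed at or before `as_of` (default: latest overall)."""
        candidates = [f for f in self.history(entity, attribute)
                      if as_of is None or f.observed_at <= as_of]
        return candidates[-1].value if candidates else None
```

Keeping the history instead of overwriting lets the agent answer both "what plan is this customer on?" and "what plan were they on when they filed the ticket?".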

Pattern Comparison

| Aspect | Claude Code | Letta/MemGPT | Mem0 | Zep |
| --- | --- | --- | --- | --- |
| Architecture | 7-layer hierarchy | 3-tier (core/archival/recall) | Managed extraction | Conversation-focused |
| Who manages memory | Background daemon (autoDream) | Agent itself (via tools) | Automatic pipeline | Automatic pipeline |
| Consolidation | Batch (idle-triggered) | Real-time (agent decides) | Background | Background |
| Conflict resolution | autoDream Phase 3 (Refine) | Agent reasoning | Automatic | Temporal ordering |
| Open source | No (leaked) | Yes | Yes | Yes |
| Maturity | Production ($25B ARR product) | Production | Production | Production |

The Dominant Production Pattern

Across all implementations, the winning pattern is:

1. Structured memory blocks always in context
   (user profile, session state, key facts — like Claude Code's L2)

2. Vector-searchable archival store for long-tail knowledge
   (RAG over past conversations/documents — like Letta's archival)

3. Offline consolidation pipeline
   (batch processing to extract, deduplicate, solidify — like autoDream)
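How the first two parts meet at prompt time can be sketched as budgeted context assembly: pinned blocks always go in, and retrieved snippets fill whatever budget is left. The function names, the 4-characters-per-token heuristic, and the skip-if-too-large policy are assumptions for illustration; the offline pipeline (part 3) would be what produces and refreshes the pinned facts.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic (about 4 characters per token); swap in a real tokenizer."""
    return max(1, len(text) // 4)


def assemble_context(pinned, retrieved, budget):
    """Build the prompt context from pinned blocks plus retrieved snippets.

    `pinned` is always included. `retrieved` is expected to be sorted most
    relevant first; snippets that do not fit the remaining budget are skipped.
    Returns the joined context and the estimated tokens used.
    """
    parts = list(pinned)
    used = sum(estimate_tokens(p) for p in parts)
    if used > budget:
        raise ValueError("pinned blocks alone exceed the token budget")
    for snippet in retrieved:
        cost = estimate_tokens(snippet)
        if used + cost > budget:
            continue  # too large for what is left; a later, smaller one may fit
        parts.append(snippet)
        used += cost
    return "\n\n".join(parts), used
```

Failing loudly when the pinned blocks exceed the budget is a deliberate choice in this sketch: it signals that the always-in-context tier has grown too large and needs consolidation, instead of silently truncating it.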

VentureBeat predicts contextual memory will surpass RAG for agentic AI in 2026 — meaning the memory layer becomes more important than the retrieval layer for agent performance.

Implementation Guidance

If building from scratch: Start with the pointer-based architecture (MEMORY.md pattern). Keep a JSON or Markdown index file always in context, pointing to topic-specific files that are loaded on demand. Add autoDream-style consolidation once you have enough session data.

If using a framework: Choose Letta for maximum control (the agent self-manages memory), Mem0 for the quickest integration, or Zep if conversation history is your primary concern.

Sources