Agent Memory Systems

The dominant production pattern: structured memory blocks always in-context (like RAM) + vector-searchable archival store + offline consolidation pipeline. Claude Code’s leak confirmed this is exactly how Anthropic does it.

The Problem

Without persistent memory, AI agents start every session from scratch — “a brilliant employee with amnesia.” The challenge is balancing:

Relevance: Only load what matters for this task
Cost: Don’t burn tokens on irrelevant history
Freshness: Resolve contradictions between old and new information
Privacy: Control what persists and what’s forgotten

Claude Code’s 7-Layer Memory (Production Reference)

From the source leak:

Layer	Name	Persistence	What It Stores
L1	In-context	Session only	Current messages array
L2	Working Memory	Project-level	CLAUDE.md + pinned context (injected at session start)
L3	Episodic	Log-level	Append-only session logs (KAIROS daemon)
L4	Semantic	Core knowledge	Solidified facts from autoDream consolidation
L5	Procedural	Skill-level	Reusable workflows in skills/ directory
L6	Contact	Relationship	Known people and roles
L7	Team	Cross-user	Shared remote state with delta sync

Key insight: MEMORY.md acts as a lightweight pointer index (~150 chars/line) that’s always loaded. Actual knowledge is distributed across topic files, fetched on-demand. This is the pointer-based architecture — cheap persistent index + deep retrieval.

autoDream: Idle-Time Consolidation

When user is idle 5+ minutes, KAIROS spawns a background subagent:

Scan: Extract observations from daily log
Merge: Combine similar observations, find patterns
Refine: Remove contradictions against existing semantic memory
Commit: Convert vague observations → absolute facts

This batch processing approach is more efficient than real-time consolidation and doesn’t pollute the main context.

Major Memory Frameworks

Letta (formerly MemGPT)

Philosophy: LLM-as-OS — the agent manages its own memory via explicit function calls, like an operating system managing RAM, disk, and swap.

Three-tier architecture:

Core Memory (always in-context, like RAM): Goals, user persona, current task state. ~2-4K tokens.
Archival Memory (vector store, like disk): Long-term knowledge queried via explicit archival_memory_search tool calls.
Recall Memory (conversation history, searchable): Past interactions indexed for retrieval.

Key innovation: The agent decides what to promote from archival to core, and what to archive from core. Memory management is a first-class tool, not a background process.

Results: 18% accuracy gains, 2.5x cost reduction per query (vs. full history in context). $10M funding, production-ready.

Mem0 (Managed Memory Layer)

More opinionated than Letta. Automatic extraction and retrieval of memories from conversations. Less flexible but easier to integrate. The Mem0 paper (arxiv 2504.19413) formalizes the architecture.

Best for: Teams that want memory without building the infrastructure. Drop-in integration with existing agent systems.

Zep

Focuses specifically on conversation memory with automatic fact extraction, entity tracking, and temporal awareness.

Best for: Customer-facing agents where conversation history is the primary memory source.

Pattern Comparison

Aspect	Claude Code	Letta/MemGPT	Mem0	Zep
Architecture	7-layer hierarchy	3-tier (core/archival/recall)	Managed extraction	Conversation-focused
Who manages memory	Background daemon (autoDream)	Agent itself (via tools)	Automatic pipeline	Automatic pipeline
Consolidation	Batch (idle-triggered)	Real-time (agent decides)	Background	Background
Conflict resolution	autoDream Phase 3 (Refine)	Agent reasoning	Automatic	Temporal ordering
Open source	No (leaked)	Yes	Yes	Yes
Maturity	Production ($25B ARR product)	Production	Production	Production

The Dominant Production Pattern

Across all implementations, the winning pattern is:

1. Structured memory blocks always in context
   (user profile, session state, key facts — like Claude Code's L2)

2. Vector-searchable archival store for long-tail knowledge
   (RAG over past conversations/documents — like Letta's archival)

3. Offline consolidation pipeline
   (batch processing to extract, deduplicate, solidify — like autoDream)

VentureBeat predicts contextual memory will surpass RAG for agentic AI in 2026 — meaning the memory layer becomes more important than the retrieval layer for agent performance.

Implementation Guidance

If building from scratch: Start with the pointer-based architecture (MEMORY.md pattern). A JSON/markdown index file always in context, pointing to topic-specific files loaded on demand. Add autoDream-style consolidation when you have enough session data.

If using a framework: Letta for maximum control (agent self-manages memory). Mem0 for quickest integration. Zep if conversation history is your primary concern.

KahWei's Wiki

Explorer

Agent Memory Systems

Agent Memory Systems

The Problem

Claude Code’s 7-Layer Memory (Production Reference)

autoDream: Idle-Time Consolidation

Major Memory Frameworks

Letta (formerly MemGPT)

Mem0 (Managed Memory Layer)

Zep

Pattern Comparison

The Dominant Production Pattern

Implementation Guidance

Sources

Graph View

Table of Contents

Backlinks

KahWei's Wiki

Explorer

Agent Memory Systems

Agent Memory Systems

The Problem

Claude Code’s 7-Layer Memory (Production Reference)

autoDream: Idle-Time Consolidation

Major Memory Frameworks

Letta (formerly MemGPT)

Mem0 (Managed Memory Layer)

Zep

Pattern Comparison

The Dominant Production Pattern

Implementation Guidance

Related Pages

Sources

Graph View

Table of Contents

Backlinks