Mem0 — Memory Architecture Deep Dive
The “universal memory layer for AI agents.” LLM-based memory extraction pipeline + vector + graph storage. 80K+ developers, 186M API calls/quarter, $24M Series A. Used by Netflix, Lemonade, Rocket Money.
Core Architecture: Extract → Update → Store
Phase 1: Extraction
Processes a message pair (user + assistant) along with conversation summary and recent messages. An LLM extracts salient facts — this is entirely LLM-based, not NLP/regex:
Input: {
user_message: "I'm allergic to peanuts and prefer window seats",
assistant_response: "I'll remember that for future bookings",
conversation_summary: "User is planning trip to Tokyo...",
recent_messages: [...]
}
↓
LLM Extraction: [
{ fact: "User is allergic to peanuts", category: "health" },
{ fact: "User prefers window seats", category: "preference" }
]
Phase 2: Update (Tool Call Mechanism)
Extracted memories are evaluated against existing similar memories. The LLM decides via tool calls:
| Action | When |
|---|---|
| ADD | New fact not in memory |
| UPDATE | Existing fact needs revision (e.g., “moved from NYC to SF”) |
| DELETE | Fact is no longer true or relevant |
This is the key differentiator — memory is actively managed, not just appended.
Phase 3: Storage (Dual Backend)
Vector storage: Embeddings of memory facts in Qdrant, Pinecone, Weaviate, Chroma, etc.
Graph storage (Mem0-g): Entity-relationship graph alongside vector:
- Entity Extractor: Identifies entities as nodes
- Relations Generator: Infers labeled edges between entities
- Stores in Neo4j, Memgraph, Neptune, Kuzu, or Apache AGE on PostgreSQL
[User] --allergic_to--> [Peanuts]
[User] --prefers--> [Window Seat]
[User] --planning_trip--> [Tokyo]
[Tokyo Trip] --departure--> [March 2026]
Graph memory enables multi-hop reasoning: “What dietary restrictions should we consider for the user’s Tokyo trip?” → traverse graph from Tokyo Trip → User → Allergies → Peanuts.
Mem0 vs Letta/MemGPT: When to Use Which
| Mem0 | Letta/MemGPT | |
|---|---|---|
| What it is | Memory layer (bolt-on service) | Agent runtime with built-in memory |
| Integration | Add to any framework (LangChain, CrewAI, OpenAI SDK) | Agents run inside Letta runtime |
| Lock-in | Minimal — swap memory calls, keep your framework | Architectural — agents built on Letta |
| Memory model | LLM extracts facts → vector + graph | Three-tier: Core (in-context RAM) + Recall (searchable history) + Archival (long-term) |
| Who manages memory | Automatic pipeline | Agent self-manages via tool calls |
| Multi-language | Python + JavaScript SDKs | Python-first |
| Compliance | SOC 2, HIPAA (managed service) | Self-hosted focus |
| Best for | Adding memory to existing agent products | Long-running autonomous agents needing OS-like memory management |
Decision rule: If you have an existing agent framework and want to add memory → Mem0. If you’re building a new autonomous agent from scratch → Letta.
Comparison with Claude Code’s Memory
| Aspect | Mem0 | Claude Code |
|---|---|---|
| Architecture | Extract → Update → Store | 7-layer hierarchy (L1-L7) |
| Pointer pattern | Flat vector + graph | MEMORY.md index → topic files on demand |
| Consolidation | Real-time (on each message) | Batch (autoDream during idle) |
| Conflict resolution | LLM decides UPDATE/DELETE | autoDream Phase 3 (Refine) |
| Graph support | Native (Mem0-g) | None |
Integration Ecosystem
Native integrations with:
- CrewAI, OpenAI Agents SDK, Google AI ADK
- Flowise, Langflow, Mastra
- AWS Agent SDK (exclusive memory provider)
Production Numbers
- 80,000+ developers on cloud service
- API calls: 35M (Q1 2025) → 186M (Q3 2025) — 5x growth
- $24M Series A (YC, Peak XV, Basis Set — October 2025)
- Enterprise customers: Netflix, Lemonade, Rocket Money
Related Pages
- Agent Memory Systems
- Claude Code Architecture Deep Dive
- Mastra Agent Framework
- RAG Architecture Patterns 2026