Mem0 — Memory Architecture Deep Dive

The “universal memory layer for AI agents.” LLM-based memory extraction pipeline + vector + graph storage. 80K+ developers, 186M API calls/quarter, $24M Series A. Used by Netflix, Lemonade, Rocket Money.

Core Architecture: Extract → Update → Store

Phase 1: Extraction

The extraction step processes a message pair (user + assistant) together with the conversation summary and recent messages. An LLM extracts the salient facts; extraction is entirely LLM-based, with no rule-based NLP or regex parsing:

Input: {
  user_message: "I'm allergic to peanuts and prefer window seats",
  assistant_response: "I'll remember that for future bookings",
  conversation_summary: "User is planning trip to Tokyo...",
  recent_messages: [...]
}
↓
LLM Extraction: [
  { fact: "User is allergic to peanuts", category: "health" },
  { fact: "User prefers window seats", category: "preference" }
]
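
The extraction step above can be sketched roughly as follows. This is a minimal illustration, not Mem0's actual internals: the function names, the prompt wording, and the JSON reply schema are all assumptions, and the LLM call is replaced by a canned reply so the example runs offline.

```python
import json
from dataclasses import dataclass


@dataclass
class ExtractedFact:
    fact: str
    category: str


def build_extraction_prompt(user_message, assistant_response,
                            conversation_summary, recent_messages):
    """Assemble the four inputs into one prompt (hypothetical wording)."""
    recent = "\n".join(f"- {m}" for m in recent_messages)
    return (
        "Extract salient facts about the user as a JSON list of "
        '{"fact": ..., "category": ...} objects.\n\n'
        f"Summary: {conversation_summary}\n"
        f"Recent messages:\n{recent}\n"
        f"User: {user_message}\n"
        f"Assistant: {assistant_response}\n"
    )


def parse_extraction_reply(reply_text):
    """Parse the model's JSON reply, skipping malformed entries."""
    try:
        items = json.loads(reply_text)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    facts = []
    for item in items:
        if isinstance(item, dict) and "fact" in item:
            facts.append(ExtractedFact(item["fact"], item.get("category", "other")))
    return facts


prompt = build_extraction_prompt(
    user_message="I'm allergic to peanuts and prefer window seats",
    assistant_response="I'll remember that for future bookings",
    conversation_summary="User is planning trip to Tokyo...",
    recent_messages=["(earlier turns would go here)"],
)

# Canned reply standing in for a real LLM call (assumption, for offline use).
canned_reply = json.dumps([
    {"fact": "User is allergic to peanuts", "category": "health"},
    {"fact": "User prefers window seats", "category": "preference"},
])
facts = parse_extraction_reply(canned_reply)
```

Parsing defensively matters here because the extractor is an LLM: a reply that is not valid JSON yields an empty list instead of raising.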

Phase 2: Update (Tool Call Mechanism)

Extracted memories are evaluated against existing similar memories. The LLM decides via tool calls:

Action  | When
ADD     | New fact not in memory
UPDATE  | Existing fact needs revision (e.g., “moved from NYC to SF”)
DELETE  | Fact is no longer true or relevant

This is the key differentiator — memory is actively managed, not just appended.
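
To make the ADD / UPDATE / DELETE decisions concrete, here is a small sketch of how such operations might be applied once the LLM has chosen them. The `MemoryStore` class and the shape of the operation dictionaries are hypothetical; in Mem0 the decisions arrive as LLM tool calls, which are hard-coded below.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class MemoryStore:
    """Toy in-memory store; a real deployment would sit on a vector database."""
    memories: dict = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def apply(self, op):
        """Apply one ADD / UPDATE / DELETE operation and return the affected id."""
        action = op["action"]
        if action == "ADD":
            memory_id = f"m{next(self._ids)}"
            self.memories[memory_id] = op["text"]
            return memory_id
        if action == "UPDATE":
            if op["id"] not in self.memories:
                raise KeyError(f"cannot update unknown memory {op['id']}")
            self.memories[op["id"]] = op["text"]
            return op["id"]
        if action == "DELETE":
            # Deleting an already-missing memory is treated as a no-op.
            self.memories.pop(op["id"], None)
            return op["id"]
        raise ValueError(f"unknown action: {action}")


store = MemoryStore()
city_id = store.apply({"action": "ADD", "text": "User lives in NYC"})
diet_id = store.apply({"action": "ADD", "text": "User is vegetarian"})

# Operations the LLM might emit after a later message (hard-coded here).
store.apply({"action": "UPDATE", "id": city_id,
             "text": "User lives in SF (moved from NYC)"})
store.apply({"action": "DELETE", "id": diet_id})
```

After these four operations the store holds a single, revised memory rather than a log of three appended statements, which is the behavior the section describes.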

Phase 3: Storage (Dual Backend)

Vector storage: Embeddings of memory facts in Qdrant, Pinecone, Weaviate, Chroma, etc.
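
The idea behind vector retrieval can be illustrated with a toy example. The bag-of-words "embedding" and the linear scan below are stand-ins (assumptions for illustration) for a real embedding model and a vector database such as those listed above; only the cosine-similarity ranking reflects how such stores typically compare vectors.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def search(index, query, top_k=1):
    """Return the top_k stored facts most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


facts = ["User is allergic to peanuts", "User prefers window seats"]
index = [(fact, embed(fact)) for fact in facts]

best = search(index, "which seats does the user prefer")
print(best)
```

A real embedding model would also match paraphrases that share no words with the stored fact, which this word-overlap toy cannot do.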

Graph storage (Mem0-g): Entity-relationship graph alongside vector:

  • Entity Extractor: Identifies entities as nodes
  • Relations Generator: Infers labeled edges between entities
  • Stores in Neo4j, Memgraph, Neptune, Kuzu, or Apache AGE on PostgreSQL

[User] --allergic_to--> [Peanuts]
[User] --prefers--> [Window Seat]
[User] --planning_trip--> [Tokyo Trip]
[Tokyo Trip] --departure--> [March 2026]

Graph memory enables multi-hop reasoning. For example, “What dietary restrictions should we consider for the user’s Tokyo trip?” can be answered by traversing the graph from Tokyo Trip to User, then following the allergic_to edge to Peanuts.
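
The multi-hop traversal can be sketched with plain Python over the example triples. This is an illustrative stand-in for a graph database query, not how Mem0 queries its graph backend; the helper names are hypothetical, and the trip is modeled as a single "Tokyo Trip" node (an assumption) so that it connects to the user.

```python
from collections import defaultdict, deque

triples = [
    ("User", "allergic_to", "Peanuts"),
    ("User", "prefers", "Window Seat"),
    ("User", "planning_trip", "Tokyo Trip"),
    ("Tokyo Trip", "departure", "March 2026"),
]


def build_adjacency(triples):
    """Index edges in both directions so traversal can follow them either way."""
    adj = defaultdict(list)
    for subj, rel, obj in triples:
        adj[subj].append((rel, obj))
        adj[obj].append((f"inverse:{rel}", subj))
    return adj


def find_path(adj, start, goal):
    """Breadth-first search; returns [node, relation, node, ...] or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for rel, nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [rel, nxt])
    return None


adj = build_adjacency(triples)
path = find_path(adj, "Tokyo Trip", "Peanuts")
print(" -> ".join(path))
```

The search reaches Peanuts in two hops by walking the planning_trip edge backwards to User and then following allergic_to, which a flat vector lookup on "Tokyo trip" alone would not surface.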

Mem0 vs Letta/MemGPT: When to Use Which

Aspect | Mem0 | Letta/MemGPT
What it is | Memory layer (bolt-on service) | Agent runtime with built-in memory
Integration | Add to any framework (LangChain, CrewAI, OpenAI SDK) | Agents run inside Letta runtime
Lock-in | Minimal: swap memory calls, keep your framework | Architectural: agents built on Letta
Memory model | LLM extracts facts → vector + graph | Three-tier: Core (in-context RAM) + Recall (searchable history) + Archival (long-term)
Who manages memory | Automatic pipeline | Agent self-manages via tool calls
Multi-language | Python + JavaScript SDKs | Python-first
Compliance | SOC 2, HIPAA (managed service) | Self-hosted focus
Best for | Adding memory to existing agent products | Long-running autonomous agents needing OS-like memory management

Decision rule: If you have an existing agent framework and want to add memory → Mem0. If you’re building a new autonomous agent from scratch → Letta.

Comparison with Claude Code’s Memory

Aspect | Mem0 | Claude Code
Architecture | Extract → Update → Store | 7-layer hierarchy (L1-L7)
Pointer pattern | Flat vector + graph | MEMORY.md index → topic files on demand
Consolidation | Real-time (on each message) | Batch (autoDream during idle)
Conflict resolution | LLM decides UPDATE/DELETE | autoDream Phase 3 (Refine)
Graph support | Native (Mem0-g) | None

Integration Ecosystem

Native integrations with:

  • CrewAI, OpenAI Agents SDK, Google AI ADK
  • Flowise, Langflow, Mastra
  • AWS Agent SDK (exclusive memory provider)

Production Numbers

  • 80,000+ developers on cloud service
  • API calls: 35M (Q1 2025) → 186M (Q3 2025), roughly 5.3x growth over two quarters
  • $24M Series A (YC, Peak XV, Basis Set — October 2025)
  • Enterprise customers: Netflix, Lemonade, Rocket Money

Sources