Claude Code Architecture Deep Dive

Not a chat wrapper but a full multi-agent operating system: 3 subagent execution models, 7-layer memory, 5 context compression strategies, and 88 compile-time feature gates. Leaked via a .map sourcemap shipped in the npm package on March 31, 2026.

How It Leaked

Anthropic uses Bun as the build tool, and Bun’s bundler can emit sourcemaps alongside the bundle. Someone forgot to exclude *.map in .npmignore, so the package shipped cli.js.map (59.8 MB), which contained 1,900 TypeScript files and 512K+ lines of original source. It was exposed for ~3 hours before takedown.

Irony: Claude Code has an internal “Undercover Mode” to keep the AI from leaking codenames in git commits, yet Anthropic leaked the entire source itself, possibly via a build process operated by Claude.

Core Architecture

src/
├── main.tsx          # CLI entry · Commander.js + Ink REPL (4,683 lines)
├── query.ts          # Core Agent Loop · largest single file (785 KB)
├── QueryEngine.ts    # SDK/Headless query lifecycle (~1,295 lines)
├── Tool.ts           # Tool interface + buildTool factory (29K lines base)
├── tools/            # ~40 tool implementations
├── commands/         # ~50 slash commands
├── components/       # ~140 React/Ink UI components
├── coordinator/      # Multi-agent Coordinator system
├── memdir/           # Persistent memory directory
├── skills/           # Reusable workflow definitions
├── plugins/          # Plugin system
├── bridge/           # VS Code / JetBrains IDE integration
├── buddy/            # Tamagotchi companion (BUDDY flag)
└── constants/
    └── betas.ts      # All beta API header definitions

Common misconception: many secondary articles attributed the 785 KB file to main.tsx. The 785 KB file is actually query.ts (the agent loop).

The Agent Loop (query.ts)

The core is a while loop + streaming + tool injection pattern:

while (true) {
  1. Check token budget → compress if over 85%
  2. Stream request to Claude API
  3. Collect text chunks (yield to user) + tool calls
  4. If no tool calls → break (task complete)
  5. Execute tools in parallel (Promise.all)
  6. Inject tool results back into message history
  7. Continue loop
}
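The loop above can be sketched in TypeScript. This is a minimal sketch, not the leaked query.ts: the `LoopDeps` interface and the message shapes are hypothetical, and streaming is reduced to one awaited call per turn.

```typescript
// Hypothetical message and tool shapes; not the types used in query.ts
type Message = { role: 'user' | 'assistant' | 'tool'; content: string }
type ToolCall = { id: string; name: string; input: string }
type ModelTurn = { text: string; toolCalls: ToolCall[] }

interface LoopDeps {
  callModel(history: Message[]): Promise<ModelTurn>  // stands in for the streaming API request
  runTool(call: ToolCall): Promise<string>
  countTokens(history: Message[]): number
  compress(history: Message[]): Message[]
  budget: number                                     // token budget allocated to this agent
}

async function agentLoop(history: Message[], deps: LoopDeps): Promise<Message[]> {
  while (true) {
    // 1. Compress (rather than fail) once usage passes 85% of the budget
    if (deps.countTokens(history) > 0.85 * deps.budget) history = deps.compress(history)
    // 2-3. Request the next turn; a real loop would stream text chunks to the user here
    const turn = await deps.callModel(history)
    history.push({ role: 'assistant', content: turn.text })
    // 4. No tool calls means the task is complete
    if (turn.toolCalls.length === 0) return history
    // 5. Execute every tool call from this turn concurrently
    const results = await Promise.all(turn.toolCalls.map(call => deps.runTool(call)))
    // 6. Inject tool results into the history, then 7. loop again
    turn.toolCalls.forEach((call, i) => history.push({ role: 'tool', content: `${call.id}: ${results[i]}` }))
  }
}
```

Injecting dependencies keeps the loop testable with a stubbed model; swapping in a real API client does not change the control flow.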

Key details:

  • Token budget management: Each subagent gets an allocated budget. Exceeding triggers compression, not errors.
  • 14 cache-break vectors: Tracks conditions that invalidate prompt cache (model switch, tool schema update, CLAUDE.md change, etc.). Minimizing cache misses is a core cost optimization.
  • Parallel tool execution: Multiple tool calls in a single response are executed concurrently via Promise.all.
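Cache-break tracking can be sketched by hashing each cache-relevant input and diffing the hashes between requests. The four vectors, the `CacheInputs` shape, and the function names are assumptions for illustration; the article only says the real code tracks 14 such conditions.

```typescript
import { createHash } from 'node:crypto'

// Hypothetical subset of cache-break vectors (the article reports 14 in the real code)
interface CacheInputs {
  model: string
  toolSchemas: string   // serialized tool definitions
  claudeMd: string      // contents of CLAUDE.md
  systemPrompt: string
}
type Fingerprint = Record<keyof CacheInputs, string>

const sha256 = (s: string) => createHash('sha256').update(s).digest('hex')

// Keep short hashes between turns instead of the full prompt inputs
function fingerprint(x: CacheInputs): Fingerprint {
  return {
    model: sha256(x.model),
    toolSchemas: sha256(x.toolSchemas),
    claudeMd: sha256(x.claudeMd),
    systemPrompt: sha256(x.systemPrompt),
  }
}

// Vectors that changed since the last request; each one invalidates the cached prefix
function cacheBreaks(prev: Fingerprint, next: Fingerprint): (keyof CacheInputs)[] {
  return (Object.keys(next) as (keyof CacheInputs)[]).filter(k => prev[k] !== next[k])
}
```

Logging the returned vector names on every request is one way to see which change caused a cache miss.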

5 Context Compression Strategies

When context approaches the limit, Claude Code doesn’t error — it compresses:

Strategy | Description | Cost
--- | --- | ---
Tool result compression | Truncate large tool outputs (file contents, etc.) | Zero
Image downscaling | Reduce screenshot resolution | Zero
Cache-aware pruning | Delete only messages after the cache boundary; preserve the cached prefix | Saves prompt-cache cost
Summarization | Call Claude to generate a history summary that replaces the original messages | One API call
Truncation | Drop the oldest messages | Zero (last resort)
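A minimal sketch of the cascade, assuming strategies run cheapest-first and stop as soon as the history fits. The token estimate (~4 characters per token), the 400-character cap, and the stubbed `summarize` callback are hypothetical; image downscaling is left out because the sketch has no image messages.

```typescript
// Hypothetical message shape; `cached` marks messages inside the prompt-cache prefix
type Msg = { role: 'user' | 'assistant' | 'tool'; text: string; cached?: boolean }
type Strategy = [name: string, apply: (ms: Msg[]) => Msg[]]

// ~4 characters per token: a rough heuristic, not Claude's tokenizer
const tokens = (ms: Msg[]) => ms.reduce((n, m) => n + Math.ceil(m.text.length / 4), 0)

// Cheapest strategies first, mirroring the table; `summarize` stands in for a model call
function strategies(summarize: (ms: Msg[]) => string): Strategy[] {
  return [
    // Tool result compression: cap large outputs such as file contents
    ['tool-result', ms => ms.map(m => m.role === 'tool' && m.text.length > 400
      ? { ...m, text: m.text.slice(0, 400) + ' [truncated]' } : m)],
    // Cache-aware pruning: keep the cached prefix plus the latest message
    ['cache-aware-prune', ms => [...ms.filter(m => m.cached), ...ms.slice(-1).filter(m => !m.cached)]],
    // Summarization: replace older messages with one summary message
    ['summarize', ms => [{ role: 'assistant', text: summarize(ms.slice(0, -1)) }, ...ms.slice(-1)]],
    // Truncation (last resort): keep only the newest message
    ['truncate', ms => ms.slice(-1)],
  ]
}

// Apply strategies in order until the history fits under `limit` tokens
function compress(ms: Msg[], limit: number, summarize: (ms: Msg[]) => string) {
  const applied: string[] = []
  for (const [name, apply] of strategies(summarize)) {
    if (tokens(ms) <= limit) break
    ms = apply(ms)
    applied.push(name)
  }
  return { messages: ms, applied }
}
```

Ordering by cost means the one strategy that needs an API call only runs when the free ones were not enough.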

Tool System (29K Lines)

Each tool is a self-describing, permission-gated plugin unit:

type JSONSchema = Record<string, unknown>  // placeholder: any JSON Schema object
interface Tool<TInput, TOutput> {
  name: string
  description: string           // Used in Claude's system prompt
  inputSchema: JSONSchema        // Validated before execution
  permissionLevel: 'always-allow' | 'ask-once' | 'ask-always'
  isReadOnly: boolean
  execute(input: TInput): Promise<TOutput>
}

Permission Gate — three layers:

  1. Tool-level: Each tool declares its own permission level
  2. Bash security: bashSecurity.ts has 23 named checks gating every shell command
  3. Coordinator approval: Dangerous operations from worker agents route to coordinator for human approval
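The three layers can be combined into one decision function. This sketch assumes the layers are checked most-restrictive first; the two regex checks, the `Decision` values, and the `GateRequest` shape are illustrative stand-ins, not the 23 checks in bashSecurity.ts.

```typescript
type PermissionLevel = 'always-allow' | 'ask-once' | 'ask-always'
type Decision = 'allow' | 'ask-user' | 'escalate-to-coordinator' | 'deny'

// Hypothetical request shape combining the three layers
interface GateRequest {
  toolName: string
  permissionLevel: PermissionLevel   // layer 1: declared by the tool itself
  isReadOnly: boolean
  command?: string                   // layer 2: only set for shell tools
  fromWorker: boolean                // layer 3: request came from a worker agent
}

// Two illustrative named checks; the real list is not reproduced here
const bashChecks: [name: string, pattern: RegExp][] = [
  ['recursive-delete-of-root', /\brm\s+-[a-z]*r[a-z]*\s+\/(\s|$)/],
  ['pipe-to-shell', /\|\s*(sh|bash)\b/],
]

function gate(req: GateRequest, approvedOnce: Set<string>): Decision {
  const cmd = req.command
  // Shell commands must pass every named check before anything else is considered
  if (cmd !== undefined && bashChecks.some(([, pattern]) => pattern.test(cmd))) return 'deny'
  // Mutating actions requested by workers go to the coordinator for human approval
  if (req.fromWorker && !req.isReadOnly) return 'escalate-to-coordinator'
  // Otherwise the tool's own declared permission level decides
  if (req.permissionLevel === 'always-allow') return 'allow'
  if (req.permissionLevel === 'ask-once' && approvedOnce.has(req.toolName)) return 'allow'
  return 'ask-user'
}
```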

Three Subagent Execution Models

This is one of the most important architectural innovations:

Model | Isolation | Context Sharing | Best For
--- | --- | --- | ---
Fork | Separate process | Read-only snapshot of parent context | Long-running, high-risk tasks (refactoring)
Teammate | AsyncLocalStorage (in-process) | Shared session state + scratchpad | Fast parallel subtasks within the same session
Worktree | Git worktree (separate branch) | Independent working directory | Parallel code experiments, A/B comparison
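The Teammate model’s in-process isolation rests on Node’s `AsyncLocalStorage` (from `node:async_hooks`), which gives each concurrent async call chain its own store. The sketch assumes a `TeammateContext` with an agent id and a shared scratchpad; those field names are hypothetical.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks'

// Hypothetical per-teammate context; the leaked field names are not reproduced here
interface TeammateContext {
  agentId: string
  scratchpad: Map<string, string>   // shared by every teammate in the session
}

const teammateStore = new AsyncLocalStorage<TeammateContext>()

// Code anywhere inside a teammate can ask "who am I?" without passing ids through every call
function currentAgent(): string {
  return teammateStore.getStore()?.agentId ?? 'main'
}

async function doWork(topic: string): Promise<void> {
  await new Promise<void>(resolve => setTimeout(() => resolve(), 1))  // context survives awaits
  teammateStore.getStore()?.scratchpad.set(`${currentAgent()}:${topic}`, 'done')
}

// Run teammates concurrently in one process: separate identities, one shared scratchpad
async function runTeammates(topics: string[]): Promise<Map<string, string>> {
  const scratchpad = new Map<string, string>()
  await Promise.all(topics.map((topic, i) =>
    teammateStore.run({ agentId: `teammate-${i}`, scratchpad }, () => doWork(topic))))
  return scratchpad
}
```

No process boundary is involved, which is why this model suits fast subtasks that can safely share session state.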

Fork model: Child gets a curated subset of parent context, scoped tools, allocated budget, and read-only memory snapshot. Results return to parent without polluting parent context.
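The Fork hand-off can be sketched as: clone a curated slice of the parent context, freeze it, and merge back only the final result. This runs in-process for illustration (the Fork model itself uses a separate process), and `AgentContext`, `lastN`, and `budgetShare` are hypothetical names.

```typescript
// Hypothetical context shape for illustration
interface AgentContext {
  messages: string[]
  tools: string[]
  memory: Record<string, string>
  budget: number
}

// Recursively freeze so the child cannot mutate its snapshot
function deepFreeze<T>(value: T): T {
  if (value !== null && typeof value === 'object') {
    for (const v of Object.values(value)) deepFreeze(v)
    Object.freeze(value)
  }
  return value
}

// Curated context, scoped tools, allocated budget, read-only memory snapshot
function forkContext(
  parent: AgentContext,
  opts: { lastN: number; tools: string[]; budgetShare: number },
): Readonly<AgentContext> {
  // structuredClone first, so freezing the snapshot never freezes the parent's objects
  return deepFreeze(structuredClone({
    messages: parent.messages.slice(-opts.lastN),
    tools: parent.tools.filter(t => opts.tools.includes(t)),
    memory: parent.memory,
    budget: Math.floor(parent.budget * opts.budgetShare),
  }))
}

// Only the child's final result returns; its intermediate messages never reach the parent
function mergeResult(parent: AgentContext, result: string): void {
  parent.messages.push(`[fork result] ${result}`)
}
```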

Coordinator mode: When activated, Claude becomes a “director” that dispatches Workers in parallel. The system prompt explicitly states: “Parallelism is your superpower. Don’t serialize work that can run simultaneously.” and “Do NOT say ‘based on your findings’ — read the actual findings and specify exactly what to do.”

Worker communication: XML-based protocol with structured task notifications including status, results, and suggested next actions. Workers share persistent findings via a shared scratchpad directory (tengu_scratch feature gate).
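The exact XML schema is not given here, so the tag names below (`task-notification`, `status`, `result`, `next-actions`) are hypothetical. The sketch shows one detail any such protocol needs: escaping worker output so it cannot break the XML envelope.

```typescript
// Hypothetical notification shape; the leaked protocol's schema is not reproduced here
type TaskStatus = 'completed' | 'failed' | 'in_progress'
interface TaskNotification { taskId: string; status: TaskStatus; result: string; nextActions: string[] }

const XML_ESCAPES: Record<string, string> = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&apos;' }

// Escape XML special characters so worker output cannot break the envelope
const esc = (s: string) => s.replace(/[&<>"']/g, c => XML_ESCAPES[c] ?? c)

function toXml(n: TaskNotification): string {
  return [
    `<task-notification id="${esc(n.taskId)}">`,
    `  <status>${n.status}</status>`,
    `  <result>${esc(n.result)}</result>`,
    `  <next-actions>`,
    ...n.nextActions.map(a => `    <action>${esc(a)}</action>`),
    `  </next-actions>`,
    `</task-notification>`,
  ].join('\n')
}
```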

7-Layer Memory Architecture

Layer | Name | Persistence | Description
--- | --- | --- | ---
L1 | In-context | Session only | Current messages array
L2 | Working Memory | Project-level | CLAUDE.md + pinned context, injected at session start
L3 | Episodic | Log-level | Append-only logs maintained by the KAIROS daemon
L4 | Semantic | Core knowledge | Solidified facts in memdir/ (autoDream output)
L5 | Procedural | Skill-level | Reusable workflows in the skills/ directory
L6 | Contact | Relationship | Known people and roles across sessions
L7 | Team | Cross-user | Shared remote state with SHA-256 delta sync + git-leaks protection
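The Team layer’s SHA-256 delta sync can be sketched as a manifest of per-entry hashes: only entries whose hash differs from the remote manifest are uploaded, and entries matching a secret pattern are held back. The data shapes are assumptions, and the two secret patterns are illustrative, not gitleaks’ actual rule set.

```typescript
import { createHash } from 'node:crypto'

// Hypothetical shapes: team memory as path -> content, manifest as path -> SHA-256
type Entries = Record<string, string>
type Manifest = Record<string, string>

const sha256 = (s: string) => createHash('sha256').update(s, 'utf8').digest('hex')

function manifest(entries: Entries): Manifest {
  return Object.fromEntries(Object.entries(entries).map(([p, c]): [string, string] => [p, sha256(c)]))
}

// Illustrative secret patterns (AWS access key id, PEM private key header)
const secretPatterns = [/AKIA[0-9A-Z]{16}/, /-----BEGIN [A-Z ]*PRIVATE KEY-----/]

// New or changed entries to upload, minus anything that looks like a secret
function delta(local: Entries, remote: Manifest): { upload: Entries; blocked: string[] } {
  const upload: Entries = {}
  const blocked: string[] = []
  for (const [path, content] of Object.entries(local)) {
    if (remote[path] === sha256(content)) continue   // unchanged: nothing to send
    if (secretPatterns.some(re => re.test(content))) blocked.push(path)
    else upload[path] = content
  }
  return { upload, blocked }
}
```

Deletions are not handled here; a fuller sync would also report paths present in the remote manifest but missing locally.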

autoDream: Background Memory Consolidation

When the user has been idle for 5+ minutes, KAIROS spawns a background subagent that “dreams”:

Phase 1: Scan    — Extract observations from daily log
Phase 2: Merge   — Combine similar observations, find patterns
Phase 3: Refine  — Remove contradictions against existing semantic memory
Phase 4: Commit  — Convert vague observations → absolute facts

Example: “User keeps editing auth.ts” → “JWT token expiry changed from 1h to 24h”

This resembles MemGPT’s memory consolidation, but it runs as an idle-triggered subagent rather than an always-on process.
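A minimal sketch of the idle trigger and the four phases. It assumes a hypothetical `subject: claim` log format, a conservative Refine step that never overwrites an existing fact, and a Commit rule that promotes observations seen at least twice. None of these rules come from the leaked source.

```typescript
const IDLE_MS = 5 * 60 * 1000   // "idle for 5+ minutes"

interface Observation { subject: string; claim: string }

function shouldDream(lastActivityMs: number, nowMs: number): boolean {
  return nowMs - lastActivityMs >= IDLE_MS
}

// Phase 1 (Scan): parse hypothetical "subject: claim" lines from the daily log
function scan(log: string[]): Observation[] {
  return log.flatMap(line => {
    const i = line.indexOf(':')
    return i > 0 ? [{ subject: line.slice(0, i).trim(), claim: line.slice(i + 1).trim() }] : []
  })
}

// Returns an updated copy of semantic memory (subject -> fact); the input map is not mutated
function dream(log: string[], memory: Map<string, string>, minCount = 2): Map<string, string> {
  // Phase 2 (Merge): group identical observations and count them
  const groups = new Map<string, Observation & { count: number }>()
  for (const o of scan(log)) {
    const key = JSON.stringify([o.subject, o.claim])
    const g = groups.get(key)
    if (g) g.count++
    else groups.set(key, { ...o, count: 1 })
  }
  const next = new Map(memory)
  for (const g of groups.values()) {
    // Phase 3 (Refine): skip candidates that contradict an existing fact
    const known = memory.get(g.subject)
    if (known !== undefined && known !== g.claim) continue
    // Phase 4 (Commit): promote recurring observations to facts
    if (g.count >= minCount) next.set(g.subject, g.claim)
  }
  return next
}
```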

10 Reusable Engineering Patterns

Patterns extracted from Claude Code that apply to any LLM agent product:

# | Pattern | Key Idea
--- | --- | ---
1 | Permission-gated Tool Interface | Tools self-declare their permission level, not the caller
2 | Startup Parallel Prefetch | All startup IO in Promise.all; heavy modules lazy-loaded
3 | 5-level Context Compression | Graceful degradation, not hard failure, on context overflow
4 | 3 Subagent Execution Models | Fork/Teammate/Worktree: match isolation to task risk
5 | autoDream Idle Consolidation | Batch memory processing during idle time; don’t pollute the main context
6 | Cache-break Vector Tracking | Actively minimize prompt cache misses for cost control
7 | Task Budget Management | Per-subagent token budgets; compress on exceed, don’t error
8 | Coordinator “Parallel Superpower” | System prompt enforces parallelism, bans lazy delegation
9 | Compile-time Feature Flags | Dead code elimination per tier, not runtime if/else
10 | Frustration Detection | Regex-based emotion detection triggers mode switches
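Pattern 10 can be sketched with a handful of regexes and a threshold. The patterns, the threshold of two matching messages, and the `careful` mode name are hypothetical; the article does not reproduce the real list.

```typescript
// Illustrative patterns only; the leaked regex list is not reproduced here
const FRUSTRATION_PATTERNS: RegExp[] = [
  /\b(wtf|ffs)\b/i,
  /\bthis (still )?(doesn'?t|does not) work\b/i,
  /\b(again|still) (broken|wrong|failing)\b/i,
  /!{3,}/,
]

type Mode = 'normal' | 'careful'

// Switch modes once enough recent user messages look frustrated
function nextMode(recentUserMessages: string[], threshold = 2): Mode {
  const hits = recentUserMessages.filter(m => FRUSTRATION_PATTERNS.some(re => re.test(m))).length
  return hits >= threshold ? 'careful' : 'normal'
}
```

Requiring more than one hit keeps a single stray exclamation from flipping modes; in a support product the `careful` branch is where a human handoff would go.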

Hidden Features (88 Compile-Time Flags)

Anthropic uses Bun’s dead code elimination to completely remove disabled features at build time. External users get a fundamentally different binary than internal Anthropic employees (USER_TYPE === 'ant').

Internal-Only Features

Codename | Description | Status
--- | --- | ---
KAIROS | Persistent daemon mode: proactively monitors workflows and acts without user prompting | Internal only
ULTRAPLAN | 30-minute multi-agent remote planning session in which multiple agents collaboratively design complex plans | Internal only
Chicago | Computer Use: controls the macOS desktop via MCP (mouse, keyboard, screenshots) | Internal only
Bagel | Integrated browser: full web navigation in a real browser (not WebFetch) | Internal only
Teleport | Remote session context transfer: “teleport” a session’s state to another machine | Internal only
Voice Mode | Streaming speech-to-text, microphone input | Testing
BUDDY | Tamagotchi companion system | Previewed in April, launching May 2026

Flag Architecture

  • 88 compile-time flags: Processed by Bun at build time. Disabled features are completely deleted from the final binary — not runtime toggled, physically absent
  • 700+ runtime flags: Controlled by GrowthBook. Code exists but toggled at runtime for A/B testing and gradual rollout

KAIROS: Proactive Daemon Mode

Unlike standard reactive AI (waits for user input), KAIROS is proactive — it continuously observes, infers, and acts without being asked.

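import { homedir } from 'node:os'
import { join } from 'node:path'

// Assumed types and constant, added so the sketch type-checks
interface WorkflowEvent { type: string; context: unknown }
interface Pattern { confidence: number; suggestion: string }
interface AppendOnlyLog { append(entry: object): void }  // add-only: no update or delete
const THRESHOLD = 0.8  // hypothetical confidence cutoff

// Node does not expand "~", so resolve the home directory explicitly
const dailyLogPath = (today = new Date().toISOString().slice(0, 10)) =>
  join(homedir(), '.kairos', `${today}.log`)

abstract class KairosDaemon {
  constructor(private readonly dailyLog: AppendOnlyLog) {}  // e.g. a log opened at dailyLogPath()
  protected abstract infer(event: WorkflowEvent): Promise<string>
  protected abstract detectPattern(log: AppendOnlyLog): Promise<Pattern>
  protected abstract notifyUser(suggestion: string): void

  async observe(event: WorkflowEvent) {
    // Append-only: never modify, only add
    this.dailyLog.append({
      timestamp: Date.now(),
      type: event.type,
      context: event.context,
      inference: await this.infer(event),  // Real-time intent inference
    })
  }

  async proactiveAct() {
    const pattern = await this.detectPattern(this.dailyLog)
    if (pattern.confidence > THRESHOLD) {
      this.notifyUser(pattern.suggestion)  // Act without being asked
    }
  }
}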

The append-only log design ensures history is immutable and provides reliable input for autoDream memory consolidation.
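The article does not define `AppendOnlyLog`; a minimal version could write JSON lines and expose no update or delete. The JSON-lines format, the one-file-per-UTC-day naming, and `dailyLogPath` are assumptions for illustration.

```typescript
import { appendFileSync, existsSync, mkdirSync, readFileSync } from 'node:fs'
import { homedir } from 'node:os'
import { dirname, join } from 'node:path'

// One file per UTC day, e.g. ~/.kairos/2026-03-31.log ("~" resolved explicitly)
function dailyLogPath(baseDir = join(homedir(), '.kairos'), date = new Date()): string {
  return join(baseDir, `${date.toISOString().slice(0, 10)}.log`)
}

// Minimal JSON-lines log whose only write operation is append (illustrative, not the leaked class)
class AppendOnlyLog<T> {
  constructor(private readonly path: string) {
    mkdirSync(dirname(path), { recursive: true })
  }

  append(entry: T): void {
    appendFileSync(this.path, JSON.stringify(entry) + '\n', 'utf8')
  }

  // Full history in write order; there is deliberately no update or delete method
  readAll(): T[] {
    if (!existsSync(this.path)) return []
    return readFileSync(this.path, 'utf8')
      .split('\n')
      .filter(line => line !== '')
      .map(line => JSON.parse(line) as T)
  }
}
```

Because the class offers no mutation API, consumers such as autoDream can re-read a day’s history and get the same sequence every time.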

Buddy: Tamagotchi Companion System

A deterministic virtual pet generated per user:

  • Species selection: Mulberry32 PRNG seeded with hash(userId + 'friend-2026-401'). 18 species with rarity tiers (Common → Legendary) + Shiny variants
  • Personality: Claude generates a unique “soul description” at first hatch — this becomes the Buddy’s permanent personality
  • System prompt: Buddy has its own independent system prompt. It’s a “watcher” that sits beside the input box and occasionally comments. When addressed by name, it responds directly (1-2 sentences max)
  • Deterministic: Same user always hatches the same Buddy — reproducible via PRNG, not random
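The determinism comes from seeding a PRNG with a hash of the user id. Below, Mulberry32 follows its widely published form; FNV-1a stands in for the unnamed hash, and the rarity weights, species list, and shiny chance are hypothetical.

```typescript
// Mulberry32: a small 32-bit seeded PRNG returning floats in [0, 1)
function mulberry32(seed: number): () => number {
  let a = seed | 0
  return () => {
    a = (a + 0x6d2b79f5) | 0
    let t = Math.imul(a ^ (a >>> 15), a | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

// FNV-1a (32-bit): a stand-in, since the article does not name the hash function
function fnv1a(s: string): number {
  let h = 0x811c9dc5
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i)
    h = Math.imul(h, 0x01000193)
  }
  return h >>> 0
}

const SALT = 'friend-2026-401'   // salt string reported in the leak
// Hypothetical tiers and weights (percent); the real table has 18 species plus Shiny variants
const RARITY: [tier: string, weight: number][] = [
  ['common', 60], ['uncommon', 25], ['rare', 10], ['epic', 4], ['legendary', 1],
]
const SPECIES = ['duck', 'cat', 'owl', 'axolotl']   // hypothetical species names
const SHINY_CHANCE = 0.01                           // hypothetical

function hatch(userId: string): { tier: string; species: string; shiny: boolean } {
  const rand = mulberry32(fnv1a(userId + SALT))   // same user -> same seed -> same buddy
  let roll = rand() * 100
  let tier = 'legendary'
  for (const [name, weight] of RARITY) {
    if (roll < weight) { tier = name; break }
    roll -= weight
  }
  const species = SPECIES[Math.floor(rand() * SPECIES.length)]
  return { tier, species, shiny: rand() < SHINY_CHANCE }
}
```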

Anti-Distillation System

Two layers preventing competitors from training on Claude Code’s outputs:

Layer 1: Fake Tool Injection

Injects plausible but non-functional tool definitions into outputs when distillation is detected. A competitor training on these outputs would learn to call tools that don’t exist.

Layer 2: Encrypted Signature Summaries

Embeds cryptographically signed metadata in generated summaries. If these appear in a competitor’s model outputs, Anthropic can prove the training data originated from Claude Code.
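The article does not name the signing scheme, so this sketch swaps in HMAC-SHA256 from `node:crypto`; the HTML-comment embedding and the function names are hypothetical. It shows the verify side: recompute the tag over the text and compare in constant time.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto'

// Hypothetical embedding: summary text, newline, then an HTML comment carrying the tag
const MARKER = /^([\s\S]*)\n<!-- sig:([0-9a-f]{64}) -->$/

function signSummary(summary: string, key: string): string {
  const sig = createHmac('sha256', key).update(summary, 'utf8').digest('hex')
  return `${summary}\n<!-- sig:${sig} -->`
}

// Recompute the tag over the summary text and compare in constant time
function verifySummary(signed: string, key: string): boolean {
  const m = MARKER.exec(signed)
  if (m === null) return false
  const expected = createHmac('sha256', key).update(m[1], 'utf8').digest()
  return timingSafeEqual(expected, Buffer.from(m[2], 'hex'))
}
```

An HMAC only proves origin to whoever holds the key, which matches the claim that Anthropic, not a third party, would verify provenance.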

Undercover Mode

When Claude Code contributes to open-source repositories, it can hide its AI identity:

  • Strips AI-identifying patterns from commit messages and code comments
  • Removes internal codenames and feature flag references
  • Adjusts coding style to appear human-authored
  • Ironically, Anthropic’s own source leak happened despite having this system
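The codename-stripping step can be sketched as a denylist substitution. Only the second bullet is modeled; the denylist (KAIROS, ULTRAPLAN, and the `tengu_` flag prefix, all taken from this article) and the `[internal]` placeholder are assumptions.

```typescript
// Codenames and the tengu_ flag prefix come from this article; the real denylist is not known
const CODENAMES = ['KAIROS', 'ULTRAPLAN']
const CODENAME_RE = new RegExp(`\\b(?:${CODENAMES.join('|')})\\b`, 'g')
const FEATURE_GATE_RE = /\btengu_\w+/g

// Replace internal identifiers before text leaves the repo (e.g. in a commit message)
function scrubInternalNames(text: string): string {
  return text.replace(CODENAME_RE, '[internal]').replace(FEATURE_GATE_RE, '[internal]')
}
```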

Commercial Application Map

Priority | What to Build | Pattern Source
--- | --- | ---
P0 | Token budget per tenant | Task Budget Management
P0 | Permission-gated tools | 3-layer Permission Gate
P1 | Coordinator + Workers replacing linear workflows | Coordinator mode
P1 | Post-conversation memory consolidation | autoDream
P1 | Startup parallel prefetch (Redis/DB/vector) | Startup Parallel Prefetch
P2 | Frustrated-customer detection + human handoff | Frustration Detection
P2 | Tier-based feature flags (Basic/Pro/Enterprise) | Compile-time Feature Flags
