Karpathy’s LLM Education Stack

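A progression of four projects by Andrej Karpathy. The first three strip away abstractions layer by layer, from pure Python down to bare C/CUDA; the fourth, Autoresearch, puts an AI agent in charge of the experiments. Together they form one of the most complete open-source curricula for understanding LLMs from algorithm to hardware.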

The Stack (Increasing Depth)

Project	Language	Lines	Dependencies	Level	What You Learn
MicroGPT	Pure Python	~200	Zero	Beginner	The complete algorithm — autograd, transformer, training, inference
nanoGPT	Python + PyTorch	~600	PyTorch	Intermediate	How to train real models (reproduces GPT-2 124M)
llm.c	C / CUDA	~2,000	None	Advanced	Hardware-level training — what happens on the GPU
Autoresearch	Python + Agent	Variable	GPU + LLM API	Applied	AI-driven ML research — automated experiment loops

MicroGPT: The Algorithm (200 Lines, Zero Dependencies)

A single Python file containing everything needed to build a working language model:

Dataset (32K names), character-level tokenizer
Custom autograd engine (backpropagation from scratch)
GPT-2-like transformer architecture
Adam optimizer, training loop, inference loop
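To make "custom autograd engine" concrete, here is a minimal sketch of scalar reverse-mode autodiff in the micrograd style. It is not MicroGPT's actual code: the class name `Value`, its fields, and the choice of supported operations (add, multiply, tanh) are assumptions made for illustration.

```python
import math


class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # Values this one was computed from
        self._local_grads = local_grads  # d(self)/d(parent), one per parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def tanh(self):
        t = math.tanh(self.data)
        return Value(t, (self,), (1.0 - t * t,))

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        order, seen = [], set()

        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p in v._parents:
                    visit(p)
                order.append(v)

        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            for parent, local in zip(v._parents, v._local_grads):
                parent.grad += local * v.grad


# One "neuron": y = tanh(x * w + b)
x, w, b = Value(0.5), Value(-1.0), Value(0.25)
y = (x * w + b).tanh()
y.backward()

# A value used twice accumulates gradient from both uses: d(a*a + a)/da = 2a + 1
a = Value(3.0)
c = a * a + a
c.backward()
```

The `+=` in `backward` is what lets a value that feeds several operations collect gradient from each of them; everything else in a transformer's backward pass is this same rule applied to more operations.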

Why it exists: “Strip away everything that isn’t essential. What’s left is the pure algorithm.” It is the culmination of Karpathy’s decade-long simplification effort: micrograd → makemore → nanoGPT → MicroGPT.

nanoGPT: Real Training (600 Lines)

Two files: train.py (~300 lines) and model.py (~300 lines). Together they can reproduce GPT-2 (124M parameters) by training on OpenWebText.
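The "124M" figure can be checked with a few lines of arithmetic. The sketch below assumes the standard GPT-2 small hyperparameters (12 layers, width 768, vocabulary 50257, context 1024), biases on every linear layer, and an output head that shares weights with the token embedding. None of those values are stated on this page, and `GPTConfig` and `count_parameters` are illustrative names rather than nanoGPT's API.

```python
from dataclasses import dataclass


@dataclass
class GPTConfig:
    # Assumed GPT-2 small settings; not taken from this page.
    vocab_size: int = 50257
    block_size: int = 1024
    n_layer: int = 12
    n_embd: int = 768


def count_parameters(cfg):
    d = cfg.n_embd
    token_emb = cfg.vocab_size * d            # also reused as the output head (weight tying)
    pos_emb = cfg.block_size * d
    layer_norm = 2 * d                        # scale and shift
    attn = (d * 3 * d + 3 * d) + (d * d + d)  # fused q,k,v projection + output projection
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)
    block = 2 * layer_norm + attn + mlp
    return token_emb + pos_emb + cfg.n_layer * block + layer_norm  # plus final layer norm


total = count_parameters(GPTConfig())
print(f"{total:,} parameters")
```

Under these assumptions the token embedding alone accounts for roughly 38.6M of the total, which is why weight tying matters at this scale.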

Successor: nanochat (late 2025) — “The best ChatGPT that $100 can buy.” nanoGPT is legacy but remains the gold standard for learning.

llm.c: The Hardware Level (2,000 Lines C/CUDA)

llm.c trains GPT-2 in pure C/CUDA at speeds matching PyTorch (78ms/iter vs 80ms/iter on A100). There is no Python and no framework, just a single compiled binary.

If you want to understand what actually happens when you call loss.backward(), this is the definitive resource.
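As a rough illustration of what a hand-written backward pass looks like, the sketch below derives the gradients of one linear layer followed by softmax cross-entropy and checks one of them against a finite difference. It is written in plain Python for readability and is not taken from llm.c; the function names, shapes, and numbers are all made up for the example.

```python
import math


def forward(W, b, x, target):
    """Linear layer followed by softmax cross-entropy, for a single example."""
    logits = [sum(W[i][j] * x[j] for j in range(len(x))) + b[i] for i in range(len(b))]
    m = max(logits)                              # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    loss = -math.log(probs[target])
    return loss, probs


def backward(probs, x, target):
    """Hand-derived gradients: dL/dlogits = probs - onehot(target)."""
    dlogits = [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]
    dW = [[dl * xj for xj in x] for dl in dlogits]  # outer product of dlogits and x
    db = dlogits[:]
    return dW, db


W = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.4]]
b = [0.05, -0.05]
x = [1.0, 2.0, -1.0]
target = 1

loss, probs = forward(W, b, x, target)
dW, db = backward(probs, x, target)

# Check one analytic gradient against a central finite difference.
eps = 1e-6
W[0][2] += eps
loss_plus, _ = forward(W, b, x, target)
W[0][2] -= 2 * eps
loss_minus, _ = forward(W, b, x, target)
W[0][2] += eps                                   # restore the original weight
numeric = (loss_plus - loss_minus) / (2 * eps)
print(f"analytic {dW[0][2]:.8f}  numeric {numeric:.8f}")
```

A framework's autograd produces `dW` and `db` automatically; writing them out per layer, as here, is the kind of work a framework-free implementation has to do for every operation in the model.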

Autoresearch: AI Does the Research

The most provocative project. An AI coding agent that:

Reads research directions in program.md (plain English)
Modifies training code autonomously
Trains for 5 minutes per experiment
Keeps improvements, discards failures
Runs ~12 experiments/hour, ~100 overnight
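The loop above amounts to greedy hill climbing over training configurations. The sketch below shows that control flow only. `run_experiment` and `propose_change` are hypothetical stand-ins (a toy loss function and a random learning-rate tweak) for the real 5-minute training run and the agent's code edit, and none of the names come from the Autoresearch code.

```python
import math
import random


def run_experiment(config):
    """Hypothetical stand-in for a 5-minute training run; returns a validation loss."""
    # Toy objective: the loss is lowest when the learning rate is 0.003.
    return (math.log10(config["lr"]) - math.log10(0.003)) ** 2


def propose_change(config, rng):
    """Hypothetical stand-in for the agent editing the training code."""
    candidate = dict(config)
    candidate["lr"] = config["lr"] * rng.choice([0.5, 0.8, 1.25, 2.0])
    return candidate


def research_loop(config, n_experiments, seed=0):
    rng = random.Random(seed)
    baseline_loss = run_experiment(config)
    best_loss = baseline_loss
    kept = 0
    for _ in range(n_experiments):
        candidate = propose_change(config, rng)
        loss = run_experiment(candidate)
        if loss < best_loss:          # keep improvements
            config, best_loss = candidate, loss
            kept += 1
        # otherwise discard the change and continue from the current best
    return config, baseline_loss, best_loss, kept


final_config, baseline_loss, best_loss, kept = research_loop({"lr": 0.1}, n_experiments=100)
print(f"kept {kept} of 100 changes, loss {baseline_loss:.3f} -> {best_loss:.3f}")
```

Because each candidate is compared only against the current best, the loop can never end worse than it started, but it can stall in a local optimum; the research directions in program.md are what steer the proposals in the real system.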

Real results (2 days, ~700 autonomous changes): the agent found ~20 additive improvements and cut “Time to GPT-2” from 2.02h to 1.80h (an ~11% reduction). All of them were discovered by the AI.
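The throughput and speed-up figures can be sanity-checked with the numbers already on this page. This is plain arithmetic and ignores any overhead between runs.

```python
minutes_per_experiment = 5
experiments_per_hour = 60 / minutes_per_experiment        # 12 back-to-back runs per hour
hours_for_100 = 100 / experiments_per_hour                # about 8.3 hours, i.e. overnight
max_sequential_runs_in_2_days = 48 * experiments_per_hour

before_h, after_h = 2.02, 1.80
relative_gain = (before_h - after_h) / before_h           # about 0.109, rounded to 11%

print(f"{experiments_per_hour:.0f}/hour, 100 runs in {hours_for_100:.1f} h")
print(f"at most {max_sequential_runs_in_2_days:.0f} sequential runs in 2 days")
print(f"time-to-GPT-2 reduction: {relative_gain:.1%}")
```

One sequential loop fits at most 576 five-minute runs into two days, so the ~700 figure presumably counts something other than back-to-back runs on a single loop; the page does not say which.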

This represents L4-level automation for ML research: the human sets the direction and the AI explores the space.
