AutoResearch is an AI Agent framework that autonomously optimizes ML training code.
Built by Andrej Karpathy, it achieved an 11% performance improvement through 700 automated experiments in 2 days.
One GPU, One File, One Metric
A minimalist AI Agent framework for autonomous machine learning research. One Python file, one GPU, and a 5-minute timer: that's all you need to let AI agents optimize your training code overnight.
An approach to automated ML research that deliberately constrains the environment (one editable file, one metric, a fixed time budget) so AI agents can work reliably.
Just 3 files: train.py (630 lines, editable), prepare.py (immutable infrastructure), and program.md (agent instructions).
A fixed 5-minute time budget per experiment ensures fair comparisons. Agents run ~12 experiments per hour, automatically keeping improvements and rolling back failures.
Git history stores only successful improvements, creating a ratchet of monotonic progress. The complete experiment log is kept in results.tsv.
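The keep-or-roll-back ratchet can be illustrated with a minimal sketch. This is not AutoResearch's own code: the function names, the column names (`experiment`, `val_bpb`, `status`), and the sample scores are hypothetical, and the git commit/reset steps are replaced by a plain Python list so the example runs anywhere.

```python
import csv
import io


def run_ratchet(baseline_bpb, candidate_bpbs):
    """Keep a candidate only if it beats the best score so far (lower is better).

    Returns (best_bpb, kept, log_rows):
      kept     - experiment numbers that would be committed to git
      log_rows - every experiment, kept or not, as a full log would record
    """
    best = baseline_bpb
    kept = []
    log_rows = []
    for number, bpb in enumerate(candidate_bpbs, start=1):
        improved = bpb < best
        log_rows.append({
            "experiment": number,
            "val_bpb": bpb,
            "status": "keep" if improved else "discard",
        })
        if improved:
            # In a real setup this is where the change would be committed.
            best = bpb
            kept.append(number)
        # Otherwise the working tree would be reset to the last kept commit.
    return best, kept, log_rows


def to_tsv(log_rows):
    """Serialize the log as tab-separated text (hypothetical column layout)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["experiment", "val_bpb", "status"],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(log_rows)
    return buffer.getvalue()


# Made-up scores: experiments 2 and 4 improve on the best so far.
best, kept, rows = run_ratchet(1.000, [1.010, 0.990, 0.995, 0.980])
print(kept)          # [2, 4]
print(to_tsv(rows))
```

Because a candidate is kept only when it strictly improves on the current best, the sequence of kept scores can never get worse, which is what makes the history a ratchet. The log still records the discarded runs.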
Optimizes a single metric: bits-per-byte (BPB) on a fixed validation set, where lower is better. Simple, comparable across runs, and independent of vocabulary size.
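For readers new to the metric, here is a minimal sketch of how BPB is commonly computed from a language model's cross-entropy loss. It assumes the loss is summed over all validation tokens and measured in nats (natural log); the function name and sample numbers are illustrative, not taken from AutoResearch.

```python
import math


def bits_per_byte(total_loss_nats, total_bytes):
    """Convert summed cross-entropy (nats) into bits per byte of raw text.

    total_loss_nats - sum of per-token negative log-likelihoods, natural log
    total_bytes     - size in bytes of the text those tokens encode
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    total_bits = total_loss_nats / math.log(2)  # nats -> bits
    return total_bits / total_bytes


# Illustrative numbers: "hello world" is 11 UTF-8 bytes; suppose a tokenizer
# splits it into 3 tokens with losses of 2.0, 1.5 and 2.5 nats.
text_bytes = len("hello world".encode("utf-8"))
bpb = bits_per_byte(2.0 + 1.5 + 2.5, text_bytes)
print(round(bpb, 3))
```

Dividing by bytes rather than by token count is what removes the dependence on vocabulary size. A tokenizer with a larger vocabulary produces fewer tokens with higher loss each, but the number of bytes in the underlying text stays the same.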
Extracted an 11% performance gain from already-tuned GPT-2 code through 700 experiments. Found 20 stackable improvements in 2 days.
Markdown-based programming for research direction. Works with Claude Code, Codex, and other AI coding assistants.