630 Lines of Python. Infinite Research Possibilities.

AutoResearch is an AI agent framework that autonomously optimizes ML training code.
Built by Andrej Karpathy, it achieved an 11% performance improvement across 700 automated experiments in 2 days.

One GPU, One File, One Metric

What is AutoResearch

A minimalist AI agent framework for autonomous machine learning research. One Python file, one GPU, and a 5-minute timer: that's all you need to let AI agents optimize your training code overnight.

Core Features of AutoResearch

AutoResearch constrains the research environment (one editable file, one metric, one fixed time budget) so that AI agents can work reliably.

Minimalist Architecture

Just 3 files: train.py (630 lines, editable), prepare.py (immutable infrastructure), and program.md (agent instructions).

5-Minute Experiment Loop

A fixed 5-minute time budget per experiment keeps comparisons fair. Agents run ~12 experiments per hour, automatically keeping improvements and rolling back failures.
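
The arithmetic behind the ~12 experiments per hour, and the idea of training against a wall-clock deadline rather than a step count, can be sketched as follows. This is a minimal illustration, not the code in train.py: the names train_with_budget and step_fn and the injectable clock parameter are assumptions made for this example.

```python
import time

TIME_BUDGET_S = 5 * 60  # the fixed 5-minute budget described above


def experiments_per_hour(budget_s: float = TIME_BUDGET_S) -> float:
    """Upper bound on experiments per hour, ignoring setup and evaluation overhead."""
    return 3600 / budget_s


def train_with_budget(step_fn, budget_s, clock=time.monotonic):
    """Call step_fn until the wall-clock budget is spent; return the step count.

    step_fn is a hypothetical callable that performs one training step.
    clock is injectable so the loop can be exercised without real waiting.
    """
    deadline = clock() + budget_s
    steps = 0
    while clock() < deadline:
        step_fn()
        steps += 1
    return steps


print(experiments_per_hour())  # 12.0
```

Because every run gets the same wall-clock budget, a change that makes each step faster is rewarded with more steps, and a change that makes each step slower has to earn its cost within the same five minutes.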

Git as Memory

Git history stores only successful improvements, creating a ratchet of monotonic progress. The complete experiment log is kept in results.tsv.
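
The ratchet can be sketched without touching a real repository. The example below is an assumption-laden simulation, not AutoResearch's own code: the function name ratchet, the column names, and the keep/discard labels are hypothetical, and it assumes a lower metric is better. In a real run, the "keep" branch would correspond to a git commit and the "discard" branch to restoring train.py to the last commit.

```python
import csv
import io


def ratchet(candidates, baseline):
    """Keep only candidates that beat the best score so far; log every attempt.

    candidates: iterable of (description, score) pairs, lower score is better.
    Returns (kept_descriptions, best_score, tsv_log_text).
    """
    best = baseline
    kept = []
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["description", "score", "status"])  # hypothetical columns
    for desc, score in candidates:
        if score < best:
            best = score          # the bar only ever moves in one direction
            kept.append(desc)     # stands in for a git commit
            status = "keep"
        else:
            status = "discard"    # stands in for rolling back the edit
        writer.writerow([desc, f"{score:.4f}", status])
    return kept, best, buf.getvalue()


kept, best, log = ratchet(
    [("change A", 0.99), ("change B", 1.01), ("change C", 0.98)],
    baseline=1.00,
)
print(kept)  # ['change A', 'change C']
print(log)
```

The kept list mirrors what the Git history would hold, while the log retains every attempt, including the discarded one.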

Single Metric Optimization

Optimizes bits-per-byte (BPB) on a fixed validation set; lower is better. The metric is simple, comparable across runs, and independent of vocabulary size.
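
A short sketch of how bits-per-byte can be computed, assuming the model's cross-entropy loss is summed over tokens and measured in nats (natural log), as is conventional in most training frameworks. The function name and arguments are illustrative, not taken from the project.

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed token-level loss (in nats) into bits per byte of raw text.

    Dividing by ln(2) converts nats to bits; dividing by the byte count of the
    underlying text, rather than the token count, removes the tokenizer from
    the denominator.
    """
    return total_nll_nats / (math.log(2) * total_bytes)


# Worked example: 1000 bytes of text tokenized into 250 tokens,
# with a mean loss of 4 bits (4 * ln 2 nats) per token.
total_nats = 250 * 4 * math.log(2)
print(round(bits_per_byte(total_nats, 1000), 6))  # 1.0
```

A tokenizer with a larger vocabulary would produce fewer tokens with a higher loss per token on the same text, but the byte count in the denominator stays the same, which is why the metric stays comparable when the vocabulary changes.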

Proven Results

Extracted an 11% performance gain from already-tuned GPT-2 code through 700 experiments, finding 20 stackable improvements in 2 days.

Agent-Friendly Design

Research direction is written in Markdown (program.md) rather than in code. Works with Claude Code, Codex, and other AI coding assistants.