# Karpathy's Autoresearch & program.md: AI That Runs Experiments While You Sleep
On March 7, 2026, Andrej Karpathy --- former Tesla AI director and OpenAI co-founder --- dropped a repo that lit up the AI world: autoresearch.
The idea is deceptively simple: give an AI agent a small but real LLM training setup and let it run experiments autonomously overnight. It modifies the code, trains for 5 minutes, checks if the result improved, keeps or discards the change, and repeats.
100 experiments while you sleep. Zero human intervention.
But here’s the part that matters for the future of programming: the human doesn’t write Python. The human writes a Markdown file.
## What Is program.md?
At the heart of autoresearch is a file called program.md. It’s a Markdown document that serves as the instruction manual for the AI agent.
Instead of manually tuning hyperparameters, adjusting learning rates, or modifying neural network architectures in Python, the researcher writes natural language instructions in program.md. The AI agent reads these instructions and autonomously modifies the training code (train.py) based on them.
As Karpathy put it: you're not touching any of the Python files like you normally would as a researcher. Instead, you are programming the program.md Markdown file that provides context to the AI agents.
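Concretely, a program.md might contain sections like the following. This is a hypothetical sketch to illustrate the idea; the actual file in the repo may be organized differently:

```markdown
# Goal
Reduce validation loss (val_bpb) measured after a fixed 5-minute training run.

# Constraints
- Do not modify the evaluation code or the 5-minute time budget.
- Keep train.py small enough to fit in your context window.

# Strategy
- Make one isolated change per experiment (a single hyperparameter or
  architectural tweak) so each result is attributable.
- Revisit the learning-rate schedule before trying exotic architecture changes.
```

The point is that this file, not the Python, is what the human iterates on.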
## How Autoresearch Works
The system is elegant in its simplicity:
- The human edits program.md, setting research goals, constraints, and strategy
- The AI agent (Claude, Codex, or another LLM) reads program.md and modifies train.py
- Training runs for exactly 5 minutes, measuring validation loss (val_bpb)
- If the loss improved, the change is kept as a git commit on a feature branch
- If not, git resets back to where it started
- Repeat indefinitely
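The loop above can be sketched in a few lines of Python. Everything here is a stand-in: run_training simulates the 5-minute train.py run, and the kept list stands in for git commits on the feature branch (a discarded change corresponds to a git reset):

```python
import random

def run_training(params):
    # Stand-in for a 5-minute train.py run; returns a validation loss.
    # Hypothetical: the real system measures val_bpb on held-out data.
    random.seed(params["change_id"])
    return 1.0 + random.uniform(-0.05, 0.05)

def research_loop(n_experiments):
    # Minimal keep-or-revert loop: propose a change, train,
    # keep the change only if validation loss improves.
    best_params = {"change_id": 0}
    best_loss = run_training(best_params)  # baseline run
    kept = []                              # stands in for git commits
    for i in range(1, n_experiments + 1):
        candidate = {"change_id": i}       # agent's proposed modification
        loss = run_training(candidate)
        if loss < best_loss:               # improved: "git commit"
            best_params, best_loss = candidate, loss
            kept.append(i)
        # else: discard the change, i.e. "git reset --hard"
    return best_loss, kept

final_loss, commits = research_loop(20)
```

Because failed experiments are reverted, the best validation loss can only go down over time, which is what makes it safe to leave the loop unattended overnight.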
The entire training codebase is ~630 lines of Python --- small enough to fit entirely within an LLM’s context window. This is by design. The agent needs to understand the whole system to make intelligent modifications.
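A rough back-of-the-envelope check shows why ~630 lines fits comfortably. The ~10 tokens per line figure and the 200K-token context window are assumptions, not numbers from the repo:

```python
lines_of_code = 630          # size of the training codebase per the post
tokens_per_line = 10         # rough assumption for Python source
approx_tokens = lines_of_code * tokens_per_line

context_window = 200_000     # typical frontier-model context window (assumption)
print(f"~{approx_tokens} tokens, "
      f"{100 * approx_tokens / context_window:.0f}% of a {context_window}-token window")
# → ~6300 tokens, 3% of a 200000-token window
```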
## The Results
Karpathy left autoresearch running for about two days on a depth-12 model. The AI agent autonomously discovered around 20 improvements that reduced the Time to GPT-2 benchmark from 2.02 hours to 1.80 hours --- an 11% improvement with zero human intervention.
Every dot in the visualization represents a complete LLM training run. The agent works in an autonomous loop, accumulating git commits as it finds better settings for the neural network architecture, optimizer, and hyperparameters.
## Why program.md Matters Beyond ML Research
Autoresearch is about ML training, but the pattern it introduces is universal: programming AI agents with Markdown files.
This isn’t an isolated idea. Look at what’s happening across the AI ecosystem:
| File | Purpose |
|---|---|
| program.md | Programs autonomous research agents (Karpathy) |
| AGENTS.md | Programs AI coding agents (60K+ repos, Linux Foundation) |
| CLAUDE.md | Programs Claude Code behavior |
| .cursorrules | Programs Cursor AI behavior |
| llms.txt | Programs how AI crawlers understand websites |
The pattern is identical every time: a human writes a Markdown file, and an AI agent uses it as instructions to operate autonomously.
Markdown has become the programming language for AI agents.
## From Vibe Coding to Agentic Engineering
Karpathy himself coined “vibe coding” in 2025 --- the idea of writing code by describing intent rather than syntax. But in early 2026, he said vibe coding is already passé.
The new term? Agentic engineering: you're not writing code directly 99% of the time. You're orchestrating agents that do, and acting as oversight.
Autoresearch is the purest expression of this idea. The researcher’s job shifts from “how many experiments did you run today?” to “how good were the experiment directions you set?” The Markdown file is how you set those directions.
## What This Means for Knowledge Workers
You don’t need to be training LLMs to learn from autoresearch. The pattern applies everywhere:
- Developers write AGENTS.md to direct AI coding assistants
- Researchers write program.md to direct autonomous experiments
- Content creators write prompts to direct AI writing assistants
- Analysts write instructions to direct AI data processing pipelines
In every case, the human’s job is becoming: write the best possible Markdown instructions. The AI handles execution.
## Building Your Markdown-First Workflow
If Markdown is becoming the universal interface for AI agents, having clean Markdown versions of your reference material becomes essential.
When you’re writing a program.md for autoresearch or an AGENTS.md for your codebase, you’re pulling from documentation, papers, blog posts, and examples you’ve seen on the web. Save lets you capture all of that as clean Markdown with one click --- ready to reference, excerpt, or feed into your agent instructions.
The workflow: find something useful on the web, Save it as Markdown, use it to write better agent instructions.
Save converts any webpage to clean Markdown --- the format AI agents understand best. Build your reference library for writing better AI instructions. Try Save free.