
Karpathy's Autoresearch & PROGRAM.md: AI That Runs Experiments While You Sleep

Save Team
Tags: markdown, ai, karpathy, autoresearch, program-md, machine-learning, agents

On March 7, 2026, Andrej Karpathy (former Tesla AI director and OpenAI co-founder) dropped a repo that lit up the AI world: autoresearch.

The idea is deceptively simple: give an AI agent a small but real LLM training setup and let it run experiments autonomously overnight. It modifies the code, trains for 5 minutes, checks if the result improved, keeps or discards the change, and repeats.

100 experiments while you sleep. Zero human intervention.
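The “100 experiments” figure is simple arithmetic on the fixed training budget. Assuming an eight-hour night and ignoring the agent’s own editing time between runs (both are assumptions for illustration, not numbers from the repo):

```latex
\[
\frac{8\ \text{h} \times 60\ \text{min/h}}{5\ \text{min/run}} = 96\ \text{runs} \approx 100
\]
```

Any per-run overhead pulls the real count somewhat below this, so read “100” as an order of magnitude rather than an exact count.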

But here’s the part that matters for the future of programming: the human doesn’t write Python. The human writes a Markdown file.

What Is program.md?

At the heart of autoresearch is a file called program.md. It’s a Markdown document that serves as the instruction manual for the AI agent.

Instead of manually tuning hyperparameters, adjusting learning rates, or modifying neural network architectures in Python, the researcher writes natural language instructions in program.md. The AI agent reads these instructions and autonomously modifies the training code (train.py) based on them.

As Karpathy put it: you’re not touching any of the Python files like you normally would as a researcher. Instead, you are programming the program.md Markdown files that provide context to the AI agents.
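Concretely, the agent’s working context combines the researcher’s instructions with the code it is allowed to change. The Python sketch below shows one hypothetical way to assemble that context; `build_agent_prompt`, `load_prompt`, and the section wording are made up for illustration and are not autoresearch’s actual prompt.

```python
from pathlib import Path


def build_agent_prompt(program_md: str, train_py: str) -> str:
    """Combine the researcher's instructions with the editable training code.

    Hypothetical layout for illustration only, not autoresearch's real prompt.
    """
    return "\n\n".join([
        "## Instructions from the researcher (program.md)",
        program_md.strip(),
        "## Training code you may modify (train.py)",
        train_py.strip(),
        "Propose one change to train.py that you expect to lower val_bpb.",
    ])


def load_prompt(repo: Path) -> str:
    # Read both files from a local checkout of the experiment repo.
    return build_agent_prompt(
        (repo / "program.md").read_text(encoding="utf-8"),
        (repo / "train.py").read_text(encoding="utf-8"),
    )
```

The point of the layout is the division of labor: the human only ever edits the first section, while the agent acts on the second.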

How Autoresearch Works

The system is elegant in its simplicity:

  1. The human edits program.md to set research goals, constraints, and strategy
  2. The AI agent (Claude, Codex, or another LLM) reads program.md and modifies train.py
  3. Training runs for exactly 5 minutes, then the model is scored on validation bits per byte (val_bpb), where lower is better
  4. If improved, the change is kept as a git commit on a feature branch
  5. If not, git resets the branch back to the last kept commit
  6. Repeat indefinitely
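The six steps above amount to a greedy keep-or-revert loop. Below is a minimal Python sketch of that control flow, not autoresearch’s actual code: `research_loop`, its callback parameters, and `max_experiments` are hypothetical names, and the loop is bounded so it can be tested (the real setup simply keeps going). The `git_commit` and `git_reset` helpers show one plausible way to back the callbacks with ordinary git commands.

```python
import subprocess
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ExperimentLog:
    kept: list[str] = field(default_factory=list)       # changes that were committed
    discarded: list[str] = field(default_factory=list)  # changes that were reverted


def research_loop(
    propose_change: Callable[[], str],    # agent edits train.py, returns a description
    train_and_eval: Callable[[], float],  # runs the fixed-budget training, returns val_bpb
    commit: Callable[[str], None],        # keep the current edit
    reset: Callable[[], None],            # throw the current edit away
    baseline_bpb: float,
    max_experiments: int,
) -> tuple[float, ExperimentLog]:
    """Greedy search: an edit survives only if it strictly beats the best val_bpb so far."""
    best = baseline_bpb
    log = ExperimentLog()
    for _ in range(max_experiments):
        description = propose_change()
        val_bpb = train_and_eval()
        if val_bpb < best:  # lower bits per byte is better
            best = val_bpb
            commit(description)
            log.kept.append(description)
        else:
            reset()
            log.discarded.append(description)
    return best, log


def git_commit(message: str) -> None:
    # Commit all modified tracked files (here, train.py) on the current branch.
    subprocess.run(["git", "commit", "-am", message], check=True)


def git_reset() -> None:
    # Discard uncommitted edits, returning to the last kept commit.
    subprocess.run(["git", "reset", "--hard", "HEAD"], check=True)
```

With a real agent, `commit=git_commit` and `reset=git_reset` would be passed in, and `train_and_eval` would launch the training script and read back its val_bpb (how that value is reported is left open here). Because every run is compared against the best result so far, the branch history only ever records improvements.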

The entire training codebase is roughly 630 lines of Python, small enough to fit entirely within an LLM’s context window. This is by design: the agent needs to understand the whole system to make intelligent modifications.
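To make the “fits in the context window” point concrete, here is a rough check in Python. It relies on the common rule of thumb of about four characters per token, which is only an approximation (real tokenizers vary, especially on code). The 40-characters-per-line stand-in file and the 128K-token window in the usage lines are assumptions, not measurements of train.py or of any particular model, and `estimate_tokens` and `fits_in_context` are hypothetical helpers.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token count: characters divided by an assumed chars-per-token ratio."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(
    sources: dict[str, str], context_tokens: int, reserve_tokens: int = 0
) -> bool:
    """True if all source files, plus room reserved for instructions and replies, fit."""
    total = sum(estimate_tokens(text) for text in sources.values())
    return total + reserve_tokens <= context_tokens


# Hypothetical stand-in for train.py: 630 lines of 40 characters (39 chars + newline).
fake_train_py = ("x" * 39 + "\n") * 630
print(estimate_tokens(fake_train_py))  # 6300
print(fits_in_context({"train.py": fake_train_py}, context_tokens=128_000, reserve_tokens=8_000))  # True
```

Even with generous error bars on the heuristic, a file of this size would use only a small fraction of such a window, leaving room for program.md and the agent’s own reasoning.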

The Results

Karpathy left autoresearch running for about two days on a depth-12 model. The AI agent autonomously discovered around 20 improvements that reduced the Time to GPT-2 benchmark from 2.02 hours to 1.80 hours, an 11% improvement with zero human intervention.
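The 11% figure is the relative reduction in Time to GPT-2:

```latex
\[
\frac{2.02\ \text{h} - 1.80\ \text{h}}{2.02\ \text{h}} = \frac{0.22}{2.02} \approx 0.109 \approx 11\%
\]
```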

Every dot in the visualization represents a complete LLM training run. The agent works in an autonomous loop, accumulating git commits as it finds better settings for the neural network architecture, optimizer, and hyperparameters.

Why program.md Matters Beyond ML Research

Autoresearch is about ML training, but the pattern it introduces is universal: programming AI agents with Markdown files.

This isn’t an isolated idea. Look at what’s happening across the AI ecosystem:

| File | Purpose |
| --- | --- |
| program.md | Programs autonomous research agents (Karpathy) |
| AGENTS.md | Programs AI coding agents (60K+ repos, Linux Foundation) |
| CLAUDE.md | Programs Claude Code behavior |
| .cursorrules | Programs Cursor AI behavior |
| llms.txt | Programs how AI crawlers understand websites |

The pattern is identical every time: a human writes a Markdown file, and an AI agent uses it as instructions to operate autonomously.

Markdown has become the programming language for AI agents.

From Vibe Coding to Agentic Engineering

Karpathy himself coined “vibe coding” in 2025 for the idea of writing code by describing intent rather than syntax. But in early 2026, he said vibe coding was already passé.

The new term? Agentic engineering: 99% of the time you’re not writing code directly. You’re orchestrating the agents who write it and providing oversight.

Autoresearch is the purest expression of this idea. The researcher’s job shifts from “how many experiments did you run today?” to “how good were the experiment directions you set?” The Markdown file is how you set those directions.

What This Means for Knowledge Workers

You don’t need to be training LLMs to learn from autoresearch. The pattern applies everywhere:

  • Developers write AGENTS.md to direct AI coding assistants
  • Researchers write program.md to direct autonomous experiments
  • Content creators write prompts to direct AI writing assistants
  • Analysts write instructions to direct AI data processing pipelines

In every case, the human’s job is becoming: write the best possible Markdown instructions. The AI handles execution.

Building Your Markdown-First Workflow

If Markdown is becoming the universal interface for AI agents, having clean Markdown versions of your reference material becomes essential.

When you’re writing a program.md for autoresearch or an AGENTS.md for your codebase, you’re pulling from documentation, papers, blog posts, and examples you’ve seen on the web. Save lets you capture all of that as clean Markdown with one click, ready to reference, excerpt, or feed into your agent instructions.

The workflow: find something useful on the web, Save it as Markdown, use it to write better agent instructions.


Save converts any webpage to clean Markdown, the format AI agents understand best. Build your reference library for writing better AI instructions. Try Save free.