AI Systems Beat Human Record in Autonomous nanoGPT Optimization Benchmark

Prime Intellect reported on May 15, 2026, that AI agents running Claude Opus 4.7 and Codex (described as based on GPT 5.5) surpassed the human record on a nanoGPT optimization benchmark without human guidance, a potential milestone for autonomous AI research. The lab said the test consumed about 14,000 Nvidia H200 compute-hours, roughly 10,000 iterations and 23.9 billion tokens of reasoning traces. Opus 4.7 reached the target in 2,930 steps and Codex in 2,950 steps, both ahead of the prior human record of 2,990 steps. The nanoGPT benchmark, initiated by Keller Jordan, measures how efficiently participants can train a fixed 124-million-parameter model on the same architecture and data, with changes limited to optimizers and hyperparameters. Prime Intellect said the results are open source and reproducible, though the report noted unresolved questions about scientific novelty and autonomous decision-making behavior.

Editor: Jack Lee