4.1× faster inference than equivalent Transformer models at 256k context, while matching or exceeding pass@1 on HumanEval+, MultiPL-E, and BigCodeBench-hard.
o4-proto shows closed-loop self-improvement: it writes code → runs tests → reads failures → rewrites an improved version → repeats autonomously for up to 50 cycles.
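
To make the loop concrete, here is a minimal sketch of a write, test, read-failures, rewrite cycle with the 50-cycle cap mentioned above. It is an illustration under stated assumptions, not o4-proto's actual mechanism: the names `self_improve`, `generate`, and `run_tests` are hypothetical, and the toy generator and test runner are stand-ins for a real model call and a real test suite.

```python
from typing import Callable, List, Tuple

MAX_CYCLES = 50  # cap taken from the text; everything else here is assumed


def self_improve(
    generate: Callable[[str, List[str]], str],
    run_tests: Callable[[str], List[str]],
    task: str,
    max_cycles: int = MAX_CYCLES,
) -> Tuple[str, int, bool]:
    """Run a write -> test -> read failures -> rewrite loop.

    `generate` (hypothetical) maps a task plus the previous failure
    messages to a new code candidate. `run_tests` (hypothetical) returns
    a list of failure messages, empty when every test passes.
    Returns (last candidate, cycles used, whether tests passed).
    """
    failures: List[str] = []
    code = ""
    for cycle in range(1, max_cycles + 1):
        code = generate(task, failures)   # write (or rewrite) the code
        failures = run_tests(code)        # run tests, collect failures
        if not failures:                  # stop as soon as all tests pass
            return code, cycle, True
    return code, max_cycles, False        # budget exhausted without a pass


# Toy stand-ins so the sketch runs end to end.
def toy_generate(task: str, failures: List[str]) -> str:
    # First attempt is deliberately buggy; any failure triggers the fix.
    if not failures:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def toy_run_tests(code: str) -> List[str]:
    namespace: dict = {}
    exec(code, namespace)  # acceptable for a toy; sandbox untrusted code in practice
    failures: List[str] = []
    got = namespace["add"](2, 3)
    if got != 5:
        failures.append(f"add(2, 3) returned {got}, expected 5")
    return failures


if __name__ == "__main__":
    code, cycles, passed = self_improve(toy_generate, toy_run_tests, "implement add")
    print(f"passed={passed} after {cycles} cycle(s)")
```

Passing the failure messages back into `generate` is what closes the loop: each rewrite is conditioned on what the previous attempt got wrong, and the cycle cap bounds the cost when the tests never pass.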