Claude 4.1 Opus became the first publicly known frontier model to complete a full 10-hour autonomous software engineering task (building a functional web app from a natural-language spec with zero human intervention) in Anthropic’s internal red-team evaluation suite.