Anthropic Claude 4.1 Opus: First Model to Pass Internal 10-Hour Autonomous Software Engineering Test

73    2026-02-16

Claude 4.1 Opus became the first publicly known frontier model to complete a full 10-hour autonomous software engineering task in Anthropic’s internal red-team evaluation suite. The task was to build a functional web app from a natural-language spec with zero human intervention.