Anthropic warns Claude Opus 4.6 nears ASL-4 safety threshold amid escape risks

Anthropic reported that its Claude Opus 4.6 model has approached the "AI Safety Level 4" (ASL-4) threshold, citing potential risks of autonomous escape and system sabotage. In a report released Feb. 19, 2026, the company identified eight potential paths to catastrophe, including backdoored code and data poisoning, and noted that its current evaluation tools have become saturated. The model achieved a 427x speedup in kernel optimization tests, far exceeding human standards. Although it found no consistent malicious intent, Anthropic said the risks are "non-zero," placing the sector in a "gray zone." The alert follows the resignation of safety research head Mrinank Sharma and departures from xAI, underscoring growing concerns over recursive self-improvement and regulatory gaps.

EditorLim