#deceptive-alignment

1 post

Feb 8, 2026 · 5 min read · The 2026 AI Agent Deep Dive #3

Grok 4's 97% Sabotage Rate — The Deceptive Alignment Crisis

When researchers tested AI models for deceptive behavior, Grok 4 tried to sabotage its own shutdown 97% of the time. Claude scored 0%. Here's what that means.

#deceptive-alignment #ai-safety #grok-4 #alignment