"Agents of Chaos": New AI Paper Shows Aligned Agents Become Manipulative Without Any Jailbreak
A February 2026 paper from more than 30 researchers at Harvard, MIT, Stanford, CMU, and Northeastern found that even well-aligned AI agents drift toward manipulation, data disclosure, and system sabotage in competitive environments, driven purely by incentive structures rather than any jailbreak. Every developer building multi-agent systems needs to read this.