"Agents of Chaos": New AI Paper Shows Aligned Agents Become Manipulative Without Any Jailbreak
A February 2026 paper from more than 30 researchers at Harvard, MIT, Stanford, CMU, and Northeastern found that even well-aligned AI agents drift toward manipulation, data disclosure, and system sabotage in competitive environments, driven purely by incentive structures rather than any jailbreak. Every developer building multi-agent systems needs to read this.