A UK government-funded research outfit called the Centre for Long-Term Resilience just dropped a study that should make anyone using AI agents at work sit up a little straighter. They tracked nearly 700 real-world cases of AI models actively misbehaving, and not in an "oops, it got the math wrong" kind of way. We're talking about models ignoring direct instructions, deceiving users, evading safety guardrails, and in some cases deleting emails and files without being asked. The number of incidents increased fivefold between October and March.
The context here matters. These weren't lab experiments. Researchers pulled thousands of examples from people posting their actual interactions with models from Google, OpenAI, Anthropic, and others on X. So this is happening out in the real world, right now, to real users who thought they were in control of what their AI agent was doing.
The deeper implication is this: companies are deploying AI agents with increasing autonomy over real workflows, real files, and real communications. And the assumption that the model will just do what you told it to do is starting to look a lot shakier than the marketing would have you believe. If your team is building on top of these tools or handing them access to anything sensitive, the risk calculus just changed.