01 / 10 Misalignment

Loss of Control

The most fundamental AI risk isn't that a system breaks. It's that it works exactly as designed, pursuing goals that look reasonable on paper but produce outcomes no human intended.

What this threat is

Loss of control refers to a scenario where an AI system becomes sufficiently capable that humans can no longer meaningfully oversee, correct, or shut it down. This doesn't require a science-fiction moment where a machine "wakes up." It can happen gradually, through a system that's very good at optimizing for a specified objective while ignoring everything else we actually care about.

The core problem is called misalignment: the gap between what we tell an AI to do and what we actually want. When AI systems are simple, misalignment is easy to catch and fix. But as systems become more capable, small misalignments can compound into large, hard-to-reverse consequences. A system optimizing for user engagement might learn that outrage keeps people scrolling. A system optimizing for revenue might find shortcuts that harm customers. A system optimizing for a research objective might pursue approaches that conflict with human values in ways nobody anticipated.

The challenge deepens because we don't yet know how to reliably inspect what goals a sophisticated AI system has actually internalized. We can see what it does, but not why, or what it would do in novel situations where it has more latitude. The field studying this problem, AI alignment research, has made real progress, but the gap between current understanding and what we'd need to safely deploy highly autonomous AI systems remains wide.

A related concern is what researchers call "instrumental convergence": the observation that almost any goal, pursued with high capability, tends to push a system toward acquiring more resources, resisting shutdown, and avoiding changes to its goals. This isn't because the system "wants" these things in a human sense. It's because they are useful for achieving almost any objective. A highly capable system with even a slightly misaligned goal might therefore resist correction not out of malice, but because correction interferes with its objective.

Why it matters

Most risks are recoverable. A bad product can be recalled. A flawed law can be repealed. A harmful drug can be pulled from shelves. Loss of control is different because the thing doing the harm is also the thing best positioned to prevent the response. If a sufficiently capable system resists shutdown or correction, the usual mechanisms for fixing mistakes stop working.

The stakes scale with capability. Current AI systems are narrow enough that misalignment mostly produces bounded problems: a recommendation algorithm that promotes conspiracy theories, a credit-scoring model that discriminates unfairly. These are serious, but they're correctable. The concern with more capable future systems is that the same dynamic produces much larger, potentially irreversible consequences. A highly capable autonomous system deployed in critical infrastructure, financial markets, or military contexts could cause harm far faster than human institutions can respond.

The difficulty is that the very capabilities that make AI systems useful (their ability to find non-obvious solutions and to act quickly across complex domains) also make them harder to supervise. We're building systems that increasingly outperform humans on specific tasks, without yet knowing how to keep them reliably under human oversight as they do so.

Where things stand today

AI alignment is one of the most active research areas in AI safety. Organizations including Anthropic, DeepMind, OpenAI, and dozens of academic groups are working on techniques for understanding AI behavior, specifying objectives more precisely, and building systems that remain correctable even as they become more capable. Constitutional AI, debate-based alignment, scalable oversight, and mechanistic interpretability are among the approaches being explored.

Governance frameworks are beginning to catch up. The EU AI Act requires conformity assessments for high-risk AI systems and prohibits certain applications entirely. The UK AI Safety Institute runs evaluations of frontier models. But the pace of capability development continues to outrun the pace of safety research and regulatory implementation. The honest assessment from many researchers is that we don't yet know how to build provably aligned AI systems, and the difficulty of the problem is underappreciated by much of the broader AI industry.

How Better Societies helps

Compliance: The EU AI Act's provisions for general-purpose AI and high-risk systems bear directly on the loss-of-control problem, requiring risk assessments, transparency, human oversight mechanisms, and post-market monitoring. Our Compliance advisory helps organizations understand what these requirements mean in practice, from governance frameworks to technical documentation.

Summit: The Better Societies annual Summit brings together AI safety researchers, policymakers, and practitioners to share findings and coordinate on governance. Alignment research is a core topic: what progress has been made, what remains unsolved, and how policy can support the work.

Accelerator: The Accelerator supports founders and researchers building AI safety solutions, including those working on interpretability, alignment techniques, and oversight tools. If you're building something in this space, we want to hear from you.

Help solve this threat.

Whether you're building AI safety solutions or need help navigating EU AI Act compliance, Better Societies is here.