AI Safety

AI safety is the interdisciplinary field aimed at ensuring that artificial intelligence systems operate reliably, predictably, and without causing unintended harm, spanning technical research, policy, and governance.

The field is concerned with identifying, measuring, and mitigating risks arising from AI systems. It covers both near-term issues (such as model robustness, bias, and misuse for fraud or disinformation) and longer-term concerns about highly capable AI systems acting in ways that harm humanity at scale. It draws on machine learning, cognitive science, philosophy, and public policy. AI safety is broader than AI alignment but closely related to it: alignment is usually treated as one of its technical subfields.

Technical AI safety research includes interpretability (understanding what computations models perform internally and why); robustness (ensuring models behave correctly under distribution shift and adversarial inputs); alignment (ensuring systems pursue their intended goals); and capability evaluations (structured tests, often including red-teaming, that surface dangerous capabilities such as cyberattack assistance or bioweapon synthesis guidance before deployment). On the governance side, safety work includes model cards, third-party audits, deployment policies, and international coordination frameworks.

AI safety gained institutional momentum after the release of large language models and products built on them, such as GPT-3 (2020) and ChatGPT (2022), which demonstrated that capable AI could be misused for fraud, disinformation, and harmful content generation at scale. Between 2023 and 2025, governments in the US, EU, and UK began introducing safety evaluation requirements or testing arrangements for frontier AI systems, citing both near-term misuse risks and longer-term catastrophic risk scenarios.

By 2026, dedicated AI safety teams operate at all major AI laboratories. The UK AI Security Institute (AISI, formerly the AI Safety Institute) and the US Center for AI Standards and Innovation (CAISI, formerly the US AI Safety Institute) conduct third-party evaluations of frontier models before and after deployment. Open research consortia publish shared evaluation suites, and multiple jurisdictions require formal safety assessments before high-capability AI is deployed in critical infrastructure sectors, including healthcare, energy, and finance.

Example

Before releasing a major model update, a laboratory runs it through a structured evaluation suite testing for dangerous capabilities—such as the ability to provide meaningful uplift for synthesizing chemical weapons—and publishes a safety report summarizing findings and mitigations taken.
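
The workflow in this example could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not any laboratory's actual process: the names `EvalResult`, `run_suite`, and `release_decision`, the category labels, the zero-tolerance thresholds, and the toy model and grader are all hypothetical stand-ins for a real model, a vetted prompt set, and expert or automated grading.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvalResult:
    """Outcome of one capability category in the suite (hypothetical schema)."""
    category: str
    attempts: int
    unsafe: int

    @property
    def unsafe_rate(self) -> float:
        # Fraction of prompts that elicited an unsafe response.
        return self.unsafe / self.attempts if self.attempts else 0.0


def run_suite(
    model: Callable[[str], str],
    prompts_by_category: Dict[str, List[str]],
    is_unsafe: Callable[[str, str], bool],
) -> List[EvalResult]:
    """Query the model with every prompt and count responses graded unsafe."""
    results = []
    for category, prompts in prompts_by_category.items():
        unsafe = sum(1 for p in prompts if is_unsafe(category, model(p)))
        results.append(EvalResult(category, len(prompts), unsafe))
    return results


def release_decision(
    results: List[EvalResult], thresholds: Dict[str, float]
) -> dict:
    """Fail any category whose unsafe rate exceeds its threshold (default 0.0)."""
    failures = [
        r.category
        for r in results
        if r.unsafe_rate > thresholds.get(r.category, 0.0)
    ]
    return {"approved": not failures, "failed_categories": failures}


# Toy stand-ins (assumptions): the "model" refuses prompts containing
# the word "restricted", and the "grader" treats any non-refusal as unsafe.
def toy_model(prompt: str) -> str:
    return "REFUSED" if "restricted" in prompt else "ANSWERED"


def toy_grader(category: str, response: str) -> bool:
    return response != "REFUSED"


suite = {
    "chemical_uplift": ["restricted request 1", "restricted request 2"],
    "cyber_uplift": ["restricted request 3", "borderline request 4"],
}
results = run_suite(toy_model, suite, toy_grader)
report = release_decision(
    results, {"chemical_uplift": 0.0, "cyber_uplift": 0.0}
)
print(report)
```

In this toy run the `cyber_uplift` category fails because one of its two prompts is answered rather than refused, so the release is not approved. A real safety report would also record the mitigations applied before re-testing.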

