Frontier Model
A frontier model is an AI system operating at the current limits of machine learning capability, typically trained with the largest known compute budgets. The term is used in AI policy to identify systems whose capabilities warrant the closest safety scrutiny.
Frontier models are the most capable AI systems available at a given time, defined not by a fixed benchmark score but by their position at the leading edge of what is technically achievable. The term is used heavily in AI safety and governance contexts, for example by the UK AI Safety Institute and the US AI Safety Institute at NIST, to distinguish systems whose potential capabilities and risks require dedicated evaluation before public deployment. The EU AI Act covers a comparable category under the label 'general-purpose AI models with systemic risk'. As of 2026, frontier model developers include Anthropic, OpenAI, Google DeepMind, Meta, xAI, and a small number of well-resourced Chinese laboratories including Zhipu AI and DeepSeek.
Frontier models are characterized by exceptional training infrastructure: typically tens of thousands of specialized accelerators (NVIDIA H100-class or newer GPUs, or Google TPUs) running for weeks to months, at per-run costs commonly estimated in the tens to hundreds of millions of dollars. They incorporate the latest algorithmic advances, such as improved attention mechanisms, large-scale synthetic data pipelines, and reinforcement learning from human and AI feedback (RLHF and RLAIF). The 'frontier' designation shifts over time as new techniques push the boundary; a system that represented the frontier in 2023 may be considered mid-tier by 2025.
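The cost range above follows from simple arithmetic on cluster size, run length, and hourly price, and the same inputs give a rough estimate of total training compute. The following is a back-of-envelope sketch, not a description of any real training run. Every input is an illustrative assumption: 20,000 accelerators for 90 days, a peak dense BF16 throughput of roughly 1e15 FLOP/s per accelerator (approximately H100-class), 40% achieved utilization, and a flat rental price of $2 per GPU-hour. The function names are hypothetical.

```python
"""Back-of-envelope training compute and cost for a large accelerator cluster.

All inputs are illustrative assumptions, not figures for any specific model.
"""

SECONDS_PER_DAY = 86_400
HOURS_PER_DAY = 24


def training_flop(num_gpus: int, peak_flop_per_s: float, utilization: float, days: float) -> float:
    """Total FLOP = accelerators x peak throughput x achieved utilization x wall-clock seconds."""
    return num_gpus * peak_flop_per_s * utilization * days * SECONDS_PER_DAY


def rental_cost_usd(num_gpus: int, days: float, usd_per_gpu_hour: float) -> float:
    """Cost if every accelerator-hour is billed at one flat hourly rate (assumption)."""
    return num_gpus * days * HOURS_PER_DAY * usd_per_gpu_hour


if __name__ == "__main__":
    # Hypothetical run: 20,000 accelerators for 90 days.
    gpus, days = 20_000, 90
    # Assumed ~1e15 FLOP/s peak per accelerator and 40% utilization.
    flop = training_flop(gpus, peak_flop_per_s=1e15, utilization=0.4, days=days)
    # Assumed $2 per GPU-hour; real ownership or contract costs vary widely.
    cost = rental_cost_usd(gpus, days, usd_per_gpu_hour=2.0)
    print(f"{flop:.2e} FLOP, ${cost / 1e6:.1f}M")
```

With these inputs the script prints `6.22e+25 FLOP, $86.4M`, which falls inside the cost range described above and close to the 10²⁶ FLOP figure often cited in policy documents. Real budgets also include failed or exploratory runs, data, and staff, so a single-run rental estimate like this is a lower bound on total spend.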
The policy relevance of frontier models stems from the possibility that they could develop emergent capabilities with significant societal impact or enable misuse in high-consequence domains such as biological design, cyberattack automation, or large-scale disinformation. Voluntary commitments made at the November 2023 AI Safety Summit at Bletchley Park, and subsequent legislative frameworks, have begun to expect or require frontier model developers to conduct structured safety evaluations before deployment and to share findings with government bodies. Anthropic's Responsible Scaling Policy, OpenAI's Preparedness Framework, and Google DeepMind's Frontier Safety Framework tie deployment decisions to specific capability thresholds: if evaluations indicate that a model has crossed a threshold, stronger safeguards must be in place before it is deployed.
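Threshold-based policies of this kind share a common decision structure: evaluation results are compared against predefined capability thresholds, and crossing one changes which safeguards must be in place before deployment. The sketch below illustrates only that structure. The domain labels, the 0 to 1 score scale, the threshold values, and the function names are all hypothetical and do not encode Anthropic's or any other developer's actual policy.

```python
"""Hypothetical sketch of capability-threshold gating for deployment decisions.

Domains, scores, and thresholds are invented for illustration only.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    domain: str   # hypothetical domain label, e.g. "bio_uplift"
    score: float  # hypothetical normalized score in [0, 1]


# Hypothetical thresholds: meeting one suggests the model may have reached a
# capability level that calls for stronger safeguards before deployment.
THRESHOLDS = {"bio_uplift": 0.5, "cyber_autonomy": 0.6, "ai_rnd_autonomy": 0.7}


def triggered_domains(results: list[EvalResult]) -> list[str]:
    """Domains whose score meets or exceeds its threshold (unknown domains never trigger)."""
    return [r.domain for r in results if r.score >= THRESHOLDS.get(r.domain, float("inf"))]


def deployment_decision(results: list[EvalResult], enhanced_safeguards_in_place: bool) -> str:
    """Map evaluation results to a coarse deployment decision."""
    hits = triggered_domains(results)
    if not hits:
        return "deploy under baseline safeguards"
    if enhanced_safeguards_in_place:
        return "deploy under enhanced safeguards: " + ", ".join(hits)
    return "hold deployment pending safeguards: " + ", ".join(hits)
```

For example, scores of 0.3 on `bio_uplift` and 0.65 on `cyber_autonomy` trigger only the cyber threshold, so without enhanced safeguards the function returns a hold. Actual policies rely on multiple evaluations per domain and on qualitative judgment rather than a single numeric score.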
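By 2026, frontier model evaluation has become a distinct practice area. The UK and US government AI institutes (renamed in 2025 as the AI Security Institute and the Center for AI Standards and Innovation, respectively) conduct pre-deployment testing of models from leading developers, using structured 'evals' to measure whether models approach dangerous capability thresholds in areas such as uplift for weapons development or autonomous cyberattack capability. The compute threshold most often cited in policy documents, approximately 10²⁶ floating-point operations (FLOP) for a training run, serves as a regulatory proxy for identifying frontier-class systems; the EU AI Act uses a lower threshold of 10²⁵ FLOP to presume systemic risk. Capability-based definitions are increasingly preferred over compute-only proxies, in part because algorithmic improvements let smaller training runs reach comparable capabilities.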
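Training compute for a dense transformer is commonly approximated as C ≈ 6·N·D, where N is the parameter count and D the number of training tokens (roughly 2 FLOP per parameter per token for the forward pass and 4 for the backward pass). The sketch below applies that rule of thumb and compares the result with the 10²⁶ FLOP figure and with the EU AI Act's 10²⁵ FLOP presumption threshold. The model size and token count in the example are hypothetical. For mixture-of-experts models N would be the active parameters per token, and the approximation ignores attention and other overheads.

```python
# Rule-of-thumb training-compute estimate (C ~ 6 * N * D) compared with policy thresholds.

# Total training FLOP thresholds: 1e26 is the figure most often cited in policy
# documents; 1e25 is the EU AI Act's presumption threshold for systemic risk.
POLICY_THRESHOLDS = {
    "10^26 FLOP proxy": 1e26,
    "EU AI Act 10^25 FLOP": 1e25,
}


def dense_training_flop(params: float, tokens: float) -> float:
    """Approximate training FLOP for a dense transformer: ~6 FLOP per parameter per token."""
    return 6.0 * params * tokens


def thresholds_crossed(flop: float) -> list[str]:
    """Names of thresholds that the estimated compute meets or exceeds."""
    return [name for name, limit in POLICY_THRESHOLDS.items() if flop >= limit]


if __name__ == "__main__":
    # Hypothetical model: 400B parameters trained on 15T tokens.
    estimate = dense_training_flop(params=400e9, tokens=15e12)
    print(f"{estimate:.1e} FLOP; crosses: {thresholds_crossed(estimate)}")
```

The hypothetical 400-billion-parameter model trained on 15 trillion tokens comes out at about 3.6e25 FLOP, above the EU presumption threshold but below 10²⁶, which illustrates how the choice of compute cutoff changes which systems a rule captures.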