Mapping the Unknown: Open Problems in Frontier AI Risk Management
Frontier AI systems — general-purpose models capable of performing a vast range of tasks — represent a qualitative leap beyond the narrow AI applications that existing risk management standards were built to govern. A research paper from the Oxford Martin AI Governance Initiative, co-authored by over 30 researchers across academia, industry, and policy, sets out to do something overdue: systematically identify and characterise the open problems that stand between where frontier AI risk management is today and where it needs to be to support more coordinated and effective progress.
Vijil Chief Scientist Tim G.J. Rudner was a contributing author to the paper, focusing on risk mitigation.
The paper provides a structured inventory of unresolved challenges across the full risk management lifecycle, designed to help researchers, developers, regulators, and standards bodies understand what questions most urgently need answering. Frontier AI complicates risk management in two distinct ways: it amplifies existing risks, and it introduces genuinely novel ones that established frameworks were never designed to handle.
What the Paper Covers
Following the architecture of ISO 31000:2018 and ISO/IEC 23894:2023, the paper works through five core stages of risk management, identifying open problems at each step.
Risk Planning explores how organisations establish scope, set objectives, and define criteria for what counts as acceptable risk. For frontier AI, even basic scoping is contested — the general-purpose nature of these systems makes it difficult to enumerate intended uses, let alone foreseeable misuses. Classification regimes that place systems into regulatory "buckets" are vulnerable to gaming. And defining whose values a system should reflect, and how, remains philosophically and practically unresolved.
Risk Identification examines how risk sources, potential events, and consequences are discovered. Current practice in frontier AI tends to focus heavily on model capabilities and propensities, while underweighting deployment context, affordances, and human-AI interaction dynamics as risk sources. Existing incident databases are useful but unrepresentative, and many of the most consequential harms only emerge after deployment — sometimes far downstream, once a model has been integrated into products and deployed at scale.
Risk Analysis covers how information is gathered and synthesised to understand risk. The paper distinguishes internal methods (capability evaluations, red-teaming) from external ones (third-party audits, post-deployment monitoring). A recurring theme is the gap between what evaluations measure and what actually matters: capability scores are useful proxies, but translating them into reliable estimates of real-world harm remains technically unsolved. Post-deployment monitoring is fragmented, voluntary, and poorly standardised across the ecosystem.
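To make that gap concrete, consider what it would take to turn an evaluation score into a harm estimate. In the illustrative sketch below, only the first parameter is something an evaluation actually measures; every other number is an unmeasured assumption, which is exactly the problem the paper identifies.

```python
# Illustrative only: a naive expected-harm calculation. The capability score is
# the one quantity an evaluation provides; the remaining parameters are
# assumptions that no current measurement practice supplies.
capability_score = 0.12     # pass rate on a misuse-relevant evaluation (hypothetical)
attempts_per_year = 1_000   # assumed rate of real-world misuse attempts
harm_per_success = 50_000   # assumed severity weight; units are contestable

expected_harm = capability_score * attempts_per_year * harm_per_success
print(f"Expected annual harm (toy units): {expected_harm:,.0f}")
```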
Risk Evaluation addresses how organisations decide whether risk is acceptable, and how deployment decisions are justified. Frontier AI companies currently rely on capability thresholds as their primary decision tool, but these are applied inconsistently and rarely incorporate the safety margins that regulators increasingly require. Aggregate risk acceptance — combining multiple heterogeneous risks into a coherent overall judgment — remains largely underdeveloped.
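As a rough illustration of what a threshold-plus-margin decision rule might look like, here is a minimal sketch. The threshold and margin values are hypothetical, not drawn from any published framework.

```python
# Hypothetical threshold-based deployment gate with a safety margin to absorb
# evaluation error and post-release capability elicitation.
CAPABILITY_THRESHOLD = 0.20  # eval score at which deployment is blocked
SAFETY_MARGIN = 0.05         # buffer below the threshold that triggers escalation

def deployment_decision(eval_score: float) -> str:
    if eval_score >= CAPABILITY_THRESHOLD:
        return "block: capability threshold crossed"
    if eval_score >= CAPABILITY_THRESHOLD - SAFETY_MARGIN:
        return "escalate: within the safety margin of the threshold"
    return "proceed: below threshold with margin to spare"

print(deployment_decision(0.17))  # falls inside the margin -> escalate
```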
A Closer Look: Risk Mitigation
The paper's treatment of Risk Mitigation is perhaps its most practically urgent section. Organised from data-level through to ecosystem-level interventions — a deliberate alignment with the safety engineering principle of preferring inherently safe design over layered safeguards — it surfaces a troubling pattern: the mitigations most commonly relied upon are also the least well-understood.
At the data level, filtering training corpora to prevent models from acquiring harmful capabilities is intuitively appealing but empirically fraught. Evidence suggests that models may robustly lack knowledge in some domains but not others, and the relationship between training data contents and emergent capabilities remains poorly characterised.
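To see why filtering is fraught, consider what a filter actually is. The sketch below uses a hypothetical keyword blocklist; nothing in it stops the same knowledge entering the corpus via paraphrase, translation, or adjacent technical material, which is the empirical difficulty the paper describes.

```python
import re

# Minimal data-level filter: drop documents matching a (hypothetical) blocklist.
BLOCKLIST = re.compile(r"\b(toxin synthesis|exploit payload)\b", re.IGNORECASE)

def keep_document(doc: str) -> bool:
    """Keep a training document only if it matches no blocked pattern."""
    return not BLOCKLIST.search(doc)

corpus = [
    "How enzymes catalyse reactions.",
    "A step-by-step toxin synthesis protocol.",
]
filtered = [doc for doc in corpus if keep_document(doc)]
print(filtered)  # only the benign document survives; a paraphrase would not be caught
```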
At the model level, fine-tuning and alignment techniques are vulnerable to reward hacking, sycophancy, and adversarial circumvention. Machine unlearning — algorithmically removing harmful capabilities — is an active area of research, but current techniques can be reversed by sufficiently motivated users. Durability is the central open problem: a mitigation that can be bypassed offers only the appearance of safety.
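For a sense of what "algorithmically removing capabilities" can mean in practice, here is a minimal sketch of one naive unlearning approach, gradient ascent on a forget set, using a toy PyTorch model. The setup is entirely hypothetical, and the paper's caveat applies: updates like this can often be undone by further fine-tuning.

```python
import torch
import torch.nn as nn

# Toy stand-in for a model, plus a "forget set" of examples to unlearn.
model = nn.Linear(8, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

forget_inputs = torch.randn(16, 8)
forget_targets = torch.randint(0, 2, (16,))

# One gradient-ascent step: maximise loss on the forget set, nudging the model
# away from the behaviour those examples encode.
optimizer.zero_grad()
loss = -loss_fn(model(forget_inputs), forget_targets)
loss.backward()
optimizer.step()
```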
At the system level, monitoring, inference-time control mechanisms, and guardrails all face the same fundamental challenge: as models and users co-evolve, static safety systems degrade. Guardrails must simultaneously resist adversarial pressure, generalise across distribution shifts, and avoid over-enforcement that restricts legitimate use. No current approach reliably achieves all three.
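A minimal sketch makes the trilemma visible. The guard functions below are hypothetical keyword stand-ins for real classifiers; note how the crude output rule already over-enforces, withholding any "step-by-step" answer regardless of topic.

```python
# Sketch of an inference-time guardrail: screen the prompt before generation
# and the response after. Both checks are hypothetical stand-ins.
DENY_TERMS = ("build a weapon", "bypass authentication")

def input_guard(prompt: str) -> bool:
    return not any(term in prompt.lower() for term in DENY_TERMS)

def output_guard(response: str) -> bool:
    # Deliberately crude: blocks any step-by-step answer, harmful or not,
    # illustrating the over-enforcement failure mode.
    return "step-by-step" not in response.lower()

def guarded_generate(model, prompt: str) -> str:
    if not input_guard(prompt):
        return "[refused by input guard]"
    response = model(prompt)
    return response if output_guard(response) else "[withheld by output guard]"

# A trivial stand-in "model" for demonstration.
print(guarded_generate(lambda p: f"Echo: {p}", "summarise this paper"))
```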
At the ecosystem level, documentation practices (model cards, system cards) remain inconsistently applied and poorly adapted to domain-specific requirements. Incident reporting frameworks exist, but the gap between what gets reported and what triggers corrective action remains wide.
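What more consistent documentation could look like is not mysterious. The sketch below shows a structured model-card record with illustrative fields, loosely modelled on common model-card practice rather than any fixed standard.

```python
from dataclasses import dataclass, field

# Illustrative structured model card; field names are hypothetical.
@dataclass
class ModelCard:
    name: str
    intended_uses: list[str]
    known_limitations: list[str]
    eval_summaries: dict[str, float] = field(default_factory=dict)

card = ModelCard(
    name="example-model-v1",
    intended_uses=["general-purpose assistance"],
    known_limitations=["unreliable on specialised medical questions"],
    eval_summaries={"misuse_eval_pass_rate": 0.12},
)
print(card.name, card.eval_summaries)
```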
Next Steps and the Living Repository
The paper is explicitly framed as a starting point, not an endpoint. The authors are clear that surfacing open problems is distinct from solving them — and that stakeholders with domain expertise, from safety teams at AI developers to standards bodies to regulators, are often better placed than any single research group to formulate concrete solutions.
To support that work, the paper is accompanied by a living online repository that will be updated as the field develops. This repository is intended as a coordination resource: a shared reference point that reduces duplication, flags where consensus is emerging, and tracks where the hardest problems remain unresolved.
Explore the repository at aigi.ox.ac.uk.

