
Using computational methods to improve our preparedness via more robust and adaptive strategies in AI governance. A project proposal for a think tank, consultancy, or software.

Overview

Over the years, I’ve come across or come up with a number of project ideas in AI safety and governance that I find promising. My top list has fewer than ten, but in total there are hundreds. Either way, there are too many for me to realize them all. Instead, I want to promote these ideas in the hope that others will pick them up. This is one of them.

Summary

Traditionally, understanding the broad strategic considerations in AI safety and governance has received a lot of attention – e.g., distinguishing risks from malicious use, coordination failures (e.g., arms races), accidents, and the AIs themselves; understanding convergent drives; surveying the landscape of x-risks and s-risks.

Over the course of the last 7–9 years, I’ve been delighted to see more interest in modeling AI scenarios, be it to communicate the risks (e.g., Intelligence Rising, Modeling Cooperation), to answer particular research questions (e.g., Vermeer et al., 2025, Modeling Cooperation), or to argue for particular policy solutions (e.g., CAIS’s MAIM). These scenarios have still mostly been on a strategic, and perhaps sometimes operational, level, and much more illustrative than comprehensive. Mengesha (2026) has made a strong case for improving preparedness, and Perry et al. (2019) argue for a more pragmatic approach to the policy-making process. Some think tanks have since started to address this, but the approach can be strengthened further.

Over the same time period, there has also been a proliferation of probabilistic forecasts, especially of AI timelines (e.g., Grace, 2017, Cotra, 2020, Barnett, 2020, AI 2027) and global catastrophic risks from AI (e.g., Saeri et al., 2026).

What I haven’t seen so far is (1) computational exploratory modeling, and (2) modeling on the operational to tactical level.

There are software frameworks, like the EMA Workbench, that facilitate processes like Robust Decision Making (RDM) that forgo probabilistic estimates in favor of preparedness for a wide variety of scenarios. In dynamical systems it’s often futile to try to predict the 1–3 most likely scenarios and prepare for them extensively, so it’s more sensible to generate thousands of scenarios and to design policies that – through a combination of robustness and adaptability – can perform well almost regardless of what actually comes to pass. The adaptability component is addressed by Dynamic Adaptive Planning (DAP) and process and visualization supports such as Dynamic Adaptive Policy Pathways (DAPP). For a detailed explanation of all three tools and more, I recommend Decision Making Under Deep Uncertainty (2019).
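To make the idea of testing policies against thousands of scenarios concrete, here is a minimal sketch in plain Python. It does not use the EMA Workbench; the two uncertain factors, the two policies, and the payoff formulas are all hypothetical placeholders of mine, and plain uniform sampling stands in for the more careful sampling designs a real study would use. It only illustrates the shape of the analysis: sample many futures without assigning probabilities, score each policy in each future, and compare policies by worst-case regret instead of expected value.

```python
import random

# Hypothetical uncertain factors and ranges; placeholders, not calibrated values.
UNCERTAINTIES = {
    "shock_severity": (0.0, 1.0),   # 0 = calm world, 1 = severe shock
    "window_days": (1.0, 90.0),     # how long a policy window stays open
}


def sample_scenarios(n, seed=0):
    """Draw n scenarios uniformly from the ranges (no probabilities implied)."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(low, high) for name, (low, high) in UNCERTAINTIES.items()}
        for _ in range(n)
    ]


def outcome(policy, scenario):
    """Toy payoff in [0, 1]; a real model would simulate the system dynamics."""
    if policy == "optimize_for_forecast":
        # Excellent in calm futures, poor once a severe shock hits.
        return 1.0 - scenario["shock_severity"]
    if policy == "prepare_broadly":
        # Mediocre but steady; slightly better when windows are brief.
        return 0.6 + 0.1 * (1.0 - scenario["window_days"] / 90.0)
    raise ValueError(f"unknown policy: {policy}")


POLICIES = ["optimize_for_forecast", "prepare_broadly"]
scenarios = sample_scenarios(10_000)

# Regret = shortfall against the best available policy in the same scenario.
max_regret = {policy: 0.0 for policy in POLICIES}
for scenario in scenarios:
    payoffs = {policy: outcome(policy, scenario) for policy in POLICIES}
    best = max(payoffs.values())
    for policy, payoff in payoffs.items():
        max_regret[policy] = max(max_regret[policy], best - payoff)

for policy in POLICIES:
    print(f"{policy}: worst-case regret {max_regret[policy]:.2f}")
```

In this toy setup the broadly prepared policy never does best in calm futures, but its worst-case regret is smaller, which is the kind of trade-off RDM is meant to surface.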

The cheapest way to make progress on this is to create a software model for parts of the complex system that are of general interest for many AI governance think tanks. A more involved but also more promising approach is to create such a model but also create a consultancy that adapts the model for the particular tactical realities at each think tank.

Another highly involved approach is to create a think tank dedicated to applying this strategic approach. This will duplicate much of the work that existing AI governance think tanks are already doing a great job at, so it’s at best a fallback in case the adoption is too low because the existing think tanks are spread thin.

Levels of Intelligence

In military contexts, people often distinguish the strategic, operational, and tactical levels of intelligence. In business contexts, the last two are sometimes reversed, but I’ll go with the military version here. Some examples:

  1. Strategic intelligence. Removing the RLHF of open-weights models is easy, so openness exacerbates risks from malicious use. Slow, multipolar takeoffs run AIs into collective prisoner’s dilemmas, which exacerbate the risks of wars between AIs.
  2. Operational intelligence. Machine learning engineers who have loved ones in adversary countries are vulnerable to extortion for espionage, so we should prevent or facilitate that depending on whether espionage has a stabilizing or destabilizing effect. A Russian EMP attack on Central Europe might destroy parts of ASML’s infrastructure and stock, so governments may want to purchase some EUV machines to resell to national companies.
  3. Tactical intelligence. If China starts a siege of Taiwan, our think tank needs to have already built relationships with people at positions X and Y at the NSA and has two days to approach them with our prepared policy A if the incoming government is likely to be Democratic and our prepared policy B if it’s likely to be Republican to maximize the chances that they can adopt it and stay in office.

Policies on the strategic and operational levels have the advantage that they’re useful for a variety of actors while policies on the tactical level will tend to be more bespoke to a particular organization. Exploratory models to test strategic or operational policies benefit from economies of scale to a much greater extent than models to test tactical policies, but LLMs will reduce the software implementation costs, so the time cost for meetings with the clients becomes comparatively greater.

Perry et al. (2019) write “AI governance researchers will need to consider how the political landscape should shape their recommendations or policy proposals. … How would other interest groups react and impact the long-term ability to reduce risk? If administration changes result in a flip-flop of ideology, what does that mean for AI risk policies associated with the past administration? … All of these have implications on our ability to reduce AI risk, and this means that the policymaking strategy will not only have to be robust but also flexible enough to survive changing political conditions.” That is the promise of Robust Decision Making in combination with Dynamic Adaptive Planning.

Exploratory Modeling

I’m one of those people who hotly debated whether humanity will be able to sandbox an AI. Eliezer Yudkowsky’s AI boxing experiments were a strong reason for me to think that we’ll fail. But Superintelligence recommended a defense in depth approach, so it was still controversial in my circles whether perhaps, in practice, these combined safeguards might be enough for a while.

So 2016 had some nasty surprises in store for me because 2016 is the year my circles learned of the founding of OpenAI. A company whose branding proclaimed that it won’t even try. We were not ready for that.

None of us knew what to do about it. It was a total curve ball of a sucker punch. Was this game over for humanity?

I would like our AI governance organizations to never be so taken by surprise by whatever circumstances transpire.

That’s what exploratory modeling is designed to achieve, or approximate.

I’m basing this mostly on the books Decision Making Under Deep Uncertainty (2019) and Shaping the Next One Hundred Years (2003). If you can read only one, read the first. Chapter 15 is a good summary. You can also use my NotebookLM to interact with this content.

AI governance is a space characterized by deep uncertainty, high complexity, and at least a sizable number of policy options, i.e., the complexity is too high for traditional scenario planning if the goal is to approximate comprehensive robustness at all. I argue this point in the Assumptions section.

An Othello analogy to illustrate (though I imagine this will hold for chess too):

In Othello, black makes the first move. You play black. So you convene a panel of experts to mathematically determine the most likely sequence of moves that your opponent will play based on historical games. You plan out the whole game.

Then you make your first move. The opponent makes one of the less likely moves. Your preparation is obsolete and you have to improvise.

That’s a caricature of scenario planning.

Having learned from the experience, you convene a panel of RDM experts in preparation for the next game. You brainstorm policies, such as always playing the move that flips the most pieces or maximizing mobility. You test both policies on a few billion games and find that the first is abysmal whereas the second does alright some of the time. You classify what “some of the time” means and find that it starts to perform badly in the endgame and some other situations.

Now you draw on DAP where you signpost the situations where it performs badly but already give the go-ahead for the next game where you’ll start with the mobility strategy. Meanwhile your team tries to figure out how you should respond when any of the signposted situations come up.

You lose anyway, but this time you feel more dignified losing.

That’s an example of how RDM and DAP are used in combination. RDM does the heavy lifting of simulating all the ways in which the world might develop, including all the non-linear effects you encounter in dynamical systems. It also provides such tools as the Patient Rule Induction Method to isolate clusters of scenarios where the policy fails. DAP is just a planning method where you keep your policy adaptive without getting forever blocked on every last contingency you might want to prepare for.
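The scenario discovery step can be sketched roughly as follows. This is a heavily simplified version of the peeling phase of the Patient Rule Induction Method, written in plain Python and not taken from any library: it repeatedly trims a small slice off one edge of a box in the input space, keeping whichever trim concentrates failures the most, and stops when no trim helps. Real PRIM implementations also track the trade-off between coverage and density and include a pasting step. The input names, the failure rule, and the parameters `alpha` and `min_support` are illustrative assumptions.

```python
import random


def prim_peel(points, failed, alpha=0.1, min_support=0.03):
    """Simplified PRIM peeling: shrink a box so it concentrates failures.

    points: list of dicts mapping input name -> value
    failed: parallel list of booleans (True = policy failed in that scenario)
    Returns (box, density) where box maps input name -> (low, high).
    """
    names = list(points[0])
    box = {n: (min(p[n] for p in points), max(p[n] for p in points)) for n in names}
    inside = list(range(len(points)))

    def density(indices):
        return sum(failed[i] for i in indices) / len(indices)

    while True:
        best = None
        for n in names:
            values = sorted(points[i][n] for i in inside)
            k = max(1, int(alpha * len(values)))
            if len(values) <= k:
                continue
            # Candidate peels: drop roughly the lowest or highest alpha-fraction.
            for side, cut in (("low", values[k]), ("high", values[-k - 1])):
                if side == "low":
                    keep = [i for i in inside if points[i][n] >= cut]
                else:
                    keep = [i for i in inside if points[i][n] <= cut]
                too_small = len(keep) < min_support * len(points)
                if too_small or len(keep) == len(inside):
                    continue
                d = density(keep)
                if best is None or d > best[0]:
                    best = (d, n, side, cut, keep)
        # Stop once no admissible peel increases the failure density.
        if best is None or best[0] <= density(inside):
            return box, density(inside)
        _, n, side, cut, inside = best
        low, high = box[n]
        box[n] = (cut, high) if side == "low" else (low, cut)


# Hypothetical experiment: the policy fails when a severe shock meets a brief window.
rng = random.Random(1)
points = [
    {"shock_severity": rng.random(), "window_days": rng.uniform(1, 90)}
    for _ in range(5000)
]
failed = [p["shock_severity"] > 0.7 and p["window_days"] < 20 for p in points]

box, dens = prim_peel(points, failed)
print(box, round(dens, 2))
```

The resulting box is a human-readable description of the cluster of futures in which the policy fails, which is what then gets signposted under DAP.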

This combination of general robustness and more specific preparedness is critical to the policy-making process as noted by Perry et al. (2019):

Problem identification, agenda setting, and policy formulation are usually tied together, including in a so-called “multiple streams framework.” The multiple streams framework attempts to explain how policies reach the agenda when policy entrepreneurs are able to couple the policy, politics, and problems streams to open up a policy window, the opportune time when all the conditions are right to get a policy on the agenda.

When the policy windows are sudden and brief, broad preparedness shines; when they are predictable and long, it’s more efficient to react only if and when they open.

Theory of Change

I’ll elucidate here what the theory of change would look like for the software-only approach and the consultancy approach. Falling back on starting one’s own think tank is the sort of project where exploratory modeling would make up a small part of the theory of change, so I won’t address it here.

Software-Only

Here the upfront and maintenance costs are limited. The developer who starts it can move on to other projects and just update it once a year, or can hand over the maintenance to early adopters and limit themselves to approving pull requests. The real hurdle is to find these early adopters, so the software should be designed to make it as easy as possible for policy analysts to get up to speed on its usage.

Inputs. We need a founder who has experience in AI governance and computational modeling, a software framework like the EMA Workbench, and compute.

Activities. The founder creates the exploratory model that can generate tens of thousands of future scenarios and a website and documentation tailored towards policy analysts.

Outputs. The open-source model, example scenarios, examples of how to identify clusters of policy failures among the scenarios, the documentation, and perhaps dashboards to present the results to executives within a policy think tank.

Outcomes. Think tanks fork the repository, modify and extend it for their use case and perspective, and test their overall strategy and specific policy proposals for black swans and smaller avoidable failures.

Impact. AI governance think tanks, instead of being prepared for just a few ostensibly likely scenarios, are prepared for tens of thousands of scenarios (1) because they know in advance upon what contingencies they need to pivot and have prepared for them, and (2) because their existing strategies and policies are robust to a wide range of geopolitical and technological shocks.

Consultancy

Setting up a consultancy takes extra upfront effort in addition to that of the software-only approach. In turn, it gives the founders more control over the adoption of their technology. They interface with the think tanks personally and can collect invaluable information on what the most pressing needs are and how best to communicate the results. They can also take over all of the custom implementation work, greatly lowering the technological barriers for the think tanks.

Inputs. We need a founder who has experience in AI governance and computational modeling, a software framework like the EMA Workbench, and compute. The same or a second founder needs to be a good communicator with practice facilitating workshops and communicating insights verbally and graphically. A third cofounder might be needed for the administrative side of the consultancy.

Activities. The founders shop around for clients first. If they find any, they prepare the base model and have meetings or workshops with the clients to understand the idiosyncratic details of their situations. They design a bespoke fork of the base model for each client, run simulations, and iterate with the client on improved versions of the client’s strategy.

Outputs. Bespoke models. Visualizations like Dynamic Adaptive Policy Pathways. Dashboards to monitor for predefined signposts of geopolitical or technological shocks. Plans for how to respond to these shocks, just in time or prepared in advance.

Outcomes. Think tanks become resilient to a wide range of geopolitical and technological shocks and know in advance how to respond to others within a day of a signpost triggering.

Impact. Think tanks can continually build on their previous work because very few shocks can still make it obsolete. They become efficient routing engines for policy proposals because they have all the potentially relevant bills, presentations, and contacts ready in advance.

Impact

I use my own subdivided version of the SPC framework in the hope that more estimates will allow for more errors to cancel out.

Significance

Scale. ⭐⭐⭐⭐⭐ – As an impact multiplier for AI governance, it gets a high rating for scale from me.

Influence. ⭐⭐⭐ – I’m excited about these technologies, but I find it plausible that a multiplicity of think tanks, all with somewhat uncorrelated plans, can muddle through without it because some might just have happened to have prepared for each geopolitical or technological shock and can pick up the slack for the others until they recover.

Persistence

Endogenous. ⭐⭐⭐ – Consultancies are known for having a high staff churn rate, so whatever reasons are responsible for that in the industry might also threaten the survival of our consultancy, in which case the software-only solution could serve as a fallback.

Exogenous. ⭐⭐ – I can easily imagine that most think tanks won’t have the capacity to hone their strategies like this or that it’ll be difficult to get in touch with them in the first place to get them interested in the solution. This is a major risk factor that should be minimized before the launch.

Contingency

Tractability. ⭐⭐⭐⭐⭐ – There should be no major hurdles to applying a tried and tested method to a new field.

Neglectedness. ⭐⭐⭐⭐⭐ – I’m not aware of anyone doing this for AI governance at the moment.

Assumptions

Bandwidth

It’s critical to clarify in advance whether the relevant think tanks will have the capacity to engage with the new method.

Funding

This work is related to the work of Modeling Cooperation and QURI, so their funding situation is a guide to how much funding might be available for this project.

Signpost Visibility

Dynamic Adaptive Planning requires the definition of signposts that can be observed and then trigger prepared plan changes. A common hurdle is to find signposts that strike a good balance between sensitivity and specificity while still triggering early enough that there is enough time to react. In areas where all parties are incentivized to keep intel highly classified for as long as possible, signposts may only trigger right when it is time to react, leaving virtually no time for preparations. That requires preparedness for a wide range of scenarios, most of which will never come to pass.

It’s worth investigating what balance can be struck between early-warning signposts and expensive preparations. With time, expensive preparations will become more and more affordable for growing think tanks, so it’s also a question of timing.
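One way to compare candidate signposts is to score them against simulated or historical runs. The sketch below is an illustration under assumptions of mine: the `Run` record, the example data, and the rule that a trigger only counts if it fires at least `min_lead_days` before the shock are all hypothetical. It reports sensitivity, specificity, and the mean lead time of the useful triggers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    shock_day: Optional[int]     # day the shock occurs; None if it never does
    trigger_day: Optional[int]   # day the signpost fires; None if it never does


def evaluate_signpost(runs, min_lead_days):
    """Score a signpost; a trigger only counts if it leaves enough time to act."""
    tp = fn = fp = tn = 0
    leads = []
    for run in runs:
        if run.shock_day is None:
            # No shock in this run: any trigger is a false alarm.
            if run.trigger_day is None:
                tn += 1
            else:
                fp += 1
            continue
        lead = None if run.trigger_day is None else run.shock_day - run.trigger_day
        if lead is not None and lead >= min_lead_days:
            tp += 1
            leads.append(lead)
        else:
            # Missed entirely, or fired too late to prepare.
            fn += 1
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        "specificity": tn / (tn + fp) if tn + fp else None,
        "mean_lead_days": sum(leads) / len(leads) if leads else None,
    }


# Made-up runs purely for illustration.
runs = [
    Run(shock_day=100, trigger_day=90),    # fires 10 days ahead
    Run(shock_day=100, trigger_day=99),    # fires too late
    Run(shock_day=50, trigger_day=None),   # missed
    Run(shock_day=80, trigger_day=60),     # fires 20 days ahead
    Run(shock_day=None, trigger_day=30),   # false alarm
    Run(shock_day=None, trigger_day=None),
    Run(shock_day=None, trigger_day=None),
]
scores = evaluate_signpost(runs, min_lead_days=5)
print(scores)
```

Raising `min_lead_days` shows how quickly a signpost that looks sensitive on paper stops being useful once the time needed to react is taken into account.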

Complexity

It seems to me that the complexity of AI governance is too high for traditional scenario planning, but that is an assumption worth testing. Here is a tiny snapshot of the interacting variables behind the combinatorial explosion of relevant scenarios.

Geopolitical shocks. A small selection of sudden exogenous events that have a massive influence on the strategic landscape.

  1. China initiates a naval blockade of Taiwan to halt chip exports.
  2. It initiates a naval blockade of Taiwan to block imports of resources needed for the chip production.
  3. It launches a kinetic invasion to seize control of TSMC.

Supply chain. What happens to the physical infrastructure in response.

  1. TSMC successfully destroys its factories to prevent capture.
  2. It enlists the military in the defense of the factories to keep them running.
  3. Sabotage or rapid seizure leaves the fabs intact.
  4. Production halts because TSMC employees are ordered to stay home to not get caught in the crossfire or controlled demolition.

US government institutions. What has the US government done to prepare?

  1. The US government procured fabs and helped companies build domestic capacity for chip production.
  2. It has stockpiled TSMC chips.
  3. It has extended blanket green cards to TSMC engineers.
  4. Are operations for smuggling resources in and out of Taiwan handled by the DoD, the NSA, some new task force, etc.?
  5. Can the government be reasoned with based on the survival of the species, that of the country, or only via each director’s need to ingratiate themselves with the president?

International factors. How other governments positioned themselves.

  1. Can the Dutch government at the time be convinced to implement a protectionist US-led regime to control exports of ASML machines?
  2. Or is the Dutch government at the time too laissez-faire for that?
  3. Can such a thing be done on an EU or NATO level?

Negotiation leverage. Finally, given all these factors, more questions remain when it comes to what directions to push the situation in to make it safer.

  1. If China is falling behind and wants to maximize its leverage in arms control negotiations with the US, maybe the leverage is actually necessary to force the US to the negotiating table, which would be good?
  2. If China is not interested in negotiations, more power is probably bad?
  3. If the US is comfortably ahead, more lead is good if it’s used for safety and coordination but bad if it’s used for imperialism.
  4. If both powers are close, it’s good if it incentivizes negotiations and abysmal if it leads to war or exacerbated racing.
  5. If recursive self-improvement gives one ASI an enormous lead, or if all AIs are at a similar level, we’re closer to x-risk territory, but if some ASIs have a substantial but not absolutely decisive lead over other ASIs, we’re in s-risk territory.

That’s just a small, illustrative sketch of a part of the landscape, but even so the sheer number of combinations of factors is staggering and, it seems to me, impossible to handle through traditional scenario planning.
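For a rough sense of the numbers, the following sketch counts combinations under a simplification of mine: each of the five lists above is treated as one factor whose items are mutually exclusive options. In reality several items can co-occur or vary continuously, which only increases the count. The ten additional three-option factors are a hypothetical extension to show how fast the space grows.

```python
from math import prod

# Number of listed options per factor, taken from the lists above.
OPTION_COUNTS = {
    "geopolitical shocks": 3,
    "supply chain": 4,
    "US government institutions": 5,
    "international factors": 3,
    "negotiation leverage": 5,
}

base_combinations = prod(OPTION_COUNTS.values())

# Hypothetical extension: ten further factors with three options each.
extended_combinations = base_combinations * 3 ** 10

print(base_combinations)       # 900
print(extended_combinations)   # 53144100
```

Even the five listed factors multiply to hundreds of combinations, and every further factor multiplies the total again, whereas traditional scenario planning typically works through only a handful of hand-written scenarios.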

Backfire Risks

I’m taking inspiration here from this discussion.

  1. If several think tanks base their strategies on modified versions of the same base model, the natural decorrelation of their failures, which would normally arise when they don’t communicate much, suffers. Failures that are not captured by the model may become more correlated.
  2. AI companies can exploit the open source version of the model to predict the behavior and outmaneuver the AI safety think tanks – e.g., get to all likely government and industry contacts first and preemptively smear the think tanks.
  3. Subtle flaws in the implementation might lead to bad recommendations. It seems unlikely to me that they can be catastrophically bad without looking suspicious to the policy experts.

I think the first risk is outweighed by the benefits, but it can also be addressed by explicit coordination between the think tanks. The second risk pushes for not open-sourcing the software in the consultancy model, but it seems premature to worry about this while such targeted efforts are still vastly less sophisticated (e.g., the case of Alex Bores). The third risk seems far-fetched to me, given how human-centric the whole system still is despite its computational aids.

Talent

In my experience, having three cofounders is a sweet spot that strikes a good balance between the resilience of the team and the coordination overhead that increases with the number of founders.

Here are the three founder personas that I think should run the consultancy:

  1. The data scientist. Experience in data science, knack for math, can quickly get up to speed on EMA Workbench or similar frameworks, and ideally already has experience in policy analysis.
  2. The communicator. Experienced workshop facilitator, communication and didactic skill, strong grasp of organizational psychology, and also ideally already has experience in policy analysis.
  3. The administrator. Experience in accounting, grant writing or VC fundraising, hiring, relevant areas of law, but experience with consultancies is more useful than experience with policy analysis.

A forthcoming project proposal is for a matching engine for cofounders that I think should be funded and built. It would streamline this process. Meanwhile you can use the comment section to coordinate.

Call to Action

Exploratory modeling, especially on the operational and tactical levels, is still a blind spot in the AI governance space that it would be invaluable to fill. Anyone who wants to pick up this idea can use the comment section to coordinate. But please test thoroughly that the assumptions it’s based on actually hold – in particular that there is a readiness among think tanks to adopt the system.
