Applied Researcher at Founders Pledge

I am an Applied Researcher at Founders Pledge, where I work on global catastrophic risks. Previously, I was the program manager for Perry World House's research program on The Future of the Global Order: Power, Technology, and Governance. I'm interested in the national security implications of AI, cyber norms, nuclear risks, space policy, probabilistic forecasting and its applications, histories of science, and all things EA. Please feel free to reach out to me with questions or just to connect!

Topic Contributions


Building a Better Doomsday Clock

Thank you! I also really struggle with the clock metaphor. It seems to have just gotten locked in as the Bulletin took off in the early Cold War. The time bomb is a great suggestion — it communicates the idea much better.

Risks from Autonomous Weapon Systems and Military AI

Thanks for engaging so closely with the report! I really appreciate this comment.

Agreed on the weapon speed vs. decision speed distinction — the physical limits to the speed of war are real. I do think, however, that flash wars can make non-flash wars more likely (e.g., a cyber flash war unintentionally intrudes on NC3 system components, which gets misinterpreted as preparation for a first strike, etc.). I probably should have spelled that out more clearly in the report.

I think we actually agree on the broader point — it is possible to leverage autonomous systems and AI to make the world safer, to lengthen decision-making windows, to make early warning and decision-support systems more reliable.

But I don’t think that’s a given. It depends on good choices. The key questions for us are therefore: How do we shape the future adoption of these systems to make sure that’s the world we’re in? How can we trust that our adversaries are doing the same thing? How can we make sure that our confidence in some of these systems is well-calibrated to their capabilities? That’s partly why a ban probably isn’t the right framing.

I also think this exchange illustrates why we need more research on the strategic stability questions.

Thanks again for the comment!

Risks from Autonomous Weapon Systems and Military AI

Hi Kevin,

Thank you for your comment and thanks for reading :)

The key question for us is not “what is autonomy?” — that’s bogged down the UN debates for years — but rather “what are the systemic risks of certain military AI applications, including a spectrum of autonomous capabilities?” I think many systems around today are better thought of as closer to “automated” than truly “autonomous,” as I mention in the report, but again, I think that binary distinctions like that are less salient than many people think. What we care about is the multi-dimensional problem of more and more autonomy in more and more systems, and how that can destabilize the international system.

I agree with your point that it’s a tricky definitional problem. In point 3 under the “Killer Robot Ban” section of the report, one of the key issues is that “The line between autonomous and automated systems is blurry.” I think you’re pointing to a key problem with how people often think about this issue.

I’m sorry I won’t be able to give a satisfying answer about “ethical norms” as it’s a bit outside the purview of the report, which focuses more on strategic stability and GCRs. (I will say that I think the idea of “human in the loop” is not the solution it’s often made out to be, given some of the issues with speed and cognitive biases discussed in the report). There are some people doing good work on related questions in international humanitarian law though that will give a much more interesting answer.

Thanks again!

Risks from Autonomous Weapon Systems and Military AI

Hi Haydn,

That’s a great point. I think you’re right — I should have dug a bit deeper on how the private sector fits into this.

I think cyber is an example where the private sector has really helped to lead — like Microsoft’s involvement at the UN debates, the Paris Call, the Cybersecurity Tech Accord, and others — and maybe that’s an example of how industry stakeholders can be engaged.

I also think that TEVV-related norms and confidence building measures would probably involve leading companies.

I still broadly think that states are the lever to target at this stage in the problem, given that they would be (or are) driving demand. I am also always a little unsure about using cluster munitions as an example of success — both because I think autonomous weapons are just a different beast in terms of military utility, and of course because of the breaches (including recently).

Thank you again for pointing out that hole in the report!

Are you really in a race? The Cautionary Tales of Szilárd and Ellsberg

Thank you for the reply! I definitely didn’t mean to mischaracterize your opinions on that case :)

Agreed, a project like that would be great. Another point in favor of your argument that this is a dynamic to watch out for in AI competition: verifying claims of superiority may be harder for software (along the lines of Missy Cummings’s “The AI That Wasn’t There”). That seems especially vulnerable to misperceptions.

Are you really in a race? The Cautionary Tales of Szilárd and Ellsberg

Hi Haydn,

This is awesome! Thank you for writing and posting it. I especially liked the description of the atmosphere at RAND, and big +1 on the secrecy heuristic being a possibly big problem.[1] Some people think it helps explain intelligence analysts' underperformance in the forecasting tournaments, and I think there might be something to that explanation. 

We have a report on autonomous weapons systems and military AI applications coming out soon (hopefully later today) that gets into the issue of capability (mis)perception in arms races too, and your points on competition with China are well taken.

What I felt was missing from the post was the counterfactual: what if the atomic scientists’ and defense intellectuals’ worst fears about their adversaries had been correct? It’s not hard to imagine. The USSR did seem poised to dominate in rocket capabilities at the time of Sputnik.

I think there’s some hindsight bias going on here. In the face of high uncertainty about an adversary’s intentions and capabilities, it’s not obvious to me that skepticism is the right response. Rather, we should weigh possible outcomes. In the Manhattan Project case, one of those possible outcomes was that a murderous totalitarian regime would be the first to develop nuclear weapons, become a permanent regional hegemon, or worse, a global superpower. I think the atomic scientists’ and U.S. leadership’s decision then was the right one, given their uncertainties at the time.

I think it would be especially interesting to see whether misperception is actually more common historically. But I think there are examples of “racing” where assessments were accurate or even under-confident (as you mention, thermonuclear weapons).

Thanks again for writing this! I think you raise a really important question — when is AI competition “suboptimal”?[2]

Space governance - problem profile

Hi Fin!

This is great. Thank you for writing it up and posting it! I gave it a strong upvote.

(TLDR for what follows: I think this is very neglected, but I’m highly uncertain about the tractability of formal, treaty-based regulation.)

As you know, I did some space policy-related work at a think tank about a year ago, and one of the things that surprised us most is how neglected the issue is — there are only a handful of organizations seriously working on it, and very few of them are the kinds of well-connected and -respected think tanks that actually influence policy (CSIS is one). This is especially surprising because — as Jackson Wagner writes below — so much of space governance runs through U.S. policy. Anyway, I think that’s another point in favor of working on this!

As I think I mentioned when we talked about space stuff a little while ago, I’m a bit skeptical about the tractability of “traditional” (i.e., formal, treaty-based) arms control. You note some of the challenges in the 80K version of the write-up. Getting the major powers to agree to anything right now, let alone something as sensitive as space tech, seems unlikely. Moreover, the difficulties of verification and ease of cheating are high, as they are with all dual-use technology. Someone can come up with a nice “debris clean up” system that just happens to also be a co-orbital ASAT, for example.

But I think there are other mechanisms for creating “rules of the orbit” — that’s the phrase Simonetta di Pippo, the director of UNOOSA, used at a workshop I helped organize last year.

Cyber is an example where a lot of actors have apparently decided that treaty-based arms control isn’t going to cut it (in part for political reasons, in part because the tech moves so fast), but there are still serious attempts at creating norms and regulation. That includes standard setting and industry-driven processes, which feel especially appropriate in space, where private actors play such an important role. We have a report on autonomous weapons and AI-enabled warfare coming out soon at Founders Pledge, and I think that’s another space where people put too much emphasis on treaty-based regulation and neglect norms and confidence building measures for issues where great powers can agree on risk reduction.

Again, I think this is a great write up, and love that you are drawing attention to these issues. Thank you!

Comparing top forecasters and domain experts

Thanks Gavin!  That makes sense on how you view this and (3). 

Comparing top forecasters and domain experts

Thank you for writing this overview! I think it's very useful. A few notes on the famous "30%" claim:

  • Part of the problem with fully understanding the performance of IC analysts is that much of the information about the tournaments and the ICPM is classified.
  • What originally happened is that someone leaked info about ACE to David Ignatius, who then published it in his column. (The IC never denied the claim.[1]) The document you cite is part of a case study by MITRE that's been approved for public release.

One under-appreciated takeaway that you hint at is that prediction markets (rather than non-market aggregation platforms) are poorly suited to classified environments. Here's a quote from a white paper I co-wrote last year:[2]

"Prediction markets are especially challenging to implement in classified environments because classified markets will necessarily have large limitations on participation, requiring the use of algorithmic correctives to solve liquidity problems. Good liquidity, like that of a well-functioning stock market, is difficult to achieve in prediction markets like the ICPM, requiring prediction markets to have corrective tools like setting liquidity parameters and using automated market makers, which attempt to simulate efficient market behavior in electronic prediction markets."
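To make the liquidity point above concrete, here is a minimal sketch of the kind of automated market maker the quote alludes to — the logarithmic market scoring rule (LMSR), a standard corrective for thin prediction markets. This is an illustrative example, not code from the white paper; the liquidity parameter `b` and the two-outcome setup are assumptions for the sketch.

```python
import math

def lmsr_cost(quantities, b=100.0):
    # LMSR cost function: C(q) = b * ln(sum_i exp(q_i / b)).
    # The liquidity parameter b controls how much prices move per trade.
    return b * math.log(sum(math.exp(q / b) for q in quantities))

def lmsr_price(quantities, i, b=100.0):
    # Instantaneous price (implied probability) of outcome i:
    # exp(q_i / b) / sum_j exp(q_j / b). Prices always sum to 1.
    total = sum(math.exp(q / b) for q in quantities)
    return math.exp(quantities[i] / b) / total

def cost_to_buy(quantities, i, shares, b=100.0):
    # A trader pays C(q') - C(q) to buy `shares` of outcome i,
    # so the market maker can always quote a price even with no counterparty.
    new_q = list(quantities)
    new_q[i] += shares
    return lmsr_cost(new_q, b) - lmsr_cost(quantities, b)
```

Because the market maker itself takes the other side of every trade, a thin classified crowd can still get continuous prices — which is exactly why such correctives matter more in restricted-participation markets like the ICPM.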

More broadly, I would like to push back a little against the idea that your point 3(a) (whether supers outperform IC analysts) is really much evidence for or against 3 (whether supers outperform domain experts).

First, the IARPA tournaments asked a wide range of questions, but intelligence analysts tend to be specialized. If you're looking at the ICPM, are you really looking at the performance of domain experts? Or are you looking at e.g. an expert on politics in the Horn of Africa trying to forecast the price of the Ruble? On the one hand, since participants self-selected which questions they answered, we might expect domain experts to stick to their domain. On the other, analysts might have seen it as a "game," a "break," or "professional development" -- in short, an opportunity to try their hand at something outside their expertise. The point is that we simply don't know whether the ICPM really reflects "expert" opinion.

Second, I am inclined to believe that comparisons between IC analysts and supers may tell us more about the secrecy heuristic than about forecaster performance. From the same white paper:

"Experimental research on using secrecy as a heuristic for informational quality demonstrates that people tend to weigh secret information more heavily than publicly available information, viewing secret information as higher quality than public information.[3] Secrecy does matter, especially in situations where information asymmetry exists, but a pervasive secrecy bias may negatively affect the accuracy of a classified crowd in some cases."

I personally see much of the promise of forecasting platforms not as a tool for beating experts, but as a tool for identifying experts more reliably than the usual signals, like a PhD.
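The identification idea above is usually operationalized with a proper scoring rule. As a minimal sketch (names and sample numbers are mine, not from the tournaments), the Brier score ranks forecasters by the mean squared error between their stated probabilities and what actually happened:

```python
def brier_score(forecasts, outcomes):
    # Mean squared error between probability forecasts (0..1) and
    # binary outcomes (0 or 1). Lower is better; always answering
    # 50% scores exactly 0.25, so skill shows up as scores below that.
    assert len(forecasts) == len(outcomes)
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical track records on the same three resolved questions:
confident_analyst = brier_score([0.9, 0.8, 0.2], [1, 1, 0])  # well calibrated
coin_flipper = brier_score([0.5, 0.5, 0.5], [1, 1, 0])       # uninformative
```

Because the score is proper, a forecaster minimizes it only by reporting honest probabilities — which is what makes a public track record a credible signal of expertise.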

  1. ^

    Tetlock discusses this a bit in Chapter 4 of Superforecasting. 

  2. ^
  3. ^

Travers et al., "The Secrecy Heuristic."

The Future Fund’s Project Ideas Competition

Experimental Wargames for Great Power War and Biological Warfare

Biorisk and Recovery from Catastrophe, Epistemic Institutions

This is a proposal to fund a series of "experimental wargames" on great power war and biological warfare. Wargames have been a standard tool of think tanks, the military, and the academic IR world since the early Cold War. Until recently, however, these games were largely used to uncover unknown unknowns and help with scenario planning, and most such games continue to be unscientific exercises. Recent work on "experimental wargames" (see, e.g., this paper on drones and escalation), however, has leveraged wargaming methods with randomly assigned groups and varying scenarios to see how decision-makers will react in hypothetical crisis situations. A series of well-designed experimental wargames on crisis decision-making in a great power confrontation or during a biological attack could help identify weaknesses, quantify risks, and uncover cognitive biases at work in high-pressure decision-making. Additionally, they would raise awareness about global catastrophic risks.
