Epistemic status: I'm new to this concept, and I welcome feedback on my mental models.

The concept of a "theory of victory" is new to me. But basically, as far as I understand it, a theory of victory is a specific vision for the actions and events that occur in a winning scenario for a given problem. That is, if a specific problem were resolved in one of the top 5% (or 1%, or 0.1%, or whatever it takes to achieve victory) of possible universes, what had to happen to get us there? [Edited this definition in line with lukeprog's comment below.]

This is similar to the "theory of change" idea in its focus on concrete actions and a clear vision, of at least medium fidelity, of the process of creating the desired change. Both are aided by "backchaining": starting from the end goal and figuring out what had to happen right before that goal was accomplished, what had to happen right before that, and so on back to the present. The main difference between these ideas seems to be that a "theory of victory" focuses explicitly on best-case (95th+ percentile) scenarios and on the actions and events that precipitated the victory.

Importantly, a theory of victory posits that we act in high-leverage ways at critical points in the timeline. Otherwise, by definition, we will have left value on the table!

Below are a couple of examples of theories of victory. I have not done much additional research into these problems; these are just my off-the-top-of-my-head mental models of what the best-case scenarios look like for these issues.

Climate change:

  • R&D continues apace in solar, battery, hydrogen, nuclear fusion
  • Regulatory efforts improve the landscape for nuclear and large-scale wind/solar/hydro projects
  • We achieve carbon-free energy abundance in developed countries and export tech to developing countries
  • Abundant energy kills the markets for fossil fuels and allows us to set up loads of carbon capture facilities, which in turn produce cement or other important products while sucking carbon out of the air
  • Profit (something like "we limit global warming to <2 degrees C, and in time we lower global temperatures back to near pre-industrial levels.")

Animal welfare:

  • Research on animal sentience produces new and important insights that guide future actions
  • In the meantime, regulatory wins on cage-free eggs and other anti-animal cruelty laws continue to pick up steam in developed countries
  • Vegan meat options become less expensive, tastier, and more abundant than animal meat, and a critical mass of market share shifts to vegan meat
  • Factory farms and feed producers are incentivized to use their land/facilities for other, more socially and economically beneficial activities, e.g., solar/wind farms, growing more specialized crops, etc.
  • Political resistance to animal welfare laws decreases as a result of shifting incentives, and more sweeping political solutions become possible
  • Profit (something like "we end factory farming to such an extent that animal welfare activism shifts in focus to wild animal welfare.")
  • For another set of models, see here.

In the process of writing the above examples, I found that they were much easier to think about than theories of victory pertaining to existential risks. What would a theory of victory look like for AI alignment or TAI development, or for preventing catastrophic pandemics, or for preventing the development of much deadlier weapons systems (bioweapons, AI-controlled nukes, etc.)? Perhaps it's easier to think about climate change and animal welfare because the goals are clearer, or the problems are just easier. At least we know the sign of various interventions - better battery systems are just good! We don't know the sign of, like, any actions in AI space (to the best of my knowledge, there's spirited disagreement on almost every possible action).

There's also a danger here of being overconfident, potentially in many directions. We could be overconfident about what the best-case results might be (perhaps <2.5 degrees C is a more realistic best case, for example). We could also be overconfident in the causal arrow of each link in the chain of events: when the plan is implemented in the real world, there will likely be unintended and unplanned events at every step of the chain, and these events could be damaging or could derail the plan entirely. The old adage applies: "Plans are useless but planning is indispensable."

So, what's your theory of victory for whatever you're working on?


Most of my plans start with "get inside view on AI alignment" and end in "alignment becomes a large field of theoretical CS research, and many of the people in this field are actually trying to prevent x-risk". I might be able to contribute directly to solving the problem, but it's hard to pass up multiplier effects from various community-building and field-building projects.

  • get inside view on AI alignment -> be able to help run workshops that upskill top researchers, or other high-value community-building events/programs  -> ...
  • get inside view on AI alignment -> start doing research that clearly exposes why alignment is hard -> field gains legitimacy -> ...
  • get inside view on AI alignment -> start doing research -> mentor new researchers ->  ...
  • get inside view on AI alignment -> gain the skill of distilling others' work -> create distillations that expose "surface area" to interested undergrads or established researchers in other subfields -> ...
  • get inside view on AI alignment -> steer the young field of alignment into research directions that are more likely to solve the problem.

Once the technical problem is solved, the solution still has to be implemented. But the current lack of strategic clarity makes work on implementation less tractable right now, so I feel pretty good about focusing on decreasing the alignment tax.

FWIW I don't use "theory of victory" to refer to 95th+ percentile outcomes (plus a theory of how we could plausibly have ended up there). I use it to refer to outcomes where we "succeed / achieve victory," whether I think that represents the top 5% of outcomes or the top 20% or whatever. So e.g. my theory of victory for climate change would include more likely outcomes than my theory of victory for AI does, because I think succeeding re: AI is less likely.

Thanks for the clarification! I've edited the post to reflect your feedback.

For the EA Dev community (my main focus), my theory of victory is something like:

  • Early career:
    • Devs go to fun jobs with mentorship where they learn a ton and make lots of money
    • They quit when the job stops being that
  • Late stage:
    • Apply to EA orgs a lot
  • EA orgs:
    • Treat devs well, including writing good transparent hiring posts, paying enough money, and so on
  • As a result
    • EA orgs find it easy to hire overqualified people, including managers for those people
    • The EA community finds it easy to set up and run amazing software projects
      • Orgs no longer have to do annoying manual work
      • Now EA is known as a place with unusually good software, since we don't have principal-agent problems and similar issues
        • For example, we have the best job board in the world, the best donation platforms (maybe we already do?), the best forum that facilitates good complicated discussions (when the world's social media mostly failed at that), and so on
        • x-risk projects that could use the help of software developers don't even count that as a hard part. "EA has ~25% developers [or whatever], of course that part is easy," they say, forgetting what the situation was in 2022