Epistemic status: I'm new to this concept, and I welcome feedback on my mental models.
The concept of a "theory of victory" is new to me, but as I understand it, a theory of victory is a specific vision of the actions and events that occur in a winning scenario for a given problem. That is, if a specific problem were resolved in the top 5% (or 1%, or 0.1%, or whatever it takes to achieve victory) of best possible universes, what had to happen in order to get us there? [Edited this definition in line with lukeprog's comment below.]
This is similar to the "theory of change" idea in its focus on concrete actions and a clear, at-least-medium-fidelity vision of the process of creating the desired change. Both are aided by "backchaining": starting from the end goal, figuring out what had to happen right before that goal was achieved, then what had to happen right before that, and so on back to the present. The main difference between the two ideas seems to be that a "theory of victory" focuses explicitly on the scenarios where we actually win (the best case, or whatever percentile counts as victory) and on the actions and events that precipitate the victory.
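To make the backward walk concrete, here is a minimal, purely illustrative Python sketch of backchaining. The milestone names and the `prerequisites` mapping are placeholders I invented for this example, not claims about any actual cause area.

```python
# Hypothetical milestones and their immediate prerequisites (made up for illustration).
prerequisites = {
    "warming limited to <2 degrees C": ["fossil fuel markets shrink"],
    "fossil fuel markets shrink": ["carbon-free energy abundance"],
    "carbon-free energy abundance": ["continued R&D in solar, batteries, nuclear"],
    "continued R&D in solar, batteries, nuclear": [],  # actionable today
}

def backchain(goal, prerequisites):
    """Walk backward from the end goal toward the present, collecting
    each event that must have happened just before the one after it."""
    chain = [goal]
    frontier = list(prerequisites.get(goal, []))
    while frontier:
        step = frontier.pop()
        chain.append(step)
        frontier.extend(prerequisites.get(step, []))
    return chain

# Prints the chain from victory back to actions we can take now.
print(backchain("warming limited to <2 degrees C", prerequisites))
```

Reversing that output gives the forward-looking story, which is roughly how the example lists below are structured.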
Crucially, a theory of victory posits that we act in high-leverage ways at key points in the timeline. Otherwise, by definition, we will have left value on the table!
Below are a couple of examples of theories of victory. I have not done much additional research into these problems; these are just off-the-top-of-my-head mental models for what the best-case scenarios look like for these issues.
Climate change:
- R&D continues apace in solar, battery, hydrogen, nuclear fusion
- Regulatory efforts improve the landscape for nuclear and large-scale wind/solar/hydro projects
- We achieve carbon-free energy abundance in developed countries and export tech to developing countries
- Abundant energy kills the markets for fossil fuels and allows us to set up loads of carbon capture facilities, which in turn produce cement or other important products while sucking carbon out of the air
- Profit (something like "we limit global warming to <2 degrees C, and in time we lower global temperatures back to near pre-industrial levels.")
Animal welfare:
- Research on animal sentience produces new and important insights that guide future actions
- In the meantime, regulatory wins on cage-free eggs and other anti-animal cruelty laws continue to pick up steam in developed countries
- Vegan meat options become less expensive, tastier, and more abundant than animal meat, and a critical mass of market share shifts to vegan meat
- Factory farms and feed producers are incentivized to use their land and facilities for other, more socially and economically beneficial activities, e.g. solar/wind farms, growing more specialized crops, etc.
- Political resistance to animal welfare laws decreases as a result of shifting incentives, and more sweeping political solutions become possible
- Profit (something like "we end factory farming to such an extent that animal welfare activism shifts in focus to wild animal welfare.")
- For another set of models, see here.
In the process of writing the above examples, I found that they were much easier to think about than theories of victory pertaining to existential risks. What would a theory of victory look like for AI alignment or TAI development, for preventing catastrophic pandemics, or for preventing the development of much deadlier weapons systems (bioweapons, AI-controlled nukes, etc.)? Perhaps it's easier to think about climate change and animal welfare because the goals are clearer, or the problems are just easier. At least we know the sign of various interventions: better battery systems are just good! We don't know the sign of almost any action in the AI space (to the best of my knowledge, there's spirited disagreement on nearly every possible action).
There's also a danger here of being overconfident, potentially in many directions. We could be overconfident about what the best-case results might be (perhaps <2.5 degrees C is a more realistic best case for climate, for example). We could be overconfident in the causal arrow of each link in the chain of events: when the plan meets the real world, it's likely that there will be unintended and un-planned-for events at every step, and these could be damaging or fully derailing. The old adage applies: "Plans are useless, but planning is indispensable."
So, what's your theory of victory for whatever you're working on?
Most of my plans start with "get an inside view on AI alignment" and end with "alignment becomes a large field of theoretical CS research, and many of the people in this field are actually trying to prevent x-risk". I might be able to contribute directly to solving the problem, but it's hard to pass up the multiplier effects from various community-building and field-building projects.
Once the technical problem is solved, the solution still has to get implemented, but the current lack of strategic clarity makes implementation work less tractable right now, so I feel pretty good about focusing on decreasing the alignment tax.
FWIW I don't use "theory of victory" to refer to 95th+ percentile outcomes (plus a theory of how we could plausibly have ended up there). I use it to refer to outcomes where we "succeed / achieve victory," whether I think that represents the top 5% of outcomes or the top 20% or whatever. So e.g. my theory of victory for climate change would include more likely outcomes than my theory of victory for AI does, because I think succeeding re: AI is less likely.
Thanks for the clarification! I've edited the post to reflect your feedback.
For the EA Dev community (my main focus), my theory of victory is something like: