
Thanks to Siao Si for helpfwl debate on several aspects of this post. Update: seems like the idea already exists on LessWrong: iterated trust kickstarters.

In a Gather Town chat with Seth Baum during EAGxVirtual, we were talking about the risk of nuclear escalation in the Russian invasion of Ukraine, and the question came up,

"What can we personally do to contribute?"

In this post, I want to give an introspective demonstration of my particular approach to building an inside view as a complete beginner to a topic. I do think my approach is somewhat unusual, and it may strike people as hubristic, but I claim that this is closer to how I ought to reason if my end goal is to fix something in the real world as opposed to fitting into people's expectations. Research in the real world is messy, and an optimised methodology shouldn't fit neatly into a genre that's evolved for looking professional.

Feel free to give yourself ~5 minutes to come up with your own idea. Try to pay attention to how you approached it, and compare it to what I did. My main project this year is about figuring out how to figure out things, so I'd be very grateful if you tried to describe your approach. Even if you think your process "looks dumb" or some such, I think having the courage to reveal your inner workings and talk openly about them reflects some extremely important virtues--especially if you think it's dumb. This is a collaboration, not a contest.


Conceptual research methodology

The Inventor's Paradox is the observation that when you're trying to solve a specific problem, it's often easier to solve a bigger problem that includes the specific problem as a special case.

Why? I don't know yet, but one aspect of it is that when you go up an abstraction level to solve a more general problem, you have fewer objects you need to reason over and therefore a much smaller search tree. E.g. if you're solving a specific problem involving all integers between 1 and 99, it takes more information to specify that set of particular numbers than it takes to just specify "all integers". You're compressing the detail into smaller objects whose essential features now pop out to your mind's eye.[1]

In problem-solving, the hardest and most productive step is often figuring out which parts you can ignore. How can you turn a problem with seemingly many moving parts into a problem with fewer moving parts?

Anyway, solving the nuclear conflict between Russia and other countries seems like a really hard problem. So many details! Thus, the level of analysis I start out with is not "how can I personally contribute?", nor is it even "how can we resolve the Ukraine-Russia conflict?". Instead, I find what I call the "shower-thought level", and ask myself,

"What's a general mechanism for ending all nuclear war"?

When I reframe the problem like this, it forces me to think about the most general features, and allows me to go into problem-solving mode instead of "I have no clue, I need to read more". Zooming in and dealing with the details becomes necessary at some point, but the appropriate time for that is not yet. First, I need a high-level inside-view model that gives me an idea of what specific details to even start looking for. In other words, I'm looking for something I can think productively about while in the shower.

For real-world thinking with enormous search trees, a good general strategy is to collect as many ideas as possible that seem like they could serve as metaphors for other things. When the search space is this large, it's nearly impossible to build a new functional model from scratch. It becomes necessary to look for existing models from other fields (or nature), iteratively test them against the new use-case, and tweak them accordingly.[2]

Furthermore, I think the popular advice of "hold off on proposing solutions" is often overapplied. As long as I'm confident in my ability to avoid imprinting on the first idea I find, I often propose several dumb solutions as a way to learn about the problem in the first place. If I don't believe in myself, I might be anxiously attached to the first thing I produce because I doubt I'll be able to find anything else.


"Nuke swapping" to de-escalate nuclear stockpiles

When two competitors are in direct conflict but neither of them wants to waste resources on a fight, two metaphors spring to mind: assurance contracts and vote swapping. I've long thought that assurance contracts are the best thing ever, but I couldn't see a way to make them work in this case (~2 minutes). So I move on to explore the other pattern.

Vote swapping is when two voters for opposing sides in a single-winner election agree to vote for a third candidate instead. If the election is overwhelmingly bipolar, then the two votes would by default directly cancel each other out. But they might both value a third candidate above zero, so vote swapping lets them net a positive sum instead of a zero sum.
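
To make the positive-sum intuition concrete, here's a toy sketch with made-up marginal utilities; the voter labels and the numbers are purely illustrative assumptions, not anything from the election literature.

```python
# Toy model of vote swapping in a bipolar single-winner race.
# Voters and marginal utilities are hypothetical; only the signs matter.

marginal_value = {
    "A_voter": {"A": +1.0, "B": -1.0, "third": +0.3},
    "B_voter": {"A": -1.0, "B": +1.0, "third": +0.3},
}

def payoff(votes):
    """Each voter's payoff is the summed marginal value of all cast votes,
    since every vote nudges the outcome a little for everyone."""
    return {voter: sum(vals[v] for v in votes) for voter, vals in marginal_value.items()}

print(payoff(["A", "B"]))          # {'A_voter': 0.0, 'B_voter': 0.0} -- zero sum
print(payoff(["third", "third"]))  # {'A_voter': 0.6, 'B_voter': 0.6} -- positive sum
```

The A-vs-B margin is untouched by the swap, since each camp withdrew exactly one vote, but both voters walk away with something they value.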

This seems sorta kinda like a nuclear standoff, but I can't immediately see how. So I spend a few minutes trying to force the metaphor until something clicks. What I end up with doesn't really fit with vote swapping in the election case. Instead, I find a new pattern that I could only arrive at by trying to force the metaphor and deliberately trusting that I can find something valuable. (~4 minutes)

In a situation where two sides depend on mutually assured destruction for their safety, neither of them would want to disarm unilaterally. Even if both sides strongly prefer the world where neither side had nuclear weapons, disarming asymmetrically would upset the balance of power, which could in the worst case have the consequence of making a nuclear attack more likely. So what's a mechanism for ensuring symmetric disarmament that doesn't upset the balance?
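
As a sanity check on that framing, here's a minimal payoff sketch. The numbers are invented; only their ordering matters (mutual disarmament > armed standoff > being the lone disarmer).

```python
# Toy "assurance game" for unilateral vs. symmetric disarmament.
# Payoffs are invented; only the ordering matters:
# mutual disarmament > armed standoff > being the lone disarmer.

PAYOFFS = {
    # (my_move, their_move): my_payoff
    ("disarm", "disarm"): 3,
    ("keep",   "keep"):   1,
    ("disarm", "keep"):   0,  # lone disarmer: worst case
    ("keep",   "disarm"): 2,
}

def best_response(their_move):
    """My payoff-maximising move, given what the other side does."""
    return max(["disarm", "keep"], key=lambda my: PAYOFFS[(my, their_move)])

print(best_response("keep"))    # 'keep'   -> nobody wants to disarm unilaterally
print(best_response("disarm"))  # 'disarm' -> but matched disarmament is also stable
```

Both (keep, keep) and (disarm, disarm) are stable, which is why the question becomes how to move from the first equilibrium to the second without either side ever standing alone.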

(On reflection, I think I was mistaken here about this being one of the main bottlenecks for this to work. But while I'm trying to generate something, I endorse not spending too much time trying to critique what I'm coming up with at every point. I want to have something coherent that snaps into place, and that complete pattern may let me see further metaphors that I can draw on to understand the problem. I can more rigorously vet each step after I've already made the model.)

Here the concept of an escrow comes to mind, because it's used in variants of assurance contracts, which is a metaphor I previously activated. One side of the conflict places an insignificant quantity of weapons (a 'unit') into the hands of a mutually trusted third party who's responsible for verifying and destroying it. If the unit is small enough, it won't shift the balance of power enough to matter, so the cost of the initial 'bid' is insignificant.

But once an insignificant bid has been made, there is pressure on the other party to match it, because it'll be politically unreasonable to refuse such a marginal sacrifice for the global good. This will then hopefwly trigger the opposite of an escalation of conflict--an iterative game of "nuke swapping". Just like how arms races build themselves up because there's momentum in the direction of escalation, the marginal nuke swapping will hopefwly spark momentum for de-escalation.
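
Here's a minimal sketch of the intended dynamic, assuming a hypothetical unit size, hypothetical stockpile numbers, and a rule in which the other side always matches the bid. That last assumption is doing all the work, and it's exactly what the critique further down pokes at.

```python
# Toy simulation of iterated "nuke swapping" via a trusted escrow.
# Stockpile sizes, unit size, and the always-match rule are all hypothetical.

def nuke_swap(stockpile_a, stockpile_b, unit=10):
    """Each round, one side bids `unit` warheads into escrow; the escrow only
    destroys them once the other side matches. Here the other side always
    matches, so the loop runs until someone can no longer cover a full unit."""
    rounds = 0
    while stockpile_a >= unit and stockpile_b >= unit:
        stockpile_a -= unit  # side A's bid is verified and destroyed
        stockpile_b -= unit  # side B's matching bid is verified and destroyed
        rounds += 1
    return stockpile_a, stockpile_b, rounds

print(nuke_swap(6000, 5500))  # -> (500, 0, 550): de-escalation until one side bottoms out
```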

Now I just need to tell Putin!


After the aha moment

I recognise how important it is to be innocently excited about the ideas I produce; otherwise my brain won't be very motivated to comply next time I ask it to produce something for me. So I feel pretty excited about this idea. Humility has its uses when the stakes are high, but here I'm just exploring. And as long as I don't fool myself into thinking I have a finished product, I don't want to constrain my creativity by worrying about whether it's socially appropriate.

Although I think the idea is wrong, I think it's productively wrong in the sense that there are many parts that can be tweaked to look for less wrong ideas nearby. 

I doubt the applicability of my idea, because if I don't have a good understanding of the problem I'm fitting my solution to, it's unlikely to fit. But I'm still able to feel excited about this because I think it's productively wrong. Models that are wrong for a particular use-case may still be part of the arsenal I can use against other problems, or I can tweak them until they do fit. And moreover, I can feel excited about having gone through a process which I endorse and would like to see universally adopted.


Where I think it goes wrong

After generation, I move on to analysing and critiquing. Again, I use metaphorical thinking, because now that I have a new pattern, I may be able to see similarities to different models that do slightly different things. Metaphorical thinking lets me ask questions like "why does that model do it like that when my model does it like this?", and it hints at where I might've gone wrong.

I can quickly see at least two possible mechanistic reasons the idea fails, but I'm sure there are plenty that I'm not immediately seeing here.

  1. The side that's weaker in conventional weapons does not want to agree to start the chain, because they could get trampled if they cooperate. MAD benefits the weaker side because it ensures stalemate where the weaker side would otherwise overwhelmingly lose a conventional war.
  2. The analogy to elections breaks down because the incentive to swap marginal weapons goes down the more weapons you've already swapped. It becomes less and less like a bipolar conflict, because the fewer nukes you have, the more you have to worry about other non-nuclear actors becoming a threat (see the toy sketch after this list). (H/T Ruth)
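
A toy way to see point 2: suppose each side needs some minimum arsenal to deter third parties, so the net value of swapping the next unit shrinks (and eventually flips sign) as stockpiles approach that floor. The floor and the payoff shape below are invented purely for illustration.

```python
# Toy model of why the incentive to keep swapping decays (point 2 above).
# The deterrence floor and the payoff shape are invented for illustration.

THIRD_PARTY_FLOOR = 500  # hypothetical arsenal needed to deter non-bipolar threats

def value_of_next_swap(my_stockpile):
    """Net value of destroying one more unit: a fixed de-escalation gain,
    minus a penalty that grows as the stockpile approaches the floor."""
    deescalation_gain = 1.0
    third_party_penalty = max(0.0, THIRD_PARTY_FLOOR / my_stockpile - 1.0)
    return deescalation_gain - third_party_penalty

for stockpile in (6000, 2000, 600, 400, 260):
    print(stockpile, round(value_of_next_swap(stockpile), 2))
# 6000 1.0   -> far above the floor: happy to keep swapping
# 2000 1.0
# 600  1.0
# 400  0.75  -> approaching the floor, the incentive shrinks
# 260  0.08  -> nearly gone; below 250 it turns negative
```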
[1] Note, when you compress the problem into general abstract categories, you have fewer details to talk about, so you don't have as many opportunities to reveal your expertise by talking about the many details you know. Writing about the finer details will often look more impressive and professional, although imo that doesn't necessarily translate to being more usefwl, and I think it can often bias people to start at the wrong level of abstraction.

[2] I have a flashcard collection in RemNote of >100 ideas/patterns/tools/perspectives that I feel will be the most generally applicable to other problems. Sometimes when I'm stuck, I flip through this list, and it's been surprisingly effective for me. I call them "germs of generality", taking inspiration from David Hilbert.
