Ted Sanders

Researcher @ OpenAI


I think approximately 1 in 10,000 chance of extinction for each new GPT would be acceptable given the benefits of AI. This is approximately my guess for GPT-5, so I think if we could release that model and then pause, I'd be okay with that.

To me, this is wild. 1/10,000 * 8 billion people = 800,000 current lives lost in expectation, not even counting future lives. If you think GPT-5 is worth 800k+ human lives, you must have high expectations. :)
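For concreteness, the arithmetic above as a quick sketch (using current world population of roughly 8 billion):

```python
# Expected current lives lost from a 1-in-10,000 extinction risk,
# not counting future lives (figures from the comment above).
p_extinction = 1 / 10_000
world_population = 8_000_000_000

expected_deaths = p_extinction * world_population
print(expected_deaths)  # 800000.0
```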

One small clarification: the skeptical group was not all superforecasters. There were two domain experts as well. I was one of them.

I'm sympathetic to David's point here. Even though the skeptic camp was selected for their skepticism, I think we still get some information from the fact that many hours of research and debate didn't move their opinions. I think there are plausible alternative worlds where the skeptics come in with low probabilities (by construction), but update upward by a few points after deeper engagement reveals holes in their early thinking.

Here's a hypothesis:

The base case / historical precedent for existential AI risk is:
- AGI has never been developed
- ASI has never been developed
- Existentially deadly technology has never been developed (I don't count nuclear war or engineered pandemics, as they'll likely leave survivors)
- Highly deadly technology (>1M deaths) has never been cheap and easily copied
- We've never had supply chains so fully automated end-to-end that they could become self-sufficient with enough intelligence
- We've never had technology so networked that it could all be taken over by a strong enough hacker

Therefore, if you're in the skeptic camp, you don't have to make as much of an argument about specific scenarios where many things happen. You can just wave your arms and say it's never happened before because it's really hard and rare, as supported by the historical record.

In contrast, if you're in the concerned camp, you're making more of a positive claim about an imminent departure from historical precedent, so the burden of proof is on you. You have to present some compelling model or principles for explaining why the future is going to be different from the past.

Therefore, I think the concerned camp relying on theoretical arguments with multiple steps of logic might be a structural side effect of them having to argue against the historical precedent, rather than any innate preference for that type of argument.

"Refuted" feels overly strong to me. The essay says that market participants don't think TAGI is coming, and those market participants have strong financial incentive to be correct, which feels unambiguously correct to me. So either TAGI isn't coming soon, or else a lot of people with a lot of money on the line are wrong. They might well be wrong, but their stance is certainly some form of evidence, and evidence in the direction of no TAGI. Certainly the evidence isn't bulletproof, considering the recent mispricings of NVIDIA and other semi stocks.

In my own essay, I elaborated on the same point using prices set by more-informed insiders: e.g., valuations and hiring by Anthropic/DeepMind/etc., which also seem to imply that TAGI isn't coming soon. If they have a 10% chance of capturing 10% of the value for 10 years of doubling the world economy, that's like $10T. And yet investment expenditures and hiring and valuations are nowhere near that scale. The fact that Google has more people working on ads than TAGI implies that they think TAGI is far off. (Or, more accurately, that marginal investments would not accelerate TAGI timelines or market share.)
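A rough sketch of that back-of-envelope, where the ~$100T/year figure for gross world product is my assumption rather than anything from the essay:

```python
# Back-of-envelope expected value of TAGI to one lab.
p_tagi = 0.10            # 10% chance of capturing the prize
share_of_value = 0.10    # capturing 10% of the value
gwp_per_year = 100e12    # ~$100T/year gross world product (assumed)
years = 10

# Doubling the world economy adds roughly one extra GWP per year.
added_value = gwp_per_year * years
expected_value = p_tagi * share_of_value * added_value
print(f"${expected_value / 1e12:.0f}T")  # ~$10T
```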

Great comment. We didn't explicitly allocate probability to those scenarios, and if you do, you end up with much higher numbers. Very reasonable to do so.

I think that's a great criticism. Perhaps our conditional odds of Taiwan derailment are too high because we're too anchored to today's distribution of production.

One clarification/correction to what I said above: I see the derailment events 6-10 as being conditional on us being on the path to TAGI had the derailments not occurred. So steps 1-5 might not have happened yet, but we are in a world where they will happen if the derailment does not occur. (So not really conditional on TAGI already occurring, and not necessarily conditional on AGI, but probably AGI is occurring in most of those on-the-path-to-TAGI scenarios.)

Edit: More precisely, the cascade is:
- Probability of us developing TAGI, assuming no derailments
- Probability of us being derailed, conditional on otherwise being on track to develop TAGI without derailment
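With purely illustrative numbers (both inputs below are hypothetical, not our actual estimates), the cascade multiplies out like this:

```python
# Sketch of the two-step cascade with made-up numbers.
p_tagi_no_derail = 0.009   # P(develop TAGI | no derailments) -- hypothetical
p_derail = 0.40            # P(derailed | otherwise on track)  -- hypothetical

# Overall probability of TAGI: on track, and then not derailed.
p_tagi = p_tagi_no_derail * (1 - p_derail)
print(p_tagi)
```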

Question: Do you happen to understand what it means to take a geometric mean of probabilities? In re-reading the paper, I'm realizing I don't understand the methodology at all. For example, if there is a 33% chance we live in a world with 0% probability of doom, a 33% chance we live in a world with 50% probability of doom, and a 33% chance we live in a world with 100% probability of doom... then the geometric mean is (0% x 50% x 100%)^(1/3) = 0%, right?
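That worked example in code, assuming the three worlds are equally weighted:

```python
# Three equally likely worlds with 0%, 50%, and 100% probability of doom.
worlds = [0.0, 0.5, 1.0]

arithmetic_mean = sum(worlds) / len(worlds)                       # 0.5
geometric_mean = (worlds[0] * worlds[1] * worlds[2]) ** (1 / 3)   # 0.0

# A single 0% forces the geometric mean of probabilities to 0,
# even though the arithmetic mean (the expected probability) is 50%.
print(arithmetic_mean, geometric_mean)
```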

Edit: Apparently the paper took a geometric mean of odds ratios, not probabilities. But this still means that had a single surveyed person said 0%, the entire model would collapse to 0%, which is wrong on its face.
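A minimal sketch of odds-based aggregation with hypothetical survey answers, showing the collapse (0% probability corresponds to odds of 0, so the geometric mean of odds is 0):

```python
import math

def prob_to_odds(p):
    return p / (1 - p)

def odds_to_prob(o):
    return o / (1 + o)

probs = [0.0, 0.5, 0.9]  # hypothetical survey answers
odds = [prob_to_odds(p) for p in probs]

# Geometric mean of odds: one zero entry zeroes out the whole product.
geo_mean_odds = math.prod(odds) ** (1 / len(odds))
aggregate = odds_to_prob(geo_mean_odds)
print(aggregate)  # 0.0
```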

Great comment! Thanks especially for trying to pinpoint the actual stages going wrong, rather than hand-waving the multiple stage fallacy, which we all are of course well aware of.

Replying to the points:

> For example, the authors assign around 1% to events 1-5 happening before 2043. If they're correct, then conditioning on events 1-5 happening before 2043, they'll very likely only happen just before 2043. But this leaves very little time for any "derailing" to occur after that, and so the conditional probability of derailing should be far smaller than what they've given (62%).

From my POV, if events 1-5 have happened, then we have TAGI. It's already done. The derailments are not things that could happen after TAGI to return us to a pre-TAGI state. They are events that happen before TAGI and modify the estimates above.

> The authors might instead say that they're not conditioning on events 1-5 literally happening when estimating conditional probability of derailing, but rather conditioning on something more like "events 1-5 would have happened without the 5 types of disruption listed". That way, their 10% estimate for a derailing pandemic could include a pandemic in 2025 in a world which was otherwise on track for reaching AGI. But I don't think this is consistent, because the authors often appeal to the assumption that AGI already exists when talking about the probability of derailing (e.g. the probability of pandemics being created). So it instead seems to me like they're explicitly treating the events as sequential in time, but implicitly treating the events as sequential in logical flow, in a way which significantly decreases the likelihood they assign to TAI by 2043.

Yes, we think AGI will precede TAGI by quite some time, and therefore it's reasonable to talk about derailments of TAGI conditional on AGI.

Congrats to the winners, readers, and writers!

Two big surprises for me:

(1) It seems like 5/6 of the essays are about AI risk, and not TAGI by 2043. I thought there were going to be 3 winners on each topic, but perhaps that was never stated in the rules. Rereading, it just says there would be two 1st places, two 2nd places, and two 3rd places. Seems the judges were more interested in (or persuaded by) arguments on AI safety & alignment, rather than TAGI within 20 years. A bit disappointing for everyone who wrote on the second topic. If the judges were more interested in safety & alignment forecasting, that would have been nice to know ahead of time.


(2) I'm also surprised that the Dissolving AI Risk paper was chosen. (No disrespect intended; it was clearly a thoughtful piece.)

To me, it makes perfect sense to dissolve the Fermi paradox by pointing out that the expected # of alien civilizations is a very different quantity than the probability of 0 alien civilizations. It's logically possible to have both a high expectation and a high probability of 0.

But it makes almost no sense to me to dissolve probabilities by factoring them into probabilities of probabilities, and then take the geometric mean of that distribution. Taking the geometric mean of subprobabilities feels like a sleight of hand to end up with a lower number than what you started with, with zero new information added in the process. I feel like I must have missed the main point, so I'll reread the paper.

Edit: After re-reading, it makes more sense to me. The paper takes the geometric mean of odds ratios in order to aggregate survey entries. It doesn't take the geometric mean of probabilities, and it doesn't slice up probabilities arbitrarily (as they are the distribution over surveyed forecasters).

Edit2: As Jaime says below, the greater error is assuming independence of each stage. The original discussion got quite nerd-sniped by the geometric averaging, which is a bit of a shame, as there's a lot more to the piece to discuss and debate.

The end-to-end training run is not what makes learning slow. It's the iterative reinforcement learning process of deploying in an environment, gathering data, training on that data, and then redeploying with a new data collection strategy, etc. It's a mistake, I think, to focus only on the narrow task of updating model weights and omit the critical task of iterative data collection (i.e., reinforcement learning).
