Research Scientist @ CSAIL MIT, Associate Director @ Epoch
377 karma · Joined Feb 2020


One issue I have with this is that when someone calls this the 'default', I interpret them as implicitly making some prediction about the likelihood of such countermeasures not being taken. The issue then is that this is a very vague way to communicate one's beliefs. How likely does some outcome need to be for it to become the default? 90%? 70%? 50%? Something else?

The second concern is that it's improbable for minimal or no safety measures to be implemented, making it odd to set this as a key baseline scenario. This belief is supported by substantial evidence indicating that safety precautions are likely to be taken. For instance:

  • Most of the major AGI labs are investing quite substantially in safety (e.g. OpenAI has committed a substantial fraction of its compute budget, and a large fraction of Anthropic's research staff seems dedicated to safety)
  • We have received quite a substantial amount of concrete empirical evidence that safety-enhancing innovations are important for unlocking the economic value from AI systems (e.g. RLHF, constitutional AI, etc.)
  • There 
  • It seems a priori very likely that alignment is important for unlocking the economic value from AI, because this effectively increases the range of tasks that AI systems can do without substantial human oversight, which is necessary for deriving value from automation
  • Major governments are interested in AI safety (e.g. the UK's AI Safety Summit, the White House securing commitments around AI safety from AGI labs)

Maybe they think that safety measures taken in a world in which we observe this type of evidence will fall far short of what is needed. However, it's somewhat puzzling to be confident enough in this to label it as the 'default' scenario.

The term 'default' in discussions about AI risk (like 'doom is the default') strikes me as an unhelpful rhetorical move. It suggests an unlikely scenario where little-to-no measures are taken to address AI safety. Given the active research and the fact that alignment is likely to be crucial to unlocking the economic value from AI, this seems like a very unnatural baseline to frame discussions around.

I've updated the numbers based on today's predictions. Key updates:

  • AI-related risks have seen a significant increase, roughly doubling in terms of both catastrophic risk (from 3.06% in June 2022 to 6.16% in September 2023) and extinction risk (from 1.56% to 3.39%).
  • Biotechnology risks have actually decreased in terms of catastrophe likelihood (from 2.21% to 1.52%), while staying constant for extinction risk (0.07% in both periods).
  • Nuclear War has shown an uptick in catastrophic risk (from 1.87% to 2.86%) but remains consistent in extinction risk (0.06% in both periods).
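
As a quick check on the size of these changes, the ratios can be computed directly from the figures quoted above. This is only a minimal sketch; the dictionary layout and the names `risks` and `ratios` are illustrative choices, not part of the original forecasts.

```python
# Figures as quoted in the bullets above: (Jun 2022 %, Sep 2023 %).
# The structure and names here are illustrative assumptions.
risks = {
    ("AI", "catastrophe"): (3.06, 6.16),
    ("AI", "extinction"): (1.56, 3.39),
    ("Biotechnology", "catastrophe"): (2.21, 1.52),
    ("Biotechnology", "extinction"): (0.07, 0.07),
    ("Nuclear war", "catastrophe"): (1.87, 2.86),
    ("Nuclear war", "extinction"): (0.06, 0.06),
}

# Ratio of the newer estimate to the older one (2.0 means "doubled").
ratios = {key: new / old for key, (old, new) in risks.items()}

for (source, outcome), r in ratios.items():
    print(f"{source:13s} {outcome:11s} x{r:.2f}")
```

On these figures, both AI estimates slightly more than doubled, the biotechnology catastrophe estimate fell by roughly a third, and the nuclear war catastrophe estimate rose by about half.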

Answer by Tamay, Sep 14, 2023

  • Owain Evans on AI alignment (situational awareness in LLMs, benchmarking truthfulness)
  • Ben Garfinkel on AI policy (best practices in AI governance, open source, the UK's AI efforts)
  • Anthony Aguirre on AI governance, forecasting, cosmology
  • Beth Barnes on dangerous capability evals (GPT-4's and Claude's evals)

I agree the victim-perpetrator framing is an important lens through which to view this saga. But I also think that an investor-investee framing is another important one; a framing that has different prescriptions for what lessons to take away, and what to do next. The EA community easily staked a billion dollars' worth of its assets (in focus, time, reputation, etc.), and ended up losing it all. I think it's crucial to reflect on whether the extent of our due diligence and risk management was commensurate with the size of EA's bet.


One specific question I would want to raise is whether EA leaders involved with FTX were aware of or raised concerns about non-disclosed conflicts of interest between Alameda Research and FTX.

For example, I strongly suspect that EAs tied to FTX knew that SBF and Caroline (CEO of Alameda Research) were romantically involved (I strongly suspect this because I have personally heard Caroline talk about her romantic involvement with SBF in private conversations with several FTX fellows). Given the pre-existing concerns about the conflicts of interest between Alameda Research and FTX (see examples such as these), if this relationship were known to be hidden from investors and other stakeholders, should this not have raised red flags? 

This is insightful. Some quick responses:

  • My guess would be that the ability to commercialize these models would strongly hinge on the ability of firms to wrap these up with complementary products that would contribute to an ecosystem with network effects, dependencies, evangelism, etc.
  • I wouldn't draw overly strong conclusions from the fact that the few early attempts to commercialize models like these, notably by OpenAI, haven't succeeded in creating the preconditions for generating a permanent stream of profits. I'd guess that their business models look less-than-promising on this dimension because (and this is just my impression) they've been trying to find product-market fit, and have gone lightly on exploiting particular fits they found by building platforms to service these
  • Instead, better examples of what commercialization looks like are GPT-3-powered companies, like copysmith, which seem a lot more like traditional software businesses with the usual tactics for locking users in, and creating network effects and single-homing behaviour
  • I expect that companies will have ways to create switching costs for these models that traditional software products don't have. I'm particularly interested in fine-tuning as a way to lock in users by enabling models to strongly adapt to context about the users' workloads. More intense versions of this might also exist, such as learning directly from individual customers' feedback through something like RL. Note that this is actually quite similar to how non-software services create loyalty

I agree that it seems hard to commercialize these models out-of-the-box with something like paid API access, but I expect this approach, given the points above, to be superseded by better strategies.

By request, I have updated the numbers based on the latest predictions. Previous numbers can be found here.

I won the Stevenson prize (a prize given out at my faculty) for my performance in the MPhil in Economics. I gather Amartya Sen won the same prize some 64 years ago, which I think is pretty cool.
