
I was hoping to write something for the Future Fund contest and, being entirely a one-trick pony, was going to look at uncertainty analysis in AI Catastrophe predictions.

I've done a review of the forums, and my conclusion is that predictions around AI Catastrophe focus very heavily on two questions: when AI will be invented, and the overall top-level probability that AI will be a catastrophe if it is invented. Beyond that, predictions about AI Risk are quite sparse. For example, few people seem to have offered a numerical prediction on whether the AI Alignment Problem is solvable in principle, or on how long we could contain a misaligned AI, and so on. The only end-to-end model of AI Risk with numerical predictions I have found is Carlsmith (2021): https://arxiv.org/abs/2206.13353.

  • Is my review of the state of the literature roughly accurate? That is, is my impression correct that people mostly predict when AI will be invented and the overall risk that AI leads to catastrophe, but do not predict other important related questions (at least not numerically)?
  • Am I right that Carlsmith (2021) is the only end-to-end model of AI Risk with numerical predictions at each stage? (By 'end-to-end' I mean there are individually predicted steps between 'AI invented' and 'AI catastrophe'; see the sketch after this list.) Any other examples would be really helpful so I can scope out the community consensus on the microdynamics of AI risk.
  • If I'm right about the above, I think an essay looking at the microdynamics of AI Risk predictions could be novel and informative. For example, the probability that we solve the Alignment Problem before AI is invented seems pretty important, but I don't think anyone has looked at validating the Metaculus prediction on this topic. Is this already a known quantity? Are there any particular pitfalls I should watch out for?
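
To make the 'end-to-end' idea concrete, here is a minimal sketch of how such a staged model composes: the overall risk is the product of conditional probabilities, one per step on the path. The stage names and numbers below are purely illustrative placeholders of mine, not Carlsmith's actual estimates.

```python
# A minimal sketch of an "end-to-end" staged risk model in the style of
# Carlsmith (2021): the overall probability of catastrophe is the product
# of conditional probabilities, one per step on the path from "AI invented"
# to "AI catastrophe". Stage names and numbers are illustrative placeholders
# only, not Carlsmith's actual estimates.
from math import prod

stages = {
    "advanced AI is developed": 0.65,
    "there are strong incentives to deploy it": 0.80,
    "the Alignment Problem is unsolved at deployment": 0.40,
    "the misaligned AI seeks power at scale": 0.65,
    "containment and correction both fail": 0.40,
}

p_catastrophe = prod(stages.values())

for step, p in stages.items():
    print(f"P({step} | previous steps) = {p:.2f}")
print(f"P(catastrophe) = {p_catastrophe:.3f}")  # ~0.054 with these numbers
```

The point of an essay on microdynamics would be to interrogate each factor separately, rather than only the final product.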

In my review I came across the 'Database of Existential Risk Estimates' - link here: https://forum.effectivealtruism.org/posts/JQQAQrunyGGhzE23a/database-of-existential-risk-estimates. This seems to contain many estimates of exactly what I am looking for - predictions of specific events which will occur on the path to an AI catastrophe, rather than the overall risk of catastrophe itself.

  • Are there any other databases of this sort, especially those which focus on topics other than when AI will be invented or the top-level probability it will be a catastrophe?
  • Is the database regarded as generally credible on the forums? I have found a handful of predictions which I don't think are included (especially on Metaculus), but the database has many more which I would never have found without it. If there is no known systematic bias in the database, I'd really like to use it!
  • Is there anything else I should know about the database?

Thanks so much!

8 Answers

Am I right that Carlsmith (2021) is the only end-to-end model of AI Risk with numerical predictions at each stage? Any other examples would be really helpful so I can scope out the community consensus on the microdynamics of AI risk.

This spreadsheet (found here) has estimates on the propositions in Carlsmith by (some of?) the reviewers of that paper.
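
If you want to pool several reviewers' numbers for the same proposition into a rough consensus figure, one common choice is the geometric mean of odds. A hypothetical sketch (my own suggestion, not how the spreadsheet itself aggregates, and with made-up inputs):

```python
# Hypothetical sketch: pooling several reviewers' probabilities for the same
# proposition via the geometric mean of odds, one common aggregation choice.
# The input numbers are made up, not taken from the spreadsheet.
from math import prod

def pool_geometric_mean_odds(probs: list[float]) -> float:
    """Convert probabilities to odds, take the geometric mean, convert back."""
    odds = [p / (1 - p) for p in probs]
    pooled_odds = prod(odds) ** (1 / len(odds))
    return pooled_odds / (1 + pooled_odds)

reviewer_estimates = [0.05, 0.10, 0.30]  # illustrative only
print(f"Pooled estimate: {pool_geometric_mean_odds(reviewer_estimates):.3f}")
```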

This is absolutely incredible - I can't believe I missed it! Thank you so much.

Erich_Grunewald 🔸
I'm excited to see what you come up with!

You might also find David Manheim's Modeling Transformative AI Risk (MTAIR) sequence useful. And you might want to ask this same question on the LessWrong.com forum.

Thank you so much for the links - the Manheim work in particular is absolutely spectacular.

I've spammed you with a few sources. But even though they exist, they are fairly scattered, and my sense is still that your impression is right: there aren't many such models.

David Manheim would know better, so I recommend you check with him.

See also: <https://www.openphilanthropy.org/research/semi-informative-priors-over-ai-timelines/>

Nicole Noemi gathers some forecasts about AI risk from Metaculus, DeepMind co-founders, Eliezer Yudkowsky, Paul Christiano, and Ajeya Cotra's report on AI timelines.

h/t Nuño

Thank you, really appreciate the information
