I tried doing a Fermi estimate of the impact I would have if I worked on AI safety, and I realized it wasn't easy to do with only a calculator. So I built a website that does this Fermi estimation given your beliefs about AGI, AI safety, and your impact on AI safety progress.

You can try it out here: https://xriskcalculator.vercel.app/

This tool focuses on technical work, and assumes that progress on AGI and progress on AI safety are independent. This approximation is obviously very inaccurate, but for now I can't think of a simple way to take into account the fact that advanced AI could speed up AI safety progress. Other limitations are outlined on the website.
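To make the structure concrete, here is a rough sketch of the kind of calculation the tool performs under the independence assumption. All parameter names and values are hypothetical illustrations, not the website's actual internals:

```python
# A minimal sketch of the Fermi estimate, assuming (as the tool does) that
# AGI progress and AI safety progress are independent.
# All numbers below are hypothetical defaults, not the website's.

p_agi_this_century = 0.8   # P(AGI is built this century)
p_safety_solvable = 0.5    # P(AGI safety is solvable at all)
p_solved_in_time = 0.3     # P(safety solved before AGI | solvable), without you
your_speedup = 0.001       # fraction by which you speed up AI safety progress

# Probability of doom without your contribution
p_doom_baseline = p_agi_this_century * (1 - p_safety_solvable * p_solved_in_time)

# Crude assumption: speeding up safety progress by x raises the chance of
# solving it in time roughly proportionally (only valid for small x).
p_solved_with_you = min(1.0, p_solved_in_time * (1 + your_speedup))
p_doom_with_you = p_agi_this_century * (1 - p_safety_solvable * p_solved_with_you)

p_world_saved_by_you = p_doom_baseline - p_doom_with_you
print(f"P(world saved by you) ~ {p_world_saved_by_you:.2e}")
```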

What do you think of this tool? Can you think of ways it could be improved?

Note: this is still a work in progress. If you want to use this tool to make important decisions, please contact me so that I can increase its reliability.

Comments

Maybe add ways working on it can backfire, either explicitly in the model, or by telling people to take expectations with potentials for backfire in mind, and allow for the possibility that you do more harm than good in the final estimate.

How would you model these effects? I have two ideas:

  1. add a section with how much you speed up AGI (but I'm not sure how I could break this down further)
  2. add a section with how likely it is that you would take resources away from other actions that could be used to save the world (either through better AI safety, or something else)

Is one of them what you had in mind? Do you have other ideas?

Yeah, those were some of the kinds of things I had in mind, along with the possibility of contributing to or reducing s-risks, and adjustable weights for s-risks vs. extinction:

https://arbital.com/p/hyperexistential_separation/

https://reducing-suffering.org/near-miss/

Because of the funding situation, taking resources away from other actions to reduce extinction risks would probably mostly come in the form of people's time, e.g. the time of the people supervising you, reading your work, or otherwise engaging with you. If an AI safety org hires you or you get a grant to work on something, then presumably they think you're worth the time, though! And one more person going through the hiring or grant process is not that costly for those managing it.
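As a rough sketch of how backfire could enter the final estimate (all parameters here are hypothetical and not part of the current tool):

```python
# One way to let the estimate go negative, as suggested above.
# The probability of backfire and the s-risk weight are hypothetical
# parameters, not features of the current tool.

p_save = 1e-7         # P(world saved by your safety work)
p_backfire = 2e-8     # P(your work speeds up AGI / diverts resources and causes doom)
p_srisk_shift = 1e-9  # P(your work turns an extinction outcome into an s-risk outcome)
srisk_weight = 10.0   # how much worse you judge an s-risk outcome than extinction

net_expected_impact = p_save - p_backfire - srisk_weight * p_srisk_shift
print(f"Net expected impact (in 'extinctions averted'): {net_expected_impact:.2e}")
# A negative number here would mean the model expects you to do more harm than good.
```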

Describe what fraction of the AGI safety progress your field will be responsible for, and how much you think you will speed up your field's progress

Describe what fraction of the AGI safety work your organization is doing, and how much you think you will speed up your organization's progress in this direction

These should have lower defaults, I think

I talked to people who think defaults should be higher. I really don't know where they should be.

I put "fraction of the work your org. is doing" at 5% because I was thinking about a medium-sized AGI safety organization (there are around 10 of them, so 10% each seems sensible), and because I expect that there will be many more in the future, I put 5%.

I put "how much are you speeding up your org." at 1%, because there are around 10 people doing core research in each org., but you are only slightly better than the second-best candidate who would have taken the job, so 1% seemed reasonable. I don't expect this percentage to go down, because as the organization scale up, senior members become more important. Having "better" senior researchers, even if there are hundreds of junior researchers, would probably speed up progress quite a lot.

Where do you think the defaults should be, and why?

I've discovered something that is either a bug in the code or a parameter that isn't explained super well.

Under "How likely is it to work" I assume "it" refers to AGI safety. If so, this parameter is reversed - the more likely I say AGI safety is to work, the higher the x-risk becomes. If I set it to 0%, the program reliably tells me there's no chance the world ends.

I made the text a bit clearer. As for the bug: it didn't affect the end result of the Fermi estimation, but the way I computed the intermediate "probability of doom" was wrong. I forgot to take into account situations where AGI safety ends up being impossible. It is fixed now.
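Concretely, my reading of the buggy vs. fixed computation, sketched with hypothetical numbers (the tool's actual code may differ):

```python
# Sketch of the intermediate "probability of doom" before and after the fix.
# All numbers are hypothetical.

p_agi = 0.8              # P(AGI is built)
p_safety_possible = 0.5  # "How likely is it to work?" for AGI safety
p_solved_in_time = 0.3   # P(solved before AGI | safety is possible)

# Buggy version: only counts worlds where safety is possible but unsolved,
# so setting p_safety_possible = 0 wrongly gives zero doom.
p_doom_buggy = p_agi * p_safety_possible * (1 - p_solved_in_time)

# Fixed version: doom also happens whenever safety is simply impossible.
p_doom_fixed = p_agi * (1 - p_safety_possible * p_solved_in_time)

print(p_doom_buggy, p_doom_fixed)  # 0.28 vs 0.68 with these numbers
```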

Thank you for the feedback!

I like the tool! One thing I would like to have added is total impact. I ended up using a calculator on a different webpage, but it would be nice to include something like "Expected lives saved", even if that's just 7 billion * P(world saved by you) that updates whenever P(world saved) does.

At first, I thought this would be distracting, as there are many orders of magnitude between the lowest "lives saved if you avoid extinction" estimates and the highest ones. But given that you're not the first to ask for it, I think it would be a good idea to add this feature! I will probably add it soon.

I added this feature!
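For reference, the feature amounts to something like the suggestion above (a sketch; the tool may use a different population figure):

```python
# "Expected lives saved" as suggested: current population times the
# probability that you save the world. Both numbers here are examples.

current_population = 7e9       # rough world population, as in the suggestion
p_world_saved_by_you = 1.5e-7  # hypothetical output of the calculator

expected_lives_saved = current_population * p_world_saved_by_you
print(f"Expected lives saved: {expected_lives_saved:.0f}")  # ~1050 with these numbers
```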

Great to see tools like this that make assumptions clear - I think it's useful not only as a calculator but as a concrete operationalisation of your model of AI risk, which is a good starting point for discussion. Thanks for creating it!

This tool is impressive, thanks! I like the framing you use of safety as a race against capabilities, though I don't really know what it would look like to have "solved" AGI safety 20 years before AGI. I also appreciate all the assumptions being listed at the end of the page.

Some minor notes:

  • The GitHub link in the webpage footer points to the wrong page.
  • I think two of the prompts, "How likely is it to work?" and "How much do you speed it up?", would be clearer if "it" were replaced by "AGI safety" (if that is what it refers to).

Thank you for the feedback. It's fixed now!
