Fermi estimation of the impact you might have working on AI safety

frib

I tried doing a Fermi estimation of the impact I would have if I worked on AI safety, and I realized it wasn't easy to do with only a calculator. So I built a website that does this Fermi estimation given your beliefs about AGI, AI safety, and your impact on AI safety progress.

You can try it out here: https://xriskcalculator.vercel.app/

This tool focuses on technical work and assumes that progress on AGI and progress on AI safety are independent. This is obviously a very rough approximation, but for now I can't think of a simple way to account for the fact that advanced AI could speed up AI safety progress. Other limitations are outlined on the website.
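The website's exact model isn't spelled out in this post, but the role of the independence assumption can be illustrated with a deliberately simple stand-in. The sketch below assumes (hypothetically, not the calculator's method) that the years until AGI and the years until AGI safety is solved are independent exponential random variables. Under that assumption the chance that safety wins the race has a closed form, and speeding up safety progress simply scales its rate. All function names and numbers are illustrative, not the website's defaults.

```python
def p_safety_first(rate_safety: float, rate_agi: float) -> float:
    """P(AGI safety is solved before AGI arrives).

    Hypothetical model: time-to-safety and time-to-AGI are independent
    exponential random variables with the given yearly rates. For two
    independent exponentials, P(T_safety < T_agi) = r_s / (r_s + r_a).
    """
    return rate_safety / (rate_safety + rate_agi)


def marginal_gain(rate_safety: float, rate_agi: float, speedup: float) -> float:
    """Increase in P(safety first) if safety progress runs (1 + speedup) times faster."""
    faster = p_safety_first(rate_safety * (1 + speedup), rate_agi)
    return faster - p_safety_first(rate_safety, rate_agi)


# Illustrative numbers only: safety expected in ~40 years, AGI in ~20 years,
# and you speed up all safety progress by 0.05%.
base = p_safety_first(1 / 40, 1 / 20)
gain = marginal_gain(1 / 40, 1 / 20, speedup=0.0005)
print(f"P(safety first) = {base:.3f}; gain from your speedup = {gain:.2e}")
```

Independence is what lets the two rates be combined this simply: if advanced AI sped up safety research, `rate_safety` would itself depend on AGI progress and this closed form would no longer hold.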

What do you think of this tool? Can you think of ways it could be improved?

Note: this is still a work in progress. If you want to use this tool to make important decisions, please contact me so that I can improve its reliability.

Comments

MichaelStJules (May 13, 2022)

Maybe add ways that working on it can backfire, either explicitly in the model or by telling people to keep the potential for backfire in mind when taking expectations, and allow for the possibility that you do more harm than good in the final estimate.

frib (May 14, 2022)

How would you model these effects? I have two ideas:

- add a section on how much you speed up AGI (but I'm not sure how I could break this down further)
- add a section on how likely it is that you take resources away from other actions that could be used to save the world (either through better AI safety or something else)

Is one of them what you had in mind? Do you have other ideas?

MichaelStJules (May 14, 2022)

Ya, those were some of the kinds of things I had in mind, along with the possibility of contributing to or reducing s-risks, and adjustable weights for s-risks vs. extinction:

https://arbital.com/p/hyperexistential_separation/

https://reducing-suffering.org/near-miss/

Because of the funding situation, taking resources away from other actions to reduce extinction risks would probably mostly come from people's time, e.g. the time of the people supervising you, reading your work, or otherwise engaging with you. If an AI safety org hires you or you get a grant to work on something, then presumably they think you're worth the time, though! And one more person going through the hiring or grant process is not that costly for those managing it.

NunoSempere (Jun 5, 2022)

Describe what fraction of the AGI safety progress your field will be responsible for, and how much you think you will speed up your field's progress
Describe what fraction of the AGI safety work your organization is doing, and how much you think you will speed up your organization's progress in this direction

These should have lower defaults, I think.

frib (Jun 6, 2022)

I talked to people who think defaults should be higher. I really don't know where they should be.

I set "fraction of the work your org. is doing" to 5% because I was thinking of a medium-sized AGI safety organization: there are around 10 of them, so 10% each seems sensible, and since I expect there will be many more in the future, I lowered that to 5%.

I set "how much are you speeding up your org." to 1% because there are around 10 people doing core research in each org., so each accounts for roughly 10% of its progress; but you are only slightly better than the second-best candidate who would have taken the job, so 1% seemed reasonable. I don't expect this percentage to go down, because as organizations scale up, senior members become more important. Having "better" senior researchers, even if there are hundreds of junior researchers, would probably speed up progress quite a lot.

Where do you think the defaults should be, and why?
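To make the arithmetic behind these two defaults concrete, here is a minimal sketch of one way they might combine. It assumes (hypothetically; the calculator's actual formula may differ) that the field's progress is the sum of its organizations' progress, so speeding up one organization speeds up the whole field by the product of the two inputs. The constant and function names are illustrative.

```python
# Hypothetical names for two of the calculator's inputs, set to frib's defaults.
ORG_SHARE_OF_SAFETY_WORK = 0.05  # "fraction of the work your org. is doing"
YOUR_SPEEDUP_OF_ORG = 0.01       # "how much are you speeding up your org."


def field_speedup_from_you(org_share: float, speedup_of_org: float) -> float:
    """Fraction by which you speed up all AGI safety progress.

    Assumes field progress is the sum of per-organization progress, so
    speeding one org up by `speedup_of_org` speeds the field up by
    org_share * speedup_of_org.
    """
    return org_share * speedup_of_org


s = field_speedup_from_you(ORG_SHARE_OF_SAFETY_WORK, YOUR_SPEEDUP_OF_ORG)
print(f"You speed up AGI safety progress by about {s:.3%}")
```

With these defaults the result is 5% × 1% = 0.05%. Because the two inputs multiply, lowering either default, as suggested above, lowers the final estimate proportionally.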

Jay Bailey (May 14, 2022)

I've discovered something that is either a bug in the code, or a parameter that isn't explained super well.

Under "How likely is it to work" I assume "it" refers to AGI safety. If so, this parameter is reversed - the more likely I say AGI safety is to work, the higher the x-risk becomes. If I set it to 0%, the program reliably tells me there's no chance the world ends.

frib (May 14, 2022)

I made the text a bit clearer. As for the bug, it didn't affect the end result of the Fermi estimation, but the way I computed the intermediate "probability of doom" was wrong: I forgot to account for situations where AGI safety turns out to be impossible. It is fixed now.

Thank you for the feedback!
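A minimal sketch of a "probability of doom" decomposition that includes the case where AGI safety is impossible. This is a hypothetical reconstruction, not the website's code: it assumes the world is safe only if safety is possible at all and is also solved before AGI, and that unsafe AGI leads to catastrophe with some probability `p_doom_if_unsafe`. The checks at the end encode the two symptoms from the bug report.

```python
def p_doom(p_agi: float, p_safety_possible: float,
           p_solved_before_agi: float, p_doom_if_unsafe: float = 1.0) -> float:
    """Hypothetical decomposition of the probability of an AI catastrophe.

    The world is safe only if AGI safety is possible AND it is solved
    before AGI arrives. Ignoring the "safety is impossible" branch is the
    kind of mistake described above.
    """
    p_safe = p_safety_possible * p_solved_before_agi
    return p_agi * (1.0 - p_safe) * p_doom_if_unsafe


# Sanity checks: raising P(safety is possible) must lower the risk,
# and setting it to 0% must not make the risk vanish.
assert p_doom(0.9, 0.8, 0.5) < p_doom(0.9, 0.4, 0.5)
assert p_doom(0.9, 0.0, 0.5) == 0.9
```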

Jay Bailey (May 13, 2022)

I like the tool! One thing I would like to have added is total impact. I ended up using a calculator on a different webpage, but it would be nice to include something like "Expected lives saved", even if that's just 7 billion * P(world saved by you) that updates whenever P(world saved) does.
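The suggested "expected lives saved" figure is a single multiplication. In the sketch below, the function name and the example probability of 1e-6 are hypothetical; `lives_at_stake` is a parameter rather than a constant because, as frib points out in reply, estimates of what is lost through extinction span many orders of magnitude.

```python
# The figure suggested in the comment: roughly the world population.
WORLD_POPULATION = 7_000_000_000


def expected_lives_saved(p_world_saved_by_you: float,
                         lives_at_stake: float = WORLD_POPULATION) -> float:
    """Expected lives saved = lives at stake × P(world saved by you)."""
    return lives_at_stake * p_world_saved_by_you


# Hypothetical example: a one-in-a-million chance that you save the world.
print(f"{expected_lives_saved(1e-6):,.0f} expected lives saved")
```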

frib (May 14, 2022)

At first, I thought this would be distracting, as there are many orders of magnitude between the lowest "lives saved if you avoid extinction" estimates and the highest ones. But given that you're not the first to ask for this, I think it would be a good idea to add the feature! I will probably add it soon.

frib (May 18, 2022)

I added this feature!

Adam Binksmith (May 13, 2022)

Great to see tools like this that make assumptions clear. I think it's useful not only as a calculator but also as a concrete operationalisation of your model of AI risk, which is a good starting point for discussion. Thanks for creating it!

Tristan Cook (May 13, 2022)

This tool is impressive, thanks! I like the framing you use of safety as a race against capabilities, though I don't really know what it would look like to have "solved" AGI safety 20 years before AGI. I also appreciate all the assumptions being listed at the end of the page.

Some minor notes:

- the GitHub link in the webpage footer points to the wrong page
- I think two of the prompts, "How likely is it to work?" and "How much do you speed it up?", would be clearer if "it" were replaced by "AGI safety" (if that is what it refers to).

frib (May 14, 2022)

Thank you for the feedback. It's fixed now!