I tried doing a Fermi estimation of the impact I would have if I worked on AI safety, and I realized it wasn't easy to do with only a calculator. So I built a website that does this Fermi estimation given your beliefs about AGI, AI safety, and your impact on AI safety progress.
You can try it out here: https://xriskcalculator.vercel.app/
This tool focuses on technical work, and assumes that progress on AGI and progress on AI safety are independent. This approximation is obviously quite inaccurate, but for now I can't think of a simple way of taking into account the fact that advanced AI could speed up AI safety progress. Other limitations are outlined on the website.
What do you think of this tool? Can you think of ways it could be improved?
Note: this is still a work in progress. If you want to use this tool to make important decisions, please contact me so that I can improve its reliability.
Maybe add ways working on it can backfire, either explicitly in the model, or by telling people to keep the potential for backfire in mind when taking expectations, and allow for the possibility that you do more harm than good in the final estimate.
How would you model these effects? I have two ideas:
Is one of them what you had in mind? Do you have other ideas?
Yeah, those were some of the kinds of things I had in mind, along with the possibility of contributing to or reducing s-risks, and adjustable weights for s-risks vs. extinction:
Because of the funding situation, taking resources away from other actions to reduce extinction risks would probably mostly come in the form of people's time, e.g. the time of the people supervising you, reading your work, or otherwise engaging with you. If an AI safety org hires you or you get a grant to work on something, then presumably they think you're worth the time, though! And one more person going through the hiring or grant process is not that costly for those managing it.
These should have lower defaults, I think.
I talked to people who think defaults should be higher. I really don't know where they should be.
I put "fraction of the work your org. is doing" at 5% because I was thinking about a medium-sized AGI safety organization (there are around 10 of them, so 10% each seems sensible), and because I expect that there will be many more in the future, I lowered it to 5%.
I put "how much are you speeding up your org." at 1%, because there are around 10 people doing core research in each org., but you are only slightly better than the second-best candidate who would have taken the job, so 1% seemed reasonable. I don't expect this percentage to go down, because as organizations scale up, senior members become more important. Having "better" senior researchers, even if there are hundreds of junior researchers, would probably speed up progress quite a lot.
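To make it concrete, here is a rough sketch of how these two defaults combine into your share of overall safety progress. The variable names and the multiplication are illustrative assumptions on my part, not the website's actual code:

```python
# Illustrative sketch: how the two default fractions combine.
# These names and this formula are my own simplification,
# not the calculator's actual implementation.
org_fraction = 0.05   # fraction of total AGI safety work done by your org
your_speedup = 0.01   # how much you speed up your org's progress

# Your contribution to overall safety progress, under independence.
your_share = org_fraction * your_speedup
print(f"Your share of overall safety progress: {your_share:.2%}")
```

With the defaults above, this comes out to 0.05% of total progress, which gives a feel for the scale of the numbers involved.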
Where do you think the defaults should be, and why?
I've discovered something that is either a bug in the code, or a parameter that isn't explained super well.
Under "How likely is it to work" I assume "it" refers to AGI safety. If so, this parameter is reversed - the more likely I say AGI safety is to work, the higher the x-risk becomes. If I set it to 0%, the program reliably tells me there's no chance the world ends.
I made the text a bit clearer. As for the bug: it didn't affect the end result of the Fermi estimation, but the way I computed the intermediate "probability of doom" was wrong: I forgot to take into account situations where AGI safety ends up being impossible. It is fixed now.
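For readers following along, here is a minimal sketch of what accounting for that case could look like, under a simple assumed model (the probabilities and decomposition are illustrative, not the site's actual code): doom occurs if safety is impossible, or if it is possible but not solved before AGI arrives.

```python
# Hedged sketch of the corrected intermediate "probability of doom",
# assuming a simple two-branch model. All numbers are illustrative.
p_safety_impossible = 0.2  # belief that AGI safety cannot be solved at all
p_solved_in_time = 0.5     # P(safety solved before AGI | safety is possible)

# Doom if safety is impossible, OR possible but not solved in time.
p_doom = p_safety_impossible + (1 - p_safety_impossible) * (1 - p_solved_in_time)
print(f"P(doom) = {p_doom:.2f}")
```

The original bug was effectively dropping the first term, so setting the chance of safety working to 0% made doom vanish instead of becoming certain.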
Thank you for the feedback!
I like the tool! One thing I would like to have added is total impact. I ended up using a calculator on a different webpage, but it would be nice to include something like "Expected lives saved", even if that's just 7 billion * P(world saved by you) that updates whenever P(world saved) does.
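The readout requested above is just the stated formula, 7 billion times P(world saved by you). A minimal sketch, with an illustrative probability plugged in:

```python
# Sketch of the suggested "Expected lives saved" readout, using the
# simple formula from the comment above: 7 billion * P(world saved by you).
world_population = 7e9
p_world_saved_by_you = 1e-6  # illustrative calculator output, not a real estimate

expected_lives_saved = world_population * p_world_saved_by_you
print(f"Expected lives saved: {expected_lives_saved:.0f}")
```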
At first, I thought this would be distracting, as there are many orders of magnitude between the lowest "lives saved if you avoid extinction" estimates and the highest ones. But given that you're not the first to ask for it, I think it would be a good idea to add this feature! I will probably add it soon.
I added this feature!
Great to see tools like this that make assumptions clear - I think it's useful not only as a calculator but also as a concrete operationalisation of your model of AI risk, which is a good starting point for discussion. Thanks for creating it!
This tool is impressive, thanks! I like the framing you use of safety as a race against capabilities, though I don't really know what it would look like to have "solved" AGI safety 20 years before AGI. I also appreciate all the assumptions being listed at the end of the page.
Some minor notes:
Thank you for the feedback. It's fixed now!