I hear two conflicting voices in my head, and in EA:
- Voice: it's highly uncertain whether deworming is effective, based on 20 years of research, randomized controlled trials, and lots of feedback. In fact, many development interventions have a small or negative impact.
- Same voice: we are confident that work for improving the far future is effective, based on <insert argument involving the number of stars in the universe>.
I believe that I could become convinced to work on artificial intelligence or extinction risk reduction. My main crux is that these problems seem intractable. I am worried that my work would have a negligible or a negative impact.
These questions are not sufficiently addressed yet, in my opinion. So far, I've seen mainly vague recommendations (e.g., "community building work does not increase risks" or "look at the success of nuclear disarmament"). Examples of existing work for improving the far future often feel very indirect (e.g., "build a tool to better estimate probabilities ⇒ make better decisions ⇒ facilitate better coordination ⇒ reduce the likelihood of conflict ⇒ prevent a global war ⇒ avoid extinction") and thus disconnected from actual benefits for humanity.
One could argue that uncertainty is not a problem, that it is negligible when considering the huge potential benefit of work for the far future. Moreover, impact is fat-tailed, so the expected value is dominated by a few really impactful projects, and it is therefore worth trying projects even if they have a low probability of success.[1] This makes sense, but only if we can protect against large negative impacts. I doubt we really can. For example, a case can be made that even safety-focused AI researchers accelerate AI and thus increase its risks.[2]
One could argue that community building or writing "What We Owe the Future" are concrete ways to do good for the future. Yet this seems to shift the problem rather than solve it. Consider a community builder who convinces 100 people to work on improving the far future. There are now 100 people doing work with uncertain, possibly negative impact. The community builder's impact is some function of those 100 people's impacts, and is therefore similarly uncertain and possibly negative. This is especially true if impact is fat-tailed, as the total will be dominated by the most successful (or most destructive) people.
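To make the community-builder worry concrete, here is a toy simulation. It is only a sketch under assumptions that are mine, not established facts: the builder's impact is taken to be the plain sum of the recruits' impacts, each recruit's impact magnitude is drawn from a Pareto distribution (`alpha = 1.1`), and each recruit has a 30% chance (`p_negative`) of doing net harm. The names `simulate_recruits` and `largest_share` are made up for this illustration.

```python
import random


def simulate_recruits(n_people=100, alpha=1.1, p_negative=0.3, seed=0):
    """Draw signed, fat-tailed impacts for n_people hypothetical recruits.

    All parameters are illustrative assumptions, not empirical estimates.
    """
    rng = random.Random(seed)
    impacts = []
    for _ in range(n_people):
        # Pareto draws are always >= 1 and have a heavy right tail.
        magnitude = rng.paretovariate(alpha)
        # Assumed chance that a recruit's work is net harmful.
        sign = -1.0 if rng.random() < p_negative else 1.0
        impacts.append(sign * magnitude)
    return impacts


def largest_share(impacts):
    """Fraction of total absolute impact due to the single most extreme person."""
    total_abs = sum(abs(x) for x in impacts)
    return max(abs(x) for x in impacts) / total_abs


impacts = simulate_recruits()
builder_impact = sum(impacts)  # assumption: builder's impact = sum of recruits' impacts
extreme = max(impacts, key=abs)

print(f"builder impact: {builder_impact:.1f}")
print(f"most extreme individual impact: {extreme:.1f}")
print(f"share of total magnitude from that one person: {largest_share(impacts):.0%}")
```

Rerunning with different `seed` values shows how much the sign and size of the total depend on whether the one or two most extreme recruits happen to be helpful or harmful. That is the worry in miniature: the builder inherits the uncertainty of the tail.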
To summarize: How can we reliably improve the far future, given that even near-termist work like deworming, with plenty of available data and research, rapid feedback loops, and simple theories, so often fails? As someone who is eager to spend my work time well, who thinks that our moral circle should include the future, but who does not know ways to reliably improve it... what should I do?
[1] Will MacAskill on fat-tailed impact distribution: https://youtu.be/olX_5WSnBwk?t=695
[2] For examples on this forum, see "When is AI safety research harmful?" or "What harm could AI safety do?"
Several people whom I respect hold the view that AI safety work might be dangerous. For example, here's Alexander Berger tweeting about it.
A brief list of potential risks:
- Conflicts of interest: Much AI safety work is done by companies that develop AIs. Max Tegmark makes this analogy: What would we think if a large part of climate change research were done by oil companies, or a large part of lung cancer research by tobacco companies? This situation probably makes AI safety research weaker. There is also the risk that it improves the reputation of AI companies, so that their non-safety work can advance faster and more boldly. And it means safety is delegated to a subteam rather than being everyone's responsibility (unlike, say, information security).
- Speeding up AI: Even well-meaning safety work likely speeds up the overall development of AI. For example, interpretability seems really promising for safety, but at the same time it is a quasi-necessary condition for deploying a powerful AI system. If you look at (for example) the recent papers from anthropic.com, you will find many techniques that are generally useful for building AIs.
- Information hazard: I admire work like the Slaughterbots video from the Future of Life Institute. Yet it has clear infohazard potential. Similarly, Michael Nielsen writes "Afaict talking a lot about AI risk has clearly increased it quite a bit (many of the most talented people I know working on actual AI were influenced to by Bostrom.)"
Other failure modes mentioned by MichaelStJules: