Apologies for writing this up quickly, but otherwise it'd likely never be written up at all as I've been wanting to write up something like this for at least the last year. If you think this is useful, feel completely free to copy this and write it up better.

TBH, I think that the exact argument is less important than the meta-point about how to deal with uncertainty: 1) try to figure out some robustly true statements 2) try to figure out which statements have an uncomfortably high chance of being true 3) see what you get by combining the two.

Premise 1: AGI is possible
Likelihood: Pretty darn likely. People have said that AI would never be able to play chess, or create a masterpiece or write a symphony and it looks like they're just wrong. The "humans are special" thesis doesn't seem to be a winning one.

Premise 2: By default, AGI has a reasonable chance of arriving in the next 30 or 40 years.
Likelihood: Seems pretty darn likely given the incredible progress in recent years. Some people are worried that we might run out of data, but we haven't run out yet, and even when we do, there are tricks like data augmentation or synthetic data, not to mention hints that we can make progress by focusing more on data quality. Happy to give a fuller explanation in the comments.

Premise 3: The invention of AGI will be one of the most significant things to ever happen to humanity, at least on the scale of the industrial revolution
Likelihood: Almost certainly true. How could technology that could do practically anything we can, but faster, with access to all the information on the internet and the ability to learn from all its other instances in the world, not have an insanely large impact?

Premise 4: There's a reasonable chance that AGI ends up being one of the best things to ever happen to us.
Likelihood: Doubt is quite reasonable here. It may be that competitive dynamics mean that there is no way that we can develop AGI without it being a complete disaster. Otherwise, mostly seems to follow from premise 3.

Premise 5: There's also a reasonable chance that the development of AGI ends up leading to unnecessary civilizational-level catastrophes (regardless of whether it ends up ultimately being for the best). 
Likelihood: Again, it's quite possible to doubt that these catastrophes will be unnecessary. Maybe competitive dynamics make them inevitable?

Even putting aside control risks, there's a large number of plausible threat models: mass-hacking, biological weapons, chemical weapons, AI warfare, election manipulation, great power conflict.

Some people have made arguments that the good guys win because they outnumber the bad guys, but how certain can we be of this? Seems quite plausible that at least one of these risks could have an exceptionally poor offense-defense balance.

Premise 6: We can make a significant difference here
Likelihood: Again, this seems more likely than not, though it's quite possible that we're screwed no matter what due to competitive dynamics. Some people might argue that attempts in the past haven't gone so well and have even made the situation worse, but it's possible to learn from your mistakes, so I don't think we should conclude yet that we lack the ability to positively intervene.

Premise 7: If our understanding of which specific issues in AI are important changes, then most of our skills or career capital will be useful for other issues related to AI as well
Likelihood: Seems pretty high, although there's a decent argument that persuading people to switch what they're doing is pretty hard and that we're not immune to that. Things like having strong technical AI knowledge, research skills and qualifications seem generally useful. The same is true for political capital, relationships with political players and political skill.

Therefore focusing on ensuring that AI goes well is likely to be one of the highest impact things we could focus on, even taking into account the uncertainty noted above.

Note that I made a general claim about focusing on making sure AI goes well rather than a more specific claim about the x-risks/catastrophic risks that EA tends to focus on. I agree that the x-risks/catastrophic risks are the most important area to focus on, but that's a conversation for another day. Right now, I'm focusing more on things that are likely to get broad agreement.

Please note: this is an argument that there's a decent chance that the most important thing you could work on could be something related to AI, not an argument that it is likely to be net-positive to just pick a random area of AI and start working on it without taking a lot of time to think through your model of the world.

One possible counter-argument would be to claim that there are not just a few things at the same level of importance, but actually many. One approach to this would be to demonstrate that the above argument proves too much.

In any case, I think this is a useful frame to better understand the AI x-risk/catastrophic risk position. I suspect that many people's views are driven by this often unstated model. In particular, I suspect that arguments along these lines mean that the majority of takeover risk folks - if persuaded that takeover risks were actually not going to be a thing - would still likely believe that something to do with AI would be the most important thing for them to focus on. These arguments become even stronger if you start taking into account personal fit.

Comments (4)

Great post Chris, very clear. I'd like to add something of a bummer reply, to anyone reading:

Please don't work on AI Safety unless what is motivating you is the genuine desire to have a positive impact.

I think there is already a real failure mode where status-motivated people are joining the space because (1) of the attention it is getting among the general public, i.e. it is 'sexy', and (2) the people they respect are also in the space.

If this kind of person is put in the position of losing status for what they believe is good and true (e.g. Stanislav Petrov), then I don't trust them to make the right decisions.

Maybe I'll write a post about this...

Chris - this is all quite reasonable.

However, one could dispute 'Premise 2: AGI has a reasonable chance of arriving in the next 30 or 40 years.'

Yes, without any organized resistance to the AI industry, the AI industry will develop AGI (if AGI is possible) -- probably fairly quickly.

But, if enough people accept Premise 5 (likely catastrophe) and Premise 6 (we can make a difference), then we can prevent AGI from arriving. 

In other words, the best way to make 'AI go well' may be to prevent AGI (or ASI) from happening at all. 

Good point. I added in “by default”.

Also, would be keen to hear if you think I should have restructured this argument in any other way?
