The statement we will debate during the forum-choice debate week will be based on: "By default, the world where AI goes well for humans will also go well for other sentient beings".
The debate week will be held during the week of March 23–29.
This post is primarily to finalise the topic we will be discussing, and to ask for input from the Forum on the exact phrasing. However, I’m sure a bunch of readers who were following the vote for the forum-choice debate week will be wondering — why did the third-place entry win? I’ll address that first.
Why the third-place entry won
In the voting post, in a footnote, I wrote: “Again, generic caveat here: there are legal and practical reasons that CEA might want to veto a topic. I personally think this is very unlikely, but we do reserve the right to veto if necessary. In the case of a veto, we'd move to the second highest karma topic.”
Unfortunately, what I thought was unlikely has indeed happened[1]. Even more unfortunately, we have vetoed the first two topics. As with most internal wranglings, there isn't much I can reasonably share here, so I probably can't answer further questions.
I’d like to make it very clear that Sarah and I think it is important that the EA movement has these conversations (and that we both favour transparency where possible). As of now, all I can say is that:
- We may be able to run one of the top two voted debates later in the year. I’ll let you know if this becomes possible.
- Our policy on politics on the EA Forum remains unchanged.
Thanks (prematurely) for understanding.
Refining the statement
As always, the first pass at a debate statement is too ambiguous. Since this is the forum-choice debate week, I'd love to get your input on how to rephrase the statement.
The current phrasing is: "By default, the world where AI goes well for humans will also go well for other sentient beings"
Ambiguities:
- "by default" - this must mean something like "in the nearest possible worlds where AI goes well for humans..."
- "the world where AI goes well" - what does it mean for "AI" to go well? The release of the first AGI? Or the integration of less powerful AI systems into society?
- "goes well for humans" - do we want to point to a specific outcome here (i.e. humans in control) or does "goes well" just mean "does not cause human extinction?"
- "sentient beings" - we could discuss non-human animals and/or digital minds
A few refined versions
"On the margin, it is better for animals to work on the transition to AGI going well, than directly working on AI for animal welfare"
@Kevin Xia 🔸 originally raised the debate statement here. By way of explanation he wrote:
It could probably be much more precise and nuanced, but specifically, I would want to assess whether "trying to make AI go well for all sentient beings" is marginally better supported through directly related work (e.g., AIxAnimals work) or through conventional AI safety measures - the latter of which would be supported if, e.g., making AI go well for humans will inevitably or is necessary to make sure that AI goes well for all. Although if it is necessary, it would depend further on how likely AI will go well for humans and such; but I think a general assessment of AI futures that go well for humans would be a great and useful starting point for me.
We could interpret this as a narrower version of our earlier debate week, where we discussed "on the margin, it is better to work on reducing the chance of our extinction than increasing the value of the future where we survive".
This phrasing still has some key ambiguities. Does "AI going well" mean aligned AGI? If it does, does it matter if the AGI is aligned but humans are disempowered, or do we also need controllable aligned AGI? And what counts as direct work on AI for animal welfare?
"Without extra animal-focused work, even aligned superintelligence would be bad for non-human animals"
In this one I've opted for 'aligned' and 'superintelligence' to get at the idea of a future with a very powerful AI over which humans have limited or no control, but which acts in accordance with our values. As systems marketed as "AGI" get closer and closer, using "superintelligence" feels like it gets the vibe across a little better (though it is imprecise).
'Animal-focused work' still feels uncomfortably vague...
"AGI which doesn't cause human extinction or disempowerment will value animal welfare"
This focuses more on one of the cruxes that Kevin raised: whether aligning AI would inevitably make it good for animals. However, it somewhat marginalises the question of AIxAnimals work.
Leave your feedback below
I'd like to announce a finalised statement by the end of the week. Especially useful feedback would:
- Argue for a particular phrasing.
- Point out an ambiguous term which could derail the debate.
[1] Though, as all good Bayesians know, that doesn't mean I was wrong!
