Making AI Welfare an EA priority requires justifications that have not been given

9

> > It is not good enough to simply say that an issue might have a large scale impact and therefore think it should be an EA priority [...]
>
> I think that this is wrong. The fact that something might have a huge scale and we might be able to do something about it is enough for it to be taken seriously and provides prima facie evidence that it should be a priority. I think it is vastly preferrable [sic] to preempt problems before they occur rather than try to fix them once they have. For one, AI welfare is a very complicated topic that will take years or decades to sort out. AI persons (or things that look like AI persons) could easily be here in the next decade. If we don't start thinking about it soon, then we may be years behind when it happens.

I feel like you are talking past the critique. For an intervention to be a longtermist priority, there needs to be some kind of story for how it improves the long-term future. Sure, AI welfare may be a large-scale problem which takes decades to sort out (if tackled by unaided humans), but that alone does not mean it should be worked on presently. Your points here do not engage with the argument, made by @Zach Stein-Perlman early on in the week, that we can just punt solving AI welfare to the future (i.e., to the long reflection / to once we have aligned superintelligent advisors), and in the meantime continue focusing our resources on AI safety (i.e., on raising the probability that we make it to a long reflection).

(There is an argument going in the opposite direction that a long reflection might not happen following alignment success, and so doing AI welfare work now might indeed make a difference to what gets locked in for the long-term. I am somewhat sympathetic to this argument, as I wrote here, but I still don’t think it delivers a knockdown case for making AI welfare work a priority.)

Likewise, for an intervention to be a neartermist priority, there has to be some kind of quantitative estimate demonstrating that it is competitive—or will soon be competitive, if nothing is done—in terms of suffering prevented per dollar spent, or similar, with the current neartermist priorities. Factory farming seems like the obvious thing to compare AI welfare against. I’ve been surprised by how nobody has tried coming up with such an estimate this week, however rough. (Note: I’m not sure if you are trying to argue that AI welfare should be both a neartermist and longtermist priority, as some have.)

(Note also: I’m unsure how much of our disagreement is simply because of the “should be a priority” wording. I agree with JWS’s current “It is not good enough…” statement, but would think it wrong if the “should” were replaced with “could.” Similarly, I agree with you as far as: “The fact that something might have a huge scale and we might be able to do something about it is enough for it to be taken seriously.”)

[ETA: On a second read, this comment of mine seems a bit more combative than I intended—sorry about that.]

Derek Shiller

5

> For an intervention to be a longtermist priority, there needs to be some kind of concrete story for how it improves the long-term future.

I disagree with this. With existential risk from unaligned AI, I don't think anyone has ever told a very clear story about how AI will actually get misaligned, get loose, and kill everyone. People have speculated about components of the story, but generally not in a super concrete way, and it isn't clear how standard AI safety research would address a very specific disaster scenario. I don't think this is a problem: we shouldn't expect to know all the details of how things go wrong in advance, and it is worthwhile to do a lot of preparatory research that might be helpful so that we're not fumbling through basic things during a critical period. I think the same applies to digital minds.

> Your points here do not engage with the argument, made by @Zach Stein-Perlman early on in the week, that we can just punt solving AI welfare to the future (i.e., to the long reflection / to once we have aligned superintelligent advisors), and in the meantime continue focusing our resources on AI safety (i.e., on raising the probability that we make it to a long reflection).

I think this viewpoint is overly optimistic about the probability of locking in / the relevance of superintelligent advisors. I discuss some of the issues of locking in in a contribution to the debate week. In brief, I think that it is possible that digital minds will be sufficiently integrated in the next few decades that they will have power in social relationships that will be extremely difficult to disentangle. I also think that AGI may be useful in drawing inferences from our assumptions, but won't be particularly helpful at setting the right assumptions.

6

> With existential risk from unaligned AI, I don't think anyone has ever told a very clear story about how AI will actually get misaligned, get loose, and kill everyone.

This should be evidence against AI x-risk!^[1] Even in the atmospheric ignition case in Trinity, they had more concrete models to use. If we can't build a concrete model here, then it implies we don't have a concrete/convincing case for why it should be prioritised at all, imo. It's similar to the point in my footnotes that you need to argue for both p and p->q, not just the latter. This is what I would expect to see if the case for p was unconvincing/incorrect.

> I don't think this is a problem: we shouldn't expect to know all the details of how things go wrong in advance

Yeah I agree with this. But the uncertainty and cluelessness in the future should decrease one's confidence that they're working on the most important thing in the history of humanity, one would think.

> and it is worthwhile to do a lot of preparatory research that might be helpful so that we're not fumbling through basic things during a critical period. I think the same applies to digital minds.

I'm all in favour of research, but how much should that research get funded? Can it be justified above other potential uses of money and general resources? Should it be an EA priority as defined by the AWDW framing? These were (almost) entirely unargued for.

^[1] Not dispositive evidence perhaps, but a consideration

Mo Putera

3

> > For an intervention to be a longtermist priority, there needs to be some kind of concrete story for how it improves the long-term future.
>
> I disagree with this. With existential risk from unaligned AI, I don't think anyone has ever told a very clear story about how AI will actually get misaligned, get loose, and kill everyone.

When I read the passage you quoted I thought of e.g. Critch's description of RAAPs and Christiano's "What failure looks like", both of which seem pretty detailed to me without necessarily fitting the "AI gets misaligned, gets loose and kills everyone" meme; both Critch and Christiano seem to me to be explicitly pushing back against consideration of only that meme, and Critch in particular thinks work in this area is ~neglected (as of 2021; I haven't kept up with goings-on). I suppose Gwern's writeup comes closest to your description, and I can't imagine it being more concrete; curious to hear if you have a different reaction.

5

To add to the intensive animal agriculture analogy: this time, people are designing them, which provides a lot of reason to believe early intervention can affect AI welfare compared to animal agriculture.

2

Thanks for the extensive reply, Derek :)

> Even if you think that AI welfare is important (which I do!), the field doesn't have the existing talent pipelines or clear strategy to absorb $50 million in new funding each year.

Yep, completely agree here, and as Siebe pointed out I did go to the extreme end of 'make the changes right now'. It could be structured in a more gradual way, and potentially draw on more external funding.

> The fact that something might have a huge scale and we might be able to do something about it is enough for it to be taken seriously and provides prima facie evidence that it should be a priority.

I agree in principle on the huge scale point, but much less so on the 'might be able to do something'. I think we need a lot more than that: we need something tractable to get going, especially for something to be considered a priority. I think the general form of argument I've seen this week is that AI Welfare could have a huge scale, therefore it should be an EA priority, without much to flesh out the 'do something' part.

> AI persons (or things that look like AI persons) could easily be here in the next decade...AI people (of some form or other) are not exactly a purely hypothetical technology,

I think I disagree empirically here. Counterfeit "people" might be here soon, but I am not moved much by arguments that digital 'life' with full agency, self-awareness, autopoiesis, moral values, moral patienthood etc. will be here in the next decade. Especially not easily here. I definitely think that case hasn't been made, and I think (contra Chris in the other thread) that claims of this sort should have been made much more strongly during AWDW.

> We might have that opportunity now with AI welfare. Perhaps this means that we only need a small core group, but I do think some people should make it a priority.

A small number of people should, I agree. Funding Jeff Sebo and Rob Long? Sounds great. Giving them 438 research assistants and $49M in funding taken from other EA causes? Hell to the naw. We weren't discussing whether AI Welfare should be a priority for some EAs, we were discussing specific terms set out in the week's statement, and I feel like I'm the only person during this week who paid any attention to them.

Secondly, the 'we might have that opportunity' is very unconvincing to me. It is about as convincing to me as saying in 2008: "If CERN is turned on, it may create a black hole that destroys the world. Nobody else is listening. We might only have the opportunity to act now!" It's just not enough to be action-guiding in my opinion.

I'm pretty aware the above is unfair to strong advocates of AI Safety and AI Welfare, but at the moment that's roughly where the quality of arguments this week has stood from my viewpoint.

Angelina Li

9

Nice, this was a helpful reframe for me. Thanks for writing this!

I wish more people posting during the debate week were more centered on addressing the specific debate question, instead of just generally writing interesting things — although it's easier to complain than contribute, and I'm glad for the content we got anyway :)

David_Moss

9

> "5%+ of unrestricted EA talent and funding should be focused on the potential well-being of future artificial intelligence systems".
>
> As a rough estimate for the number of EAs, I take the number of GWWC Pledgers even if they'd consider themselves 'EA-Adjacent'.^[2] At my last check, the lifetime members page stated there were 8,983 members, so 5% of that would be ~449 EAs working specifically or primarily on the potential well-being of future artificial intelligence systems.

This seems to me to be too expansive an operationalization of "EA talent".

If we're talking about how to allocate EA talent, it doesn't seem to me that it can be 'all EAs' or even all GWWC pledgers. Many of these people will be retired or earning to give, or unable to contribute to EA direct work for some other reason. Many don't even intend to do EA direct work. And many of those who are doing EA direct work will be doing ops or other meta work, so even the EA direct work total is not the total number who could be directly working as AI welfare researchers. I think, if we use this bar, then most EA cause areas won't reach 5% of EA talent.

In a previous survey, we found 8.7% of respondents worked in an EA org. This is likely an overestimate, because fewer less engaged EAs (who are less likely to take the survey) are EA org employees. 8.7% of the total EA community (assuming growth based on the method we employed in 2019 and 2020) implies around 1300 people in EA orgs (5% of which would be around 67 people). We get a similar estimate from applying the method above to the total number of people who reported working in EA orgs in 2022. To be sure, the number of people who are in specifically EA orgs will undercount total talent, since some people are doing direct work outside EA orgs. But using the 2022 numbers for people reporting they are doing direct work would only increase the 5% figure to around 114 (which I argue would still need to be discounted for people doing ops and similar work, if we want to estimate how many people should be doing AI welfare work specifically).
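
As a rough check of the arithmetic above, here is a minimal sketch that applies the 5% threshold to the two headcounts quoted in this thread. The helper name `share` and the variable names are hypothetical, and the inputs are simply the figures as quoted (8,983 pledgers; "around 1300" people in EA orgs), not independently verified numbers.

```python
def share(total: float, percent: float) -> float:
    """Return `percent` percent of `total`."""
    return total * percent / 100


# Headcounts as quoted in the thread (assumed inputs, not verified here).
gwwc_pledgers = 8_983   # lifetime GWWC members cited in the post
ea_org_staff = 1_300    # "around 1300 people in EA orgs" (a rounded figure)

pledger_based = share(gwwc_pledgers, 5)  # 5% of all pledgers
org_based = share(ea_org_staff, 5)       # 5% of estimated EA org staff

print(f"5% of pledgers:      {pledger_based:.0f}")
print(f"5% of EA org staff:  {org_based:.0f}")
print(f"ratio between them:  {pledger_based / org_based:.1f}x")
```

With the rounded 1,300 input, the 5% figure comes out at 65 rather than "around 67", so the comment's number presumably reflects an unrounded estimate. Either way, the choice of denominator changes the implied headcount by a factor of roughly seven.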

Jonas Hallgren 🔸

6

Great point, I did not think of the specific claim of 5% when thinking of the scale but rather whether more effort should be spent in general.

My brain basically did a motte-and-bailey on me emotionally when it comes to this question, so I appreciate you pointing that out!

It also seems like you're mostly critiquing the tractability of the claim and not the underlying scale nor neglectedness?

It kind of gives me some GPR vibes as to why it's useful to do right now, with more or fewer resources to be spent depending on the initial results?

3

> It also seems like you're mostly critiquing the tractability of the claim and not the underlying scale nor neglectedness?

Yep, everyone agrees it's neglected. My strongest critique is the tractability, which may be so low as to discount astronomical value. I do take a lot of issue with the scale as well though. I think that needs to be argued for rather than assumed. I also think trade-offs from other causes need to be taken into account at some point too.

And again, I don't think there are no arguments that can get traction on scale/tractability and make AI Welfare look like a valuable cause, but these arguments clearly weren't made (imho) in AWDW.

SummaryBot

3

Executive summary: The proposition to make AI welfare an EA priority, which would allocate 5% of EA talent and funding to this cause, lacks sufficient justification and should not be supported without stronger arguments.

Key points:

Making AI welfare an EA priority would require significant reallocation of resources, potentially at the expense of other important causes.
The burden of proof for such a major shift in priorities is high and requires strong justifications, which have not been adequately provided during AI Welfare Debate Week.
Most arguments presented for AI welfare prioritization rely on speculative possibilities rather than concrete evidence or robust reasoning.
The author argues that the precautionary principle alone is insufficient justification for allocating significant resources to AI welfare.
While AI welfare research may be interesting and potentially valuable, this does not automatically qualify it as an EA priority.
The author recommends Forum voters lean against making AI welfare an EA priority until stronger justifications are provided, while acknowledging that such justifications may exist but have not been presented.

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.

Chris Leong

3

I think it’s very valuable for you to state what the proposition would mean in concrete terms.

On the other hand, I think it’s quite reasonable for posts not to spend time engaging with the question of whether “there will be vast numbers of AIs that are smarter than us”.

AI safety is already one of the main cause areas here and there’s been plenty of discussion about these kinds of points already.

If someone has something new to say on that topic, then it’d be great for them to share it, otherwise it makes sense for people to focus on discussing the parts of the topic that have not already been covered as part of the discussions on AI safety.

2

> I think it’s very valuable for you to state what the proposition would mean in concrete terms.

It's not just concrete terms, it's the terms we've all agreed to vote on for the past week!

> On the other hand, I think it’s quite reasonable for posts not to spend time engaging with the question of whether “there will be vast numbers of AIs that are smarter than us”.

I think I just strongly disagree on this point. Not every post has to re-argue everything from the ground up, but I think every post does need at least a link or some backing for why it believes that. Are people anchoring on Shulman/Cotra? Metaculus? Cold Takes? General feelings about AI progress? Drawing lines on graphs? Specific claims about the future that make reference only to scaled-up transformer models? These are all very different claims for the proposition, and differ in terms of types of AI, timelines, etc.

> AI safety is already one of the main cause areas here and there’s been plenty of discussion about these kinds of points already.
>
> If someone has something new to say on that topic, then it’d be great for them to share it, otherwise it makes sense for people to focus on discussing the parts of the topic that have not already been covered as part of the discussions on AI safety.

I again disagree, for two slightly different reasons:

I'm not sure how good the discussion has been about AI Safety. How much have these questions and cruxes actually been internalised? Titotal's excellent series on AI risk scepticism has been under-discussed in my opinion. There are many anecdotal cases of EAs (especially younger, newer ones) simply accepting the importance of AI causes through deference alone.^[1] At the latest EAG London, when I talked about AI risk skepticism I found surprising amounts of agreement with my positions even amongst well-known people working in the field of AI risk. There was certainly an interpretation that the Bay/AI-focused wing of EA weren't interested in discussing this at all.
Even if something is consensus, it should still be allowed (even encouraged) to be questioned. If EA wants to spend lots of money on AI Welfare (or even AI Safety), it should be very sure that it is one of the best ways we can impact the world. I'd like to see more explicit red-teaming of this in the community, beyond just Garfinkel on the 80k podcast.

^[1] I also met a young uni organiser who was torn about AI risk, since they didn't really seem to be convinced of it but felt somewhat trapped by the pressure they felt to 'toe the EA line' on this issue

Chris Leong

2

What do you think was the best point that Titotal made?
I'm not saying it can't be questioned. And there wasn't a rule that you couldn't discuss it as part of the AI welfare week. That said, what's wrong with taking a week's break from the usual discussions that we have here to focus on something else? To take the discussion in new directions? A week is not that long.

4

I don't quite know what to respond here.^[1] If the aim was to discuss something different then I guess there should have been a different debate prompt? Or maybe it shouldn't have been framed as a debate at all? Maybe it should have just prioritised AI Welfare as a topic and left it at that. I'd certainly have had less of an issue with the posts that were published, and certainly wouldn't have been confused by the voting if there wasn't a voting slider.^[2]

^[1] So I probably won't - we seem to have strongly differing intuitions and interpretations of fact, which probably makes communication difficult
^[2] But I liked the voting slider, it was a cool feature!

Cameron B

2

Responding to your critique of the model we put forward:

> You still need to argue how and why it is useful and should be used to guide decision-making. In neither of the two cases, from what I can tell, do the authors attempt to understand which of the boxes we are in...

We argue that this model can be used to guide decision-making insofar as the Type II error in particular here seems very reckless from both s-risk and x-risk perspectives—and we currently lack the requisite empirical knowledge that would enable us to determine with any confidence which of these four quadrants we are currently in.

You seem to be claiming that this model would only be useful if we also attempted to predict which quadrant we are in, whereas the entire point we are making is that deep uncertainty surrounding this very question is a sufficiently alarming status quo that we should increase the amount of attention and resources being devoted to understanding what properties predict whether a given system is sentient. Hopefully this work would enable us to predict which quadrant we are in more effectively so that we can act accordingly.

In other words, the fact that we can't predict with any confidence which of these four worlds we are currently in is troubling given the stakes, and therefore calls for further work so we can be more confident ASAP.

2

Poll questions for clarity:

4

Debate contributions should have focused more on practical implications for EA priorities

2

AI welfare as an "EA priority" means:

spending on the order of $5-10M this year, and increasing it year by year only in high quality options, until at least 5% of EA total spending is reached
about 45 people working on AI welfare by approximately the middle of next year, with that number increasing at a pace such that each position remains a good use of talent, until at least 5% of EA talent is reached