Lucius Caviola's post mentioned the "happy servants" problem:
This issue is also mentioned as a key research question in Digital Minds: Importance and Key Research Questions by Mogensen, Saad, and Butlin.
This is just a note to flag that there's also some discussion of this issue in Carl Shulman's recent 80,000 Hours podcast episode. (cf. also my post about that episode.)
Rob Wiblin: Yeah. The idea of training a thinking machine to just want to take care of you and to serve your every whim, on the one hand, that sounds a lot better than the alternative. On the other hand, it does feel a little bit uncomfortable. There's that famous example, the famous story of the pig that wants to be eaten, where they've bred a pig that really wants to be farmed and consumed by human beings. This is not quite the same, but I think it raises some of the same discomfort that I imagine people might have at the prospect of creating beings that enjoy subservience to them, basically. To what extent do you think that discomfort is justified?
Carl Shulman: So the philosopher Eric Schwitzgebel has a few papers on this subject with various coauthors, and covers that kind of case. He has a vignette, “Passion of the Sun Probe,” where there’s an AI placed in a probe designed to descend into the sun and send back telemetry data, and then there has to be an AI present in order to do some of the local scientific optimisation. And it’s made such that, as it comes into existence, it absolutely loves achieving this mission and thinks this is an incredibly valuable thing that is well worth sacrificing its existence.
And Schwitzgebel finds that his intuitions are sort of torn in that case, because we might well think it sort of heroic if you had some human astronaut who was willing to sacrifice their life for science, and think this is achieving a goal that is objectively worthy and good. And then if it was instead the same sort of thing, say, in a robot soldier or a personal robot that sacrifices its life…
There have been a few valid critiques of the debate framing, so I'll respond to each of them below. A general point before I start: feel free to use the discussion thread to outline your opinion and/or your interpretation of the debate statement, e.g. "I strongly agree with the debate statement, because I think 5% of EA funding over the next decade is the right amount to allocate to this cause area."
1- @Jason brings up the ambiguity of the term "unrestricted" in footnote 2. I was thinking of unrestricted funding as all funding that is allocated according to (roughly) impartial consequentialist reasoning, i.e. (roughly) EA principles. I'm contrasting that with restricted funds: for example, funds from a foundation that supports aid charities which happen to be given to an EA aid charity.
2- @finm makes a very fair point in this comment: over what timescale are we allocating 5% of EA funds? This was an oversight rather than a deliberate ambiguity; if I were writing this again, I would have specified the next decade or the next year. Given that 340 users have already voted, I won't change something so substantial now, but again, feel free to clarify your vote in the discussion thread.
3- @NickLaing argues that 5% of funding might be too high a bar for something to simply be labelled an "EA priority". I think this is a good point. A more accurate phrasing might have been "top EA priority", or the entire statement could have been relative, for example "AI welfare should be more of an EA priority", with the footnote clarifying that a strong agree means we should triple the funding and talent going into it. Again, I won't change the phrasing now because it doesn't seem fair to earlier voters, but I can see the case for this.
Thanks for the feedback and meta-debate, very EA, keep it up!
I feel like 5% of EA-directed funding is a high bar to clear to agree with the statement "AI welfare should be an EA priority". I would maybe have pitched 1% or 2% as the "priority" bar, which would still be 10 million dollars a year even under quite conservative assumptions as to what would be considered unrestricted EA funding.
This would mean that across all domains (x-risk, animal welfare, GHD) a theoretical maximum of 20 causes, and more realistically maybe 5-15 causes (assuming some causes warrant 10-30% of funding), would be considered EA priorities. 80,000 Hours doesn't have AI welfare in their top 8 causes, but it is in their top 16, so I doubt it would clear the 5% bar, even though they list it under their "Similarly pressing but less developed areas", which feels priority-ish to me (perhaps they could share their perspective?).
It could also depend on how broadly we characterise causes. Is "global health and development" one cause, or are mosquito nets, deworming, and cash transfers each their own cause? I would suspect the latter.
Many people could therefore consider AI welfare an important cause area yet disagree with the debate statement, because they don't think it warrants 5%+ of EA funding despite its importance.
Or I could be wrong, and many might consider 5% a reasonable or even low bar. It's clearly a subjective question and not the biggest deal, but hey :D
On "AI welfare should be an EA priority," approximately how much EA talent/funding is "unrestricted"?
If, say, only 10% is unrestricted, then 5% of that would be 0.5% of all EA talent/funding, and the question would roughly be: should AI welfare be even a minor/non-trivial EA cause area? If 90% is unrestricted, then a "priority" would be 4.5% of all EA talent/funding, and the question would roughly be: should AI welfare be a considerable EA cause area?
The difference between those readings would influence where I placed myself on the agree-disagree continuum.
My position on "AI welfare"
1. If we achieve existential security and launch the von Neumann probes successfully, we will be able to do >>10^80 operations in expectation. We could tile the universe with hedonium or do acausal trade or something and it's worth >>10^60 happy human lives in expectation. Digital minds are super important.
2. Short-term AI suffering will be small-scale—less than 10^40 FLOP and far from optimized for suffering, even if suffering is incidental—and worth <<10^20 happy human lives (very likely <10^10).
3. 10^20 isn't even a feather in the scales when 10^60 is at stake (a rough version of this comparison is sketched below).
4. "Lock-in" [edit: of "AI welfare" trends on Earth] is very unlikely; potential causes of short-term AI suffering (like training and deploying LLMs) are very different from potential causes of astronomical-scale digital suffering (like tiling the universe with dolorium, the arrangement of matter optimized for suffering). And digital-mind-welfare research doesn't need to happen yet; there will be plenty of subjective time for it before the von Neumann probes' goals are set.
5. Therefore, to a first approximation, we should not trade off existential security for short-term AI welfare, and normal AI safety work is the best way to promote long-term digital-mind-welfare.
[Edit: the questionable part of this is #4.]
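To make the orders of magnitude in #1-#3 concrete, here is a toy back-of-the-envelope comparison. It's only a sketch using the illustrative figures above (and a hypothetical probability shift), not defended estimates:

```python
# Toy comparison of the stakes in #1-#3 above (illustrative figures only).
long_term_value = 1e60    # happy-human-life-equivalents at stake if existential security is achieved
short_term_stakes = 1e20  # generous upper bound on near-term AI welfare stakes

# Short-term AI welfare as a fraction of what's at stake long term:
print(short_term_stakes / long_term_value)       # 1e-40 -- "not even a feather in the scales"

# Even a minuscule shift in the probability of reaching the good long-term
# outcome swamps the short-term stakes:
tiny_probability_shift = 1e-9                    # hypothetical figure for illustration
print(tiny_probability_shift * long_term_value)  # 1e+51 expected life-equivalents, >> 1e20
```

The numbers are placeholders; the point is only that the ratio is so extreme that the conclusion is insensitive to many orders of magnitude of error in either estimate.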