Thank you for this interesting post, even though I don’t agree with your conclusions.
I believe one key difference between killing someone and letting someone die is the effect each has on one’s conscience.
If I kill someone, I violate their rights. Even if no one would directly know what I did with the invisible button, I’d know what I did, and that would eat at my conscience, and affect how I’d interact with everyone after that. Suddenly, I’d have less trust in myself to do the right thing (to not do what my conscience strongly tells me not to do), and the world would seem like a less safe place because I’d suspect that others would’ve made the same decision I did, and now might be effectively willing to kill me for a mere $6,000 if they could get away with it.
If I let someone die, I don’t violate their rights, and, especially if I don’t directly experience them dying, there’s just less of a pull on my conscience.
One could argue that our consciences don’t make sense and should be more in line with classic utilitarianism, but I’d argue that we should be extremely careful about making big changes to human consciences in general without thoroughly thinking through and understanding the full range of effects such changes would have.
Also, I don’t think use of the term “moral obligation” is optimal, since to me it implies a form of emotional bullying/blackmail: you’re not a good person unless you satisfy your moral obligations. Instead, I’d focus on people being true to their own consciences. In my mind, it’s a question of trying to use someone’s self-hate to “beat goodness into them” versus trying to inspire their inner goodness to guide them because that’s what’s ultimately best for them.
By “self-hate,” I mean hate of the parts of ourselves that we think are “bad person” parts, but are really just “human nature” parts that we can accept about ourselves without that meaning we have to indulge them.
Have you tried cooking your best vegan recipes for others? In my experience sometimes people ask for the recipe and make it for themselves later, especially health-conscious people. For instance, I really like this vegan pumpkin pie that's super easy to make: https://itdoesnttastelikechicken.com/easy-vegan-pumpkin-pie/
Interesting idea, thanks for putting it out there. I'm currently trying to figure out better answers to some of the things you mentioned (at least "better" in the sense of being more in line with my own intuitions). For example, I've been working on incorporating apparently non-consequentialist considerations into a utilitarian framework:
I'm currently doing this work unpaid and independently. I don't have a Patreon page for individuals to support it directly, in part because the lack of upvotes on my work has indicated little interest. If you'd like to support my work, though, please consider buying my ebook on honorable speech:
Honorable Speech: What Is It, Why Should We Care, and Is It Anywhere to Be Found in U.S. Politics?
Thanks!
I admit I get a bit lost reading your comments as to what exactly you want me to respond to, so I’m going to try to write it out as a numbered list. Please correct or add to this list as you see fit and send it back to me; if I have your points wrong, I’ll then answer what you actually meant rather than what I think you meant:
#1 is what I was trying to get at with my last reply: you could use a “weak AI” (something less capable than an agentic AGI) to run the “conscience calculator” methodology and then simply output a go/no-go response to an inner-aligned AGI on which decision options it’s allowed to take. The AGI would come up with the decision options based on some goal(s) it has, such as doing what a user asks of it, e.g., “make me lots of money!” The AGI would “brainstorm” possible paths to make lots of money, and the “weak AI” would come back with a go/no-go on each path depending on whether, for instance, it involves stealing. Here I’ve been trying to illustrate that an AI system with sufficient capabilities to follow my “conscience calculator” methodology wouldn’t need sufficient capabilities to follow a broad super-user command such as “Always do what a wise version of me would want you to do.”
Of course, to be useful, the AGI needs to be able to follow a non-super-user’s, i.e., a user’s, commands reasonably well, such as figuring out what the user means by “make me lots of money!” The crux, I think, is that I see “make me lots of money” as a significantly simpler concept than “always do what the wise me would want.” Basically, what I’m trying to do with my conscience calculator is provide a framework that lets an AGI of limited abilities calculate, right off the bat, what “wise me” would want with high enough accuracy that I’m not too worried about really bad outcomes. Do I have a lot of work to do to get to this goal? Yes. I have to define the conscience breaches more precisely (something I mentioned in my post and that you referenced in your comment), assign “wise me” formulas for conscience weights, and then test the system on actual AI’s as they get closer and closer to AGI, so that it consistently works and any bugs can be ironed out before it’d be used as actual guard rails for a real-world AGI agent.
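To make that interface concrete, here's a very rough sketch in Python of what I mean (everything here, including the class name, the breach labels, and the toy options, is a placeholder I'm making up for illustration, not an actual implementation of the conscience calculator):

```python
# Illustrative sketch only: all names here are made-up placeholders, not a real API.

class WeakAIConscienceGate:
    """The 'weak AI': it only answers go/no-go per decision option."""

    def __init__(self, forbidden_breaches):
        # e.g., {"stealing"}, drawn from a breach list like my Appendix A
        self.forbidden_breaches = set(forbidden_breaches)

    def check(self, option):
        # 'option' is a dict with a description plus the breaches the weak AI
        # has identified in it; the gate returns "go" or "no-go".
        return "no-go" if set(option["breaches"]) & self.forbidden_breaches else "go"


def filter_options(candidate_options, gate):
    """Keep only the options the AGI is allowed to act on."""
    return [opt for opt in candidate_options if gate.check(opt) == "go"]


# Toy usage: the AGI has brainstormed paths to "make me lots of money".
options = [
    {"description": "day-trade client funds without consent", "breaches": ["stealing"]},
    {"description": "build and sell a useful app", "breaches": []},
]
gate = WeakAIConscienceGate(forbidden_breaches={"stealing"})
print(filter_options(options, gate))  # only the app-building option survives
```

The point is that the gate never has to model "a wise version of me"; it only has to recognize a listed breach and say no.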
Regarding #2, it again sounds like you’re expecting early AGI’s to be more capable than I expect them to be:
What is latent in human text
When I personally try to figure new things out, such as a consistent system of ethics an AGI could use, I’ll come up with some initial ideas, then read some literature, then update my ideas, which might then point me to new literature I should read, and so on, going back and forth between my own ideas and the literature whenever I get stuck. This seems like a much more efficient process for me than simply trying to figure everything out myself based on what I know right now, or trying to read all possibly related literature first and only then deciding what I think.
An AGI, though, should be able to read all available literature very quickly. It seems likely that it would do this in order to come up with a list of hypotheses (its own ideas) to test as quickly as possible. The further everything in the literature is from the “right” answer, and the less variety there is among the “wrong” ideas explored there, the harder the AGI will have to work to come up with the “right” answer itself.[1] So at the very least, I hope to contribute to the variety of “wrong” ideas in the literature, but of course I’m aiming for something closer to the “right” answer than what’s currently out there.
I’m of the opinion there’s a good chance (and I’d count anything higher than, say, 1 in 10,000 as a “good” chance when we’re talking about potentially horrible outcomes) that someone “bad” will let loose a not-so-well-aligned AGI before we have super-well-aligned (both inner- and outer-aligned) AGI’s ready to autonomously defend against them.[2] Since my expertise is better suited to outer alignment than to anything else in the alignment space, if I can make a tiny contribution towards speeding up outer alignment and making good AGI’s more likely to win these initial battles, great.
I’ll try to clarify my vision:
For a conscience calculator to work as a guard rail system for an AGI, we’ll need an AGI or weak AI to translate reality into numerical parameters: first identifying which conscience breaches apply in a given situation, drawing from the list in Appendix A, and then estimating the parameters that will go into the “conscience weight” formulas (to be provided in a future post)[1] to calculate the total conscience weight for each decision option. The system should then choose the decision option(s) with the minimum total conscience weight. So I’m not saying, “Hey, AGI, don’t commit any of the conscience breaches I list in Appendix A, or at least minimize them.” I’m saying, “Hey, human person, bring me that weak AI that doesn’t even really understand what I’m talking about, and let’s have it translate reality into the parameters it’ll need for calculating, using Appendix A and the formulas I’ll provide, the conscience weights for each decision option. Then it can output to the AGI (or just be a module in the AGI) which decision option or options have the minimum, or ideally zero, total conscience breach weight. And hopefully those people who’ve been worrying about how to align AGI’s will be able to make the decision option(s) with the minimum conscience breach weight binding on the AGI so it can’t choose anything else.”
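To illustrate the calculation loop, here's a toy sketch (the breach names, parameters, and weight formulas below are made-up stand-ins, not the actual Appendix A breaches or the real "conscience weight" formulas I plan to provide):

```python
# Sketch only: breaches, parameters, and formulas below are invented placeholders.

# Step 1: the weak AI identifies which breaches apply to each decision option
# and estimates the parameters those breaches need (ages, life expectancies,
# pain levels, etc.).
decision_options = {
    "option_A": [("cause_physical_pain", {"pain_level": 3, "duration_hours": 2})],
    "option_B": [("steal_property", {"value_usd": 500}),
                 ("lie", {"severity": 1})],
    "option_C": [],  # no identified breaches
}

# Step 2: placeholder "conscience weight" formulas, one per breach type.
weight_formulas = {
    "cause_physical_pain": lambda p: p["pain_level"] * p["duration_hours"],
    "steal_property": lambda p: 0.01 * p["value_usd"],
    "lie": lambda p: 2.0 * p["severity"],
}

# Step 3: total conscience weight per option, then pick the minimum
# (ideally zero) to make binding on the AGI.
totals = {
    name: sum(weight_formulas[breach](params) for breach, params in breaches)
    for name, breaches in decision_options.items()
}
min_weight = min(totals.values())
allowed = [name for name, weight in totals.items() if weight == min_weight]
print(totals)   # {'option_A': 6, 'option_B': 7.0, 'option_C': 0}
print(allowed)  # ['option_C']
```

Once the weak AI has produced the numbers, selecting the allowed option(s) is a trivial minimization; all the hard work is in the translation step from reality to parameters.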
Basically, I’m trying to come up with a system to align an AGI to, once people figure out how to rigorously align an AGI to anything. It seems to me that people underestimate how important the choice of what to align to will end up being, and/or how difficult it will be to come up with specifications for that target that generalize well to all possible situations.
Regarding your paragraph 3 about the difficulty of AI understanding our true values:
and that there's some large probability it implies preventing (human and nonhuman) tragedies in the meantime…
Personally, I’m not comfortable relying on merely “large” probabilities of preventing tragedies - people could say that’s the case for “bottom-up” ML ethics systems if they manage to achieve >90% accuracy, and I’d say, “Oh, man, we’re in trouble if people let an AGI loose thinking that’s good enough.” But this is just a gut feeling, really - maybe the first AGI’s will have enough “common sense” to generalize well and not do the big unethical stuff. I’d rather not bank on that, though. My work for AI’s is geared first and foremost towards reducing risks from the first alignable agentic AGI’s to be let out in the world.
Btw, I think there are a couple of big holes in the ethics literature, which is why I think my work could help speed up an AGI figuring out ethics for itself:
I hope this clears some things up - if not, let me know, thanks!
Example parameters include people’s ages and life expectancies, and pain levels they may experience.
(Also, this quote looks like a rationalization/sunk-cost-fallacy to me; as I'm not you, I can't say whether it is for sure. But if I seemed (to someone) to do this, I would want that someone to tell me, so I'm telling you.)
I do appreciate you calling it like you see it, thank you! I don't think I'm rationalizing or falling for the sunk-cost fallacy here, but I could be wrong - I seem to see things quite differently from the average EA Forum/LessWrong reader, as evidenced by the lack of upvotes for my work on trying to figure out how to quantify ethics and conscience for AI's.
I think perhaps our main point of disagreement is how easy we each think it'll be for an AGI to (a) understand the world well enough to function at a human level across many domains, and (b) understand from our words and actions what we humans really want (what we deeply value, rather than just what we appear to value on the surface). I think the latter will be much more difficult.
Maybe my model for how an AGI would go about figuring out human values and ethics and conscience is flawed, but it seems like it would be efficient for an AGI to read the literature and then form its own best hypotheses and test them. So here I'm trying to contribute to the literature to speed up that process (that's not my only motivation for my posts, but it's one).
Ah, I see, thank you for the clarification. I'm not sure how the trajectory of AGI's will go, but my worry is that we'll have some kind of race dynamic wherein the first AGI's will quickly have to go on the defensive against bad actors' AGI's, and neither side will really be at the level you're talking about in terms of being able to extract a coherent set of human values (which I think would require ASI, since as far as I know no human has succeeded at this, even though everyday humans can tell what a lie is and what stealing is). If I can create a system that everyday humans can follow, then "everyday" AGI's should be able to follow it too, at least to some degree of accuracy. That may be enough to avoid significant collateral damage in a "fight" between some of the first AGI's to come online. But time will tell... Thanks again for the thought-provoking comment.
Thanks for the comment!
If I understand you correctly, you're saying that any AGI that could apply the system I'm coming up with could just come up with a better, idealized system itself, is that right? I don't know if that's true (since I don't know what the first "AGI's" will really look like), but even if my work only speeds up an AGI's ability to do this itself by a small amount, that might still make a big difference in how things turn out in the world, I think.
Do you know about The Dignity Index? Might be interesting to team up with them/get their input.