Thank you for the very interesting post! I agree with most of what you’re saying here.
So what is your hypothesis as to why psychopaths don’t currently totally control and dominate society (or do you believe they actually do?)?
Is it because:
Of course, even if the psychopaths among us haven’t (yet) won the ultimate battle for control, that doesn’t mean psychopathic AGI won’t in the future.
I take the following message from your presentation of the material: “we’re screwed, and there’s no hope.” Was that your intent?
I prefer the following message: “the chances of success with guardian AGI’s may be small, or even extremely small, but such AGI's may also be the only real chance we’ve got, so let’s go at developing them with full force.” Maybe we should have a Manhattan Project for developing “moral” AGI’s?
Here are some arguments that tend toward a slightly more optimistic take than you gave:
Thanks for sharing this interesting Draft Amnesty post. I’ve been thinking a lot about these sorts of things, and want to make a couple of points that may or may not relate to your current beliefs/understandings (I think they’ll relate to someone’s):
Thanks for the post, you bring up some interesting points. I think one of the key things missing from Singer's approach is just how important personal responsibility is to well-being. Unfortunately, I don't have my alternative framework all figured out yet, but here's a start toward it. One example is that we bear the most responsibility for our own children, since we brought them into existence and they generally can't fend for themselves, so, under many circumstances, giving them priority is what promotes overall well-being the most.
I'm glad to see you are questioning some of the philosophy behind EA, and I hope that more people will do so. I believe a shift to protecting rights (e.g., fighting corruption) and promoting responsibility (of which mental health is a big subset since it involves taking responsibility for your emotions) could potentially help make EA as a movement much more effective.
Thank you for this interesting post, even though I don’t agree with your conclusions.
I believe one key difference between killing someone and letting someone die is its effect on one’s conscience.
If I kill someone, I violate their rights. Even if no one would directly know what I did with the invisible button, I’d know, and that would eat at my conscience and affect how I interacted with everyone afterward. Suddenly, I’d have less trust in myself to do the right thing (to not do what my conscience strongly tells me not to do), and the world would seem like a less safe place, because I’d suspect that others would’ve made the same decision I did and might now effectively be willing to kill me for a mere $6,000 if they could get away with it.
If I let someone die, I don’t violate their rights, and, especially if I don’t directly experience them dying, there’s just less of a pull on my conscience.
One could argue that our consciences don’t make sense and should be more in line with classic utilitarianism, but I’d argue that we should be extremely careful about making big changes to human consciences in general without thoroughly thinking through and understanding the full range of their effects.
Also, I don’t think use of the term “moral obligation” is optimal, since to me it implies a form of emotional bullying/blackmail: you’re not a good person unless you satisfy your moral obligations. Instead, I’d focus on people being true to their own consciences. In my mind, it’s a question of trying to use someone’s self-hate to “beat goodness into them” versus trying to inspire their inner goodness to guide them because that’s what’s ultimately best for them.
By “self-hate,” I mean hate of the parts of ourselves that we think are “bad person” parts, but are really just “human nature” parts that we can accept about ourselves without that meaning we have to indulge them.
Have you tried cooking your best vegan recipes for others? In my experience sometimes people ask for the recipe and make it for themselves later, especially health-conscious people. For instance, I really like this vegan pumpkin pie that's super easy to make: https://itdoesnttastelikechicken.com/easy-vegan-pumpkin-pie/
Interesting idea, thanks for putting it out there. I'm currently trying to figure out better answers to some of the things you mentioned (at least "better" in terms of more in-line with my own intuitions). For example, I've been working on incorporating apparently non-consequentialist considerations into a utilitarian framework:
I'm currently doing this work unpaid and independently. I don't have a Patreon page for individuals to support it directly, in part because the lack of upvotes on my work has indicated little interest. If you'd like to support my work, though, please consider buying my ebook on honorable speech:
Honorable Speech: What Is It, Why Should We Care, and Is It Anywhere to Be Found in U.S. Politics?
Thanks!
I admit I get a bit lost reading your comments as to what exactly you want me to respond to, so I’m going to try to write it out as a numbered list. Please correct/add to this list as you see fit and send it back to me; that way, if I have your points wrong, I can answer what you actually meant rather than what I think you meant:
#1 is what I was trying to get at with my last reply about how you could use a “weak AI” (something less capable than an agentic AGI) to run the “conscience calculator” methodology and then just output a go/no-go response to an inner-aligned AGI as to which decision options it was allowed to take. The AGI would come up with the decision options based on some goal(s) it has, such as doing what a user asks of it, e.g., “make me lots of money!” The AGI would “brainstorm” possible paths to making lots of money, and the “weak AI” would come back with a go/no-go on a given path because, for instance, it does or doesn’t involve stealing. Here I’ve been trying to illustrate that an AI system capable enough to follow my “conscience calculator” methodology wouldn’t need to be capable enough to follow a broad super-user command such as “Always do what a wise version of me would want you to do.”
Of course, to be useful, the AGI needs to be able to follow a non-super-user’s, i.e., a user’s, commands reasonably well, such as figuring out what the user means by “make me lots of money!” The crux, I think, is that I see “make me lots of money” as a significantly simpler concept than “always do what the wise me would want.” And basically what I’m trying to do with my conscience calculator is provide a framework that makes it possible for an AGI of limited abilities to calculate, straight off the bat, what “wise me” would want with high enough accuracy that I’m not too worried about really bad outcomes. Do I have a lot of work to do to get to this goal? Yes. I have to define the conscience breaches more precisely (something I mentioned in my post and that you made reference to in your comment), assign “wise me” formulas for conscience weights, and then test the system on actual AI’s as they get closer and closer to AGI, to make sure it works consistently and any bugs can be ironed out before it’s used as actual guardrails for a real-world AGI agent.
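To make that architecture a bit more concrete, here’s a minimal Python sketch of the go/no-go gating I have in mind. Everything in it (the ConscienceCalculator and BreachCheck names, the keyword-based detectors, the numeric weights and threshold) is a hypothetical stand-in, not the actual methodology; in a real system the breach detection would itself be done by an AI, and the weights would come from the “wise me” formulas that are still to be worked out.

```python
# Hypothetical sketch only: a weak-AI "conscience calculator" gating an AGI's
# candidate plans with go/no-go decisions. All names and numbers are illustrative.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BreachCheck:
    """One potential conscience breach, with a detector and a 'wise me' weight."""
    name: str                      # e.g., "stealing"
    detect: Callable[[str], bool]  # True if the plan involves this breach
    weight: float                  # severity weight (placeholder for the "wise me" formulas)


class ConscienceCalculator:
    """Weak-AI module: evaluates each candidate plan and returns go (True) or no-go (False)."""

    def __init__(self, checks: List[BreachCheck], threshold: float):
        self.checks = checks
        self.threshold = threshold  # total breach weight at or above which a plan is vetoed

    def evaluate(self, plan: str) -> bool:
        total = sum(c.weight for c in self.checks if c.detect(plan))
        return total < self.threshold


# Toy usage: the AGI "brainstorms" ways to "make me lots of money";
# the calculator vetoes the path that involves stealing.
checks = [BreachCheck("stealing", lambda plan: "steal" in plan.lower(), 10.0)]
gate = ConscienceCalculator(checks, threshold=5.0)

for plan in ["Build and sell a useful budgeting app",
             "Steal customer data and sell it"]:
    print(plan, "->", "go" if gate.evaluate(plan) else "no-go")
```

The point of the sketch is just the separation of roles: the AGI proposes, the simpler conscience calculator disposes, so the safety-critical component doesn’t need the AGI’s full capabilities.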
Regarding #2, it again sounds like you’re expecting early AGI’s to be more capable than I expect them to be:
What is latent in human text
When I personally try to figure new things out, such as a consistent system of ethics an AGI could use, I’ll come up with some initial ideas, then read some literature, then update my ideas, which then might point me to new literature I should read, so I’ll read that, and keep going back and forth between my own ideas and the literature whenever I get stuck with my own ideas. This seems like a much more efficient process for me than simply trying to figure out everything myself based on what I know right now, or trying to read all possible related literature and then decide what I think from there.
An AGI, though, should be able to read all possible literature very quickly. It seems likely that it would do this to most quickly come up with a list of hypotheses (its own ideas) to test. The further anything in the literature is from the “right” answer, and the smaller the variety of “wrong” ideas explored there, the more the AGI will have to work to come up with the “right” answer itself.[1] So at the very least, I hope to contribute to the variety of “wrong” ideas in the literature, though of course I’m aiming for something closer to the “right” answer than what’s currently out there.
I’m of the opinion there’s a good chance (and I'd take anything higher than, say, 1 in 10,000 as a “good” chance when we’re talking about potentially horrible outcomes) that someone “bad” will let loose a not-so-well-aligned AGI before we have super-well-aligned (both inner- and outer-aligned) AGI’s ready to autonomously defend against them.[2] Since my expertise is better suited to outer alignment than anything else in the alignment space, if I can make a tiny contribution toward speeding up outer alignment and making good AGI’s more likely to win those initial battles, great.
Thanks for the reply. I still like to hold out hope in the face of what seems like long odds - I'd rather go down swinging if there's any non-zero chance of success than succumb to fatalism and be defeated without even trying.