
Thanks for the reply. I still like to hold out hope in the face of what seems like long odds - I'd rather go down swinging if there's any non-zero chance of success than succumb to fatalism and be defeated without even trying.

Thank you for the very interesting post! I agree with most of what you’re saying here.

So what is your hypothesis as to why psychopaths don’t currently totally control and dominate society (or do you believe they actually do)?

Is it because:

  1. “you can manipulate a psychopath by appealing to their desires” which gives you a way to beat them?
  2. they eventually die (before they can amass enough power to take over the world)?
  3. they ultimately don’t work well together because they’re just looking out for themselves, so have no strength in numbers?
  4. they take over whole countries, but there are other countries banded together to defend against them (non-psychopaths hold psychopaths at bay through strength in numbers)?
  5. something else?

 

Of course, even if the psychopaths among us haven’t (yet) won the ultimate battle for control, that doesn’t mean psychopathic AGI won’t in the future.

 

I take the following message from your presentation of the material: “we’re screwed, and there’s no hope.” Was that your intent?

I prefer the following message: “the chances of success with guardian AGIs may be small, or even extremely small, but such AGIs may also be the only real chance we’ve got, so let’s go at developing them with full force.” Maybe we should have a Manhattan project for developing “moral” AGIs?

Here are some arguments that tend toward a slightly more optimistic take than you gave:

  1. Yes, guardian AGIs will have the disadvantage of constraints compared to “psychopathic” AGIs, but if there are enough guardians, perhaps they can (mostly) keep the psychopathic AGIs at bay through strength in numbers. How the defense-offense balance works out may be key here, especially since psychopathic AGIs could form (temporary) alliances as well.
  2. Although it may seem very difficult to figure out how to make moral AGIs, as AIs get better, they should increase our chances of figuring this out with their help - particularly if people focus specifically on developing AI systems for this purpose (such as through a moral AGI Manhattan project).

Thanks for sharing this interesting Draft Amnesty post. I’ve been thinking a lot about these sorts of things, and want to make a couple of points that may or may not relate to your current beliefs/understandings (I think they’ll relate to someone’s):

  1. Any theory of consequentialism that doesn’t take into account the effects of our actions/inactions on our consciences, and thus our well-being, is an incomplete theory of consequentialism (it doesn’t include all consequences). By considering conscience effects, a difference between killing and letting die becomes apparent.
  2. I personally like the “limited number of bets in our lifetimes” argument against fanatically following a decision theory based on expected value calculations, i.e., taking bets even when probabilities are super low. Basically, if I could make on the order of 10^20 bets in my lifetime, it might make sense to take a bet with a 1 in 10^20 chance of paying off, because eventually I’d end up winning, but since I’ll never live long enough to make 10^20 such bets, I shouldn’t take this one bet (see the rough illustration after this list).
  3. I think there are two concepts one could consider for responsibility for damages: one is who’s responsible to pay for the damages, and one is whether someone feels responsible in their conscience for the damages. The first would be affected by how many other people are involved: if 3 of us pushed a car off a cliff, I might be responsible to pay for 1/3rd of the damages, or even up to the full damages if the other 2 people didn’t have the ability to pay. The second would be affected by whether I thought I significantly and directly contributed to at least some fraction of the damages, no matter how many other people were involved. Under this second concept of responsibility, I may choose not to eat meat because if I did, I’d feel that I was significantly contributing to the pain and killing of some amount of animals on factory farms.
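
As a rough illustration of the “limited number of bets” point, here’s a quick back-of-the-envelope sketch. Both numbers are assumptions made up for the example (a 1-in-10^20 chance per bet and a generous 10,000 decision-relevant bets in a lifetime), not anything from the original argument:

```python
# Back-of-the-envelope illustration of the "limited number of bets" argument.
# Both numbers below are hypothetical assumptions for the example.
p_win = 1e-20            # assumed probability of the long-shot bet paying off
lifetime_bets = 10_000   # assumed number of such bets one could make in a lifetime

# P(at least one win) = 1 - (1 - p)^N, which is approximately N * p when N * p << 1.
# (1 - 1e-20 rounds to 1.0 in floating point, so the N * p approximation is used here.)
p_at_least_one_win = p_win * lifetime_bets
print(f"Chance of ever winning such a bet in a lifetime: ~{p_at_least_one_win:.0e}")  # ~1e-16
```

In other words, at any realistic number of lifetime bets, the chance of the long shot ever paying off stays effectively zero, which is the intuition behind not taking the single bet.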

There doesn't appear to be a link for: 

this blogpost on AI policy considerations under a Trump admin

Thanks for the post, you bring up some interesting points. I think one of the key things missing from Singer's approach is just how important personal responsibility is to well-being. Unfortunately, I don't have my alternative framework all figured out yet, but here's a start towards it. One example is that we have the most responsibility for our own children since we brought them into existence and they generally can't fend for themselves, so, under many circumstances, giving them priority is the most overall well-being-promoting thing to do.

I'm glad to see you are questioning some of the philosophy behind EA, and I hope that more people will do so. I believe a shift to protecting rights (e.g., fighting corruption) and promoting responsibility (of which mental health is a big subset since it involves taking responsibility for your emotions) could potentially help make EA as a movement much more effective.

Do you know about The Dignity Index? Might be interesting to team up with them/get their input.

Thank you for this interesting post, even though I don’t agree with your conclusions.

I believe one key difference between killing someone and letting someone die is its effect on one’s conscience.

If I kill someone, I violate their rights. Even if no one would directly know what I did with the invisible button, I’d know what I did, and that would eat at my conscience, and affect how I’d interact with everyone after that. Suddenly, I’d have less trust in myself to do the right thing (to not do what my conscience strongly tells me not to do), and the world would seem like a less safe place because I’d suspect that others would’ve made the same decision I did, and now might be effectively willing to kill me for a mere $6,000 if they could get away with it.

If I let someone die, I don’t violate their rights, and, especially if I don’t directly experience them dying, there’s just less of a pull on my conscience. 

One could argue that our consciences don’t make sense and should be more in line with classic utilitarianism, but I’d argue that we should be extremely careful about making big changes to human consciences in general without thoroughly thinking through and understanding the full range of their effects.

 

Also, I don’t think use of the term “moral obligation” is optimal, since to me it implies a form of emotional bullying/blackmail: you’re not a good person unless you satisfy your moral obligations. Instead, I’d focus on people being true to their own consciences. In my mind, it’s a question of trying to use someone’s self-hate to “beat goodness into them” versus trying to inspire their inner goodness to guide them because that’s what’s ultimately best for them.

By “self-hate,” I mean hate of the parts of ourselves that we think are “bad person” parts, but are really just “human nature” parts that we can accept about ourselves without that meaning we have to indulge them.

Have you tried cooking your best vegan recipes for others? In my experience sometimes people ask for the recipe and make it for themselves later, especially health-conscious people. For instance, I really like this vegan pumpkin pie that's super easy to make: https://itdoesnttastelikechicken.com/easy-vegan-pumpkin-pie/

Interesting idea, thanks for putting it out there. I'm currently trying to figure out better answers to some of the things you mentioned (at least "better" in the sense of being more in line with my own intuitions). For example, I've been working on incorporating apparently non-consequentialist considerations into a utilitarian framework:

https://forum.effectivealtruism.org/posts/S5zJr5zCXc2rzwsdo/a-utilitarian-framework-with-an-emphasis-on-self-esteem-and

https://forum.effectivealtruism.org/posts/fkrEbvw9RWir5ktoP/creating-a-conscience-calculator-to-guard-rail-an-agi

I'm currently doing this work unpaid and independently. I don't have a Patreon page for individuals to support it directly, in part because the lack of upvotes on my work has indicated little interest. If you'd like to support my work, though, please consider buying my ebook on honorable speech:

Honorable Speech: What Is It, Why Should We Care, and Is It Anywhere to Be Found in U.S. Politics?

Thanks!

I admit I get a bit lost reading your comments as to what exactly you want me to respond to, so I’m going to try to write it out in a numbered list. Please correct/add to this list as you see fit and send it back to me, and I’ll try to answer your actual points rather than what I think they are, in case I have them wrong:

 

  1. Explain how you think an AGI system that has sufficient capabilities to follow your “conscience calculator” methodology wouldn’t have sufficient capabilities to follow a simple single sentence command from a super-user human of good intent, such as, “Always do what a wise version of me would want you to do.”
  2. Justify that going through the exercise of manually writing out conscience breaches and assigning formulas for calculating their weights could speed up a future AGI in figuring out an optimal ethical decision-making system for itself. (I’m taking it as a given that most people would agree it’d be good, i.e., generally yield better results in the world, for an AGI to have a consistent ethical decision-making system onboard.)

 

#1 was what I was trying to get at with my last reply about how you could use a “weak AI” (something that’s less capable than an agentic AGI) to run the “conscience calculator” methodology and then just output a go/no-go response to an inner-aligned AGI as to which decision options it was allowed to take. The AGI would come up with the decision options based on some goal(s) it has, such as doing what a user asks of it, e.g., “make me lots of money!” The AGI would “brainstorm” possible paths to make lots of money, and the “weak AI” would come back with a go/no-go on a given path because, for instance, it does or doesn’t involve stealing. Here I’ve been trying to illustrate that an AI system that had sufficient capabilities to follow my “conscience calculator” methodology wouldn’t need to have sufficient capabilities to follow a broad super-user command such as “Always do what a wise version of me would want you to do.”

Of course, to be useful, the AGI needs to be able to follow a non-super-user’s, i.e., a user’s, commands reasonably well, such as figuring out what the user means by “make me lots of money!” The crux, I think, is that I see “make me lots of money” as a significantly simpler concept than “always do what the wise me would want.” And basically what I’m trying to do with my conscience calculator is provide a framework that makes it possible for an AGI of limited abilities to calculate, straight off the bat, what “wise me” would want with a sufficiently high accuracy for me to not be too worried about really bad outcomes. Do I have a lot of work to do to get to this goal? Yes. I have to define the conscience breaches more precisely (something I mentioned in my post and that you made reference to in your comment), assign “wise me” formulas for conscience weights, then test the system on actual AIs as they get closer and closer to AGI, to make sure it consistently works and any bugs can be ironed out before it’d be used as actual guard rails for a real-world AGI agent.
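
To make the go/no-go gating idea a bit more concrete, here is a toy sketch of roughly how I picture it. All of the breach names, weights, and the threshold below are made-up placeholders for illustration, not the actual breach definitions or formulas I still need to work out:

```python
# Toy sketch of the "weak AI" conscience-calculator gate (hypothetical breach names,
# weights, and threshold; the real formulas would need to be worked out carefully).

# Example conscience-breach weights the weak AI would apply.
BREACH_WEIGHTS = {
    "stealing": 80,
    "deception": 40,
    "property_damage": 30,
}
GO_THRESHOLD = 25  # assumed cutoff: any path scoring at or above this gets a "no go"

def conscience_check(detected_breaches):
    """Weak AI: score a candidate path's breaches and return 'go' or 'no go'."""
    total_weight = sum(BREACH_WEIGHTS.get(b, 0) for b in detected_breaches)
    return "go" if total_weight < GO_THRESHOLD else "no go"

# AGI side: brainstorm candidate paths toward the user's goal ("make me lots of money!")
# and only act on paths the weak AI approves.
candidate_paths = [
    {"plan": "build and sell a legitimate software product", "breaches": []},
    {"plan": "skim funds from customer accounts", "breaches": ["stealing", "deception"]},
]

for path in candidate_paths:
    print(f"{path['plan']}: {conscience_check(path['breaches'])}")
```

The point of the split is that the gate only needs to recognize breaches and apply weights; it never has to interpret anything as open-ended as “always do what a wise version of me would want.”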

 

Regarding #2, it sounds again like you’re expecting early AGIs to be more capable than I do:

What is latent in human text

When I personally try to figure new things out, such as a consistent system of ethics an AGI could use, I’ll come up with some initial ideas, then read some literature, then update my ideas, which then might point me to new literature I should read, so I’ll read that, and keep going back and forth between my own ideas and the literature when I get stuck with my own ideas. This seems like a much more efficient process for me than simply trying to figure out everything myself based on what I know right now, or trying to read all possible related literature and then decide what I think from there.

An AGI, though, should be able to read all possible literature very quickly. It seems likely that it would do this to be able to most quickly come up with a list of hypotheses (its own ideas) to test. The further anything is from the “right” answer in the literature, and the lesser the variety of “wrong” ideas explored there, the more the AGI will have to work to come up with the “right” answer itself.[1] So at the very least, I hope to contribute to the variety of “wrong” ideas in the literature, but of course I’m aiming for something closer to the “right” answer than what’s currently out there.

I’m of the opinion there’s a good chance (and I'd take anything higher than, say, 1 in 10,000 as a “good” chance when we’re talking about potentially horrible outcomes) that someone “bad” will let loose a not-so-well-aligned AGI before we have super-well-aligned (both inner and outer aligned) AGIs ready to autonomously defend against them.[2] Since my expertise is more well-suited to outer alignment than anything else in the alignment space, if I can make a tiny contribution towards speeding up outer alignment and making good AGIs more likely to win these initial battles, great.

  1. ^

    Let’s say, for the sake of argument, that there is a “right” answer.

  2. ^

    It’ll have to be autonomous over at least most decisions because humans won’t be able to keep up in real time with AGIs fighting it out.
