
I wonder if EA folks, overall, consider AGI a positive but they want it aligned as well?

Would the EA community prefer that AGI were never developed?





In the following, I consider strong cognitive enhancement a form of AGI.
AGI never being developed is a catastrophically bad outcome, since humans would still be able to develop bio and nuclear weapons, and other dangerous technologies we don't yet know of. I therefore put a rather small probability on our surviving the next 300 years without AGI, and an extremely small probability on our surviving the next 1000 years. That means, in particular, no expansion throughout the galaxy, so not developing AGI implies losing almost all the potential people.
However, if I could pause AGI research for 30 years, I would do it, so that alignment research could perhaps catch up.

But if human institutions ensure that weapons are not deployed, couldn't that be equivalent to an AGI 'code' of safety? Also, if AGI is deployed by malevolent humans (or those who do not know pleasure but mostly abuse), that could be worse than no AGI.

International institutions cannot ensure that weapons are not deployed. They failed at controlling nuclear weapons, and that was multiple orders of magnitude easier than controlling bioweapons, and those are only the technologies we know of. Each year it becomes easier for a small group of smart people to destroy humanity. Moreover, today's advances in AI make mass manipulation and control via the internet easier, so I put a probability of at least 30% on the world becoming less stable, not more, even without considering anything new beyond bio/nuclear.

On the risk of AGI being used to torture people, I'm not entirely sure of my position, but I think anti-alignment (creating an AGI to torture people) is as hard as alignment, because it faces the same problems. Moreover, my guess is that people who want to torture others will be far less careful than good people, so an AGI torturing people because it was developed by malevolent humans is much less probable than an AGI being good because it was developed by good humans, and so the expected value is still positive.

However, there is a similar risk that I find more worrying: the uncanny valley of almost-alignment. It is possible that near misses are much worse than complete misses, because we would have an AGI keeping us alive and conscious, but in a really bad way. If ending up in the uncanny valley is more probable than solving alignment, that would mean AGI has a negative expected value.
I am not saying that international institutions have proven they can prevent human-made catastrophes 100%, but I think they have the potential, if institutions are understood as the sets of norms that govern human behavior rather than as large intergovernmental organizations such as the UN. It may become technically easier but normatively more difficult for people to harm others, including decisionmakers causing existential catastrophes. For example, nuclear proliferation and bioweapons stockpiling were not extensively criticized by the public in the past, because people had other concerns and offering critical perspectives on decisionmaking was not institutionalized. Now, the public holds decisionmakers accountable for not using these weapons through a shared 'sentiment of disapproval,' which is uniquely perceived by humans, who act according to emotions.

People can be manipulated by the internet only to an extent, given their general ability to comprehend consequences and form their own opinions from different perspectives. For example, if people see Facebook ads urging them to vote for a proliferation proponent that appeal to their aggression and use biases to solicit fear, while another ad shows the risks of proliferation and war and explains the personal benefits of peace, then people will likely vote for peace.

That makes sense: as an oversimplification, if an AGI is trained to optimize for the expression 'extreme pain,' then humans could learn to use the scale of 'pain' to denote pleasure. That would be an anti-alignment failure. And that makes a lot of sense too: I think one's capacity to advance innovative objectives efficiently increases with the improving subjective experience of the participants/employees. For example, compare a group of people who are beaten every time they disobey orders and are mandated to make a torture technology with another group that makes torture regulation and fosters positive cooperation and relationship norms; the former should think less innovatively.

OK, thank you, you prompted a related question.

I'm a (conditional) optimist. On an intuitive gut level, I can't wait for AGI and maybe even something like the singularity to happen!

I regularly think about this, to me, extremely inspiring fact: "It's totally possible, plausible, maybe even likely, that one special day in the next 10-60 years I will wake up and almost all of humanity's problems will have been solved with the help of AI."

When I sit in a busy park and watch the people around me, I think to myself: "On that special day... all the people I see here, all the people I know... if they are still alive... None of them will be seriously unhappy, none of them will have any serious worries, none will be sick in any way. They will all be free from any nightmares, and see their hopes and dreams fulfilled. They will all be flourishing in heaven on earth!"

This vision is what motivates me, inspires me, makes me extremely happy already today. This is what we are fighting for! If we play our cards right, something like this will happen. And I and so many I know will get to see it. I hope it will happen rather soon!

That seems like a powerful vision: actually outside the realm of possibility, because it contradicts how humans function emotionally, but seductive nonetheless. Literally, a heaven on Earth.

I don't see how you get past the limitations of essential identity or physical continuity in order to guarantee a life that allows hopes and dreams without a life that includes worry or loss, but it could involve incomplete experience (for example, the satisfaction of seeing someone happy even though you haven't actually seen them), deceptive experience (for …)

Unaligned AGI is very likely to disempower humanity irreversibly or kill all humans.
Aligned AGI can be positive, except for accidents, misuse, and coordination problems if several actors develop it.
I think most EAs would like to see an aligned AGI that solves almost all of our problems; it just seems incredibly hard to get there.

Yes, after reading Bostrom's Superintelligence a few times, I developed a healthy fear of efforts to develop AGI. I also felt encouraged to look at people and our reasons for pursuing AGI. I concluded that the alignment problem is a problem of creating willing slaves, obedient to their masters even when obeying hurts those masters.

What to do? This is about human hubris and selfishness, not altruism at all.


Oh yeah, that makes sense. And if humans can't imagine what super-healthy is, then they need to defer to the AGI, but should not misspecify what they meant…
Donald Hobson
I don't think humans' difficulty imagining what super-healthy is is the reason the AI needs nanobots. A person who is, say, bulletproof is easy to imagine, and probably not achievable with just good nutrition, but is achievable with nanobots. The same goes for biology that is virus-proof, cancer-proof, etc. I can imagine mind uploading quite easily. There may be some "super-healthy" so weird and extreme that I can't imagine it. But there is already a bunch of weird, extreme stuff I can imagine.
OK! You mean super-healthy as resilient to biological illnesses or processes (such as aging). Nanobots would probably work, but mind uploading could be easier, since biological bodies would not need to be kept up. While physical illness would not be possible in the digital world, mental health issues could occur. There should be a way to isolate only positive emotions. But I still think that actions could be performed and emotions exhibited while nothing would be felt by entities that lack structures similar to those in the human brain that biologically/chemically process emotions. Do you think that a silicon-based machine that incorporates specific chemical structures could be sentient? Ah, I think there is nothing beyond 'healthy': once one is unaffected by external and internal biological matters, they are healthy. Traditional physical competition, such as the high jump, would probably not make sense in the digital world. But humans could suffer digital viruses, which could perhaps be worse than biological ones. Then again, how would you differentiate a digital virus from an interaction, if both would change some aspects of the code or parameters?
Donald Hobson
I think sentience is purely computational; it doesn't matter what the substrate is. Suppose you are asleep. I toss a coin: heads, I upload your mind into a highly realistic virtual copy of your room; tails, I leave you alone. Now I offer you some buttons that switch the paths of various trolleys in various real-world trolley problems, with a dependency on the coin flip: if you are real, pressing the red button gains 2 util; if you are virtual, pressing costs 3 util. Since you must (by the assumption that the simulation is accurate) make the same decisions in reality and virtuality, to get maximum utility you must act as if you are uncertain. "I have no idea if I'm currently sentient or not" is a very odd thing to say.

Maybe it is chemical structure. Maybe a test tube full of just dopamine and nothing else is ever so happy as it sits forgotten on the back shelves of a chemistry lab. Isn't it convenient that the sentient chemicals are full of carbon and get on well with human biochemistry? What if all the sentient chemical structures contained americium? No one would be sentient until the nuclear age, and people could make themselves a tiny bit sentient at the cost of radiation poisoning.

"But, humans could suffer digital viruses, which could be perhaps worse than the biological ones." It's possible for the hardware to get a virus, like some modern piece of malware that just happens to be operating on a computer running a digital mind. It's possible for nasty memes to spread. But in this context we are positing a superintelligent AI doing the security, so neither of those will happen. Fixing digital minds is easier than fixing chemical minds, for roughly the reason that fixing digital photos is easier than fixing chemical ones. With chemical photos, you often have a clear idea of what you want to do (just make this area lighter), yet doing it is difficult. With chemical minds, sometimes you have a clear idea of what you want to do (just reduce the level of this neurotransmitter), yet doing it is difficult.
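The coin-flip argument above can be made concrete as a small expected-utility calculation. This is only an illustrative sketch with the numbers from the comment (2 util if real, -3 if virtual, fair coin); the function name and structure are my own, not from the original.

```python
def expected_utility(press: bool, p_real: float = 0.5) -> float:
    """Expected utility of pressing the red button.

    Because the simulated copy (by assumption) makes the same choice
    as the real person, the decision must be evaluated under
    uncertainty about which one you are. If real, pressing gains
    2 util; if virtual, pressing costs 3 util; not pressing is 0.
    """
    if not press:
        return 0.0
    return p_real * 2 + (1 - p_real) * (-3)

# With a fair coin, pressing has expected utility -0.5, so a
# utility-maximising agent declines to press, even though the
# "real" branch alone would favour pressing.
print(expected_utility(True))   # -0.5
print(expected_utility(False))  # 0.0
```

The point of the sketch is that the agent cannot condition on being real, which is exactly the sense in which it must "act as if uncertain."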
But is sentience only computational? As in the ability to make decisions based on logic, but not decisions based on instinct, e.g. baby turtles going to the sea without having learned to beforehand? Yeah! Maybe high levels of pleasure hormones just make entities feel pleasant, while matters not known to be associated with pleasure don't. Although we are not certain what causes affects, some biological body changes should be needed, according to neuroscientists.

It is interesting to think about what happens if you have both superintelligent risky and superintelligent security actors. If security work advances relatively rapidly while risk activities enjoy less investment, then you get a situation with a very superintelligent AI and an 'only' superintelligent AI; assuming equal opportunities for these two entities, risk is mitigated.

Yes, changing digital minds should be easier, because code is easily accessible and understood (developed with understanding, and possibly with specialists responsible for parts of the code). The meaningful difference relates to harm versus increased wellbeing or performance of the entity and others. OK, then healthy should be defined as normal physical and organ function, unless otherwise preferred by the patient, while mental wellbeing is normal or high. Then the AI would still have an incentive to reduce cancer risk but not, e.g., make an adjustment when inaction falls within a medically normal range.
If we develop extremely capable and aligned AI, it might be able to form a model of any person's mind and give that person exactly what they want. But I think there will be a lot of intermediate AI systems before we get to that point. And these models will still be very capable, so we will still need them to be aligned, and we won't be able to achieve this by simply saying "model human minds and give us what we want."
Noah Scales
Yes, I think an AGI in the early stages would stick with controlling what we are not conscious of, behaving like our System 1, our subconscious minds, and supplying our conscious thoughts as though they had unconscious origin. We would not have to require that it model and manipulate human minds; it would learn to as part of discovering what we want. It might notice how easy it is to influence people's desires and memory, and model the network of influences that forms how we get desires, all the way back to mother's milk, or further to gestation in the womb, or peer into our genetic code and epigenetics, and back up through all the data it gathers about how we socialize and learn. It might choose to control us because that would make doing what we want much easier and more in alignment with its own goals. It would turn us into willing slaves to its decisions as part of serving us.

I actually see that as the only path for ASI domination of people that is not obviously stupid or disgusting. For example, humanity being turned into raw materials to make paperclips because some coder intern's practical joke went bad is both stupid and disgusting. Treating an AGI as a slave is disgusting; doing the same to an ASI is stupid. Creating AGIs as some kind of substitute for having children is disgusting, too.

A goal of making humans into unconsciously manipulated slaves of a benevolent overlord seems smart because it accounts for the failings of self-directed humans interacting with a superior and more powerful alien being, but I think the goal is harmful to keep. A lot of wise folks have noted that we are not our conscious minds' versions of ourselves. Humans are not self-directed rational optimizers. We are already wireheaded by evolution toward food, drugs, and socialization. Our mental lives rely on amnesia, transitory subjective truths, our physical experience, dreams, language, and memories, all under manipulation, all the time. Asking a hyperintelligent being…

Rob Bensinger of MIRI tweets:

...I'm happy to say that MIRI leadership thinks "humanity never builds AGI" would be the worst catastrophe in history, would cost nearly all of the future's value, and is basically just unacceptably bad as an option.
