
Humans don't have single overriding goals that determine everything we do. We have some general goals - stay alive, be happy, be good, reproduce, etc. And there are a lot of other things we do that can be thought of as subgoals toward achieving those top-level goals.

But really it seems like people just do things. We are messy, sometimes wanting one thing and sometimes another. For lack of a better term, humans are more "organic" than programmed.

I am newly diving into AI safety research, and it seems to me that a lot of the problem with misaligned AI is at root an issue of the AI having a single overarching goal in the first place. If you have a superpowerful intelligence that is set on achieving one thing, it is like having a superpowerful drill that only goes in one direction. If the direction you give it is a little off, you have serious consequences, because the tool is so powerful that it plows through anything in its way and ends up way off course.

But is there any reason an AI has to have a single goal? Does it seem possible to make an intelligent AI that has goals more like the way people have goals?

And as a side note: I wonder if the organic, non-goal-directed manner in which humans are made is somehow critical for consciousness, or maybe even for intelligence of a certain kind. We are unsure enough about how consciousness arises that, for all we know, it might only arise in "organic" beings.

Curious to hear anyone's thoughts. I'm sure there's existing thinking on this; if anyone can point me toward it, I would greatly appreciate it!





The trouble isn't that an AI can't have complex or vague goals; it's that there's no reason why having more complicated and vague goals makes something less dangerous.

Think of it this way: a lion has complicated and vague goals. It is messy, organic, and not "programmed". Does that mean a lion is safe? Would you be afraid to be locked in a cage with one? I would be.

Humans and lions both have complicated and sometimes vague goals, but because their goals are not the same, each poses a severe danger to the other all the same. The lion is dangerous to the human because the lion is stronger than the human. The human is dangerous to the lion because the human is smarter than the lion.

Where most people go wrong is that they think smart means nice, so they think that if only the lion in this analogy were smart too, it would magically also be safe. They don't imagine that a smart lion might want to eat you just the same as a regular lion.

In order to make a lion safe, you need to either control its values, so that it doesn't want to harm you, or make it more predictable.

Thanks a lot! Very clear and helpful.

Furthermore, a lion becomes more dangerous as it becomes more intelligent and capable, even if its terminal goal is not "maximize number of wildebeests eaten".

It probably won't. In my opinion, the "single, overarching goal to be maximised at all costs" is an outdated concept, based on speculation made before neural networks and the like became the norm. 

Nostalgebraist asked a similar question to yours a while back and made a good argument that it won't be. I put my thoughts as to why fixed-goal maximization is unlikely into my own post here.

As others have pointed out, an AI doesn't need a fixed goal to be dangerous. But I think lacking such a goal does make a few nightmare scenarios significantly less likely, and overall decreases the likelihood of apocalypse.
