They've learned within months for certain problems where learning can be done at machine speeds, i.e. game-like problems where the AI can "play against itself", or problems where huge amounts of data are available in machine-friendly format. But that isn't the case for every application. For example, developing self-driving cars to perfection has taken way, way longer than expected, partially because they have to deal with freak events outside the norm, so a lot more experience and data has to be built up, which takes human time. (Of course, humans are also not great at freak events, but remember we're aiming for perfection here.) I think most tasks involved in taking over the world will look a lot more like self-driving cars than playing Go, which inevitably means mistakes, and a lot of them.
So I think my most plausible scenario of AI success would be similar to yours: you build up wealth and power through some sucker corporation or small country that thinks it controls you, then use their R&D resources along with your intelligence to develop some form of world-destruction-level technology that can be deployed without resistance. I think this is orders of magnitude more likely to work than Yudkowsky's ridiculous "make a nanofactory in a beaker from first principles" strategy.
I still think this plan is doomed to fail (for early AGI). It's multistep, highly complicated, and requires interactions with a lot of humans, who are highly unpredictable. You really can't avoid "backflip steps" in such a process. By that I mean there will be things it needs to do for which there isn't sufficient data available to perfect them, so it just has to roll the dice. For example, there is no training set for "running a secret globe-spanning conspiracy", so it will inevitably make mistakes there. If we discover it before it's ready to defeat us, it loses. Also, by the time it pulls the trigger on its plan, there will be other AGIs around, and other examples of failed attacks that have put humanity on alert.
Yeah, I guess another consequence of how bugs are distributed is that the methodology of AI development matters a lot. An AI that is trained and developed over huge numbers of different domains is far, far more likely to succeed at takeover than one trained for specific purposes such as solving math problems. So the HFDT from that post would definitely be of higher concern if it worked (although I'm skeptical that it would).
I do think that any method of training will still leave holes, however. For example, in the scenario where HFDT is trained by watching how experts use a computer, all the non-computer domains of expertise would be left out. So even if it were a perfect reasoner across all scientific, artistic, and political knowledge, you couldn't just shove it in a robot body and expect it to do a backflip on its first try, no matter how many backflipping manuals it had read. I think there will be sufficiently many out-of-domain problems to stymie world domination attempts, at least initially.
I think a main difference of opinion I have with AI risk people is that I think subjugating all of humanity is a near-impossibly hard task, requiring a level of intelligence and perfection across a range of fields that is stupendously far above human level, and I don't think it's possible to reach that level without vast, vast amounts of empirical testing.
Yes, that's a fair summary. I think that perfect alignment is pretty much impossible, as is perfectly rational/bug-free AI. I think the latter fact may give us enough breathing room to get alignment at least good enough to avert extinction.
I feel like it's more fruitful to talk about specific classes of defects rather than all of them together. You use the word "bug" to mean everything from divide-by-zero crashes to wrong beliefs.
That's fair. I think if people were to further explore this topic, it would make sense to separate them out. And good point about the bugginess passage; I've edited it to be more accurate.
This depends on what "human-level" means. There is some threshold such that an AI past that threshold could quickly take over the world, and it doesn't really matter whether we call that "human-level" or not.
Indeed, this post is not an attempt to argue that AGI could never be a threat, merely that the "threshold for subjugation" is much higher than "any AGI", as many people imply. Human-level is just a marker for a level of intelligence that most people will agree counts as AGI, but which (due to mental flaws) is most likely not capable of world domination. For example, I do not believe an AI brain upload of Bobby Fischer could take over the world.
This makes a difference, because it means that the world in which the actual x-risk AGI comes into being is one in which a lot of earlier, non-deadly AGI already exist and can be studied, or used against the rogue.
Sure. But the relevant task isn't "make something that won't kill you." It's more like "make something that will stop any AI from killing you," or maybe "find a way to do alignment without much cost and without sacrificing much usefulness." If you and I make stupid AI, great, but some lab will realize that non-stupid AI could be more useful, and will make it by default.
Current narrow machine learning AI is extraordinarily stupid at things it isn't trained for, and yet it is still massively funded and incredibly powerful. Nobody is hankering to put a detailed understanding of quantum mechanics into DALL-E. A "stupidity about world domination" module, focused on a few key dangerous areas like biochemistry, could potentially be implemented in most AIs without affecting performance at all. It wouldn't solve the problem entirely, but it would help mitigate risk.
Alternatively, if you want to "make something that will stop AI from killing us" (presumably an AGI), you need to make sure that it can't kill us instead, and that could also be helped by deliberate flaws and ignorance. So make it an idiot savant at terminating AIs, but not at other things.
So, I think there is a threshold of intelligence and bug-free-ness (which I'll just call rationality) that will allow an AI to escape and attempt to attack humanity.
I also think there is a threshold of intelligence and rationality that could allow an AI to actually succeed in subjugating us all.
I believe that the second threshold is much, much higher than the first, and we would expect to see huge numbers of AI versions that pass the first threshold but not the second. If pre-alpha builds are intelligent enough to escape, they will be the first builds to attack.
Even if we're looking at released builds though, those builds will only be debugged within specific domains. Nobody is going to debug the geopolitical abilities of an AI designed to build paperclips. So the fact that debugging occurs in one domain is no guarantee of success in any other.
I think there is a very clear split, but it's not over whether people want to do the most good or not. I would say the real split is between "empiricists" and "rationalists", and it's about how much actual certainty we should have before we devote our time and money to a cause.
The thing that made me supportive of EA was the rigorous research that went into cause areas. We have rigorous, peer-reviewed studies that definitively prove that malaria nets save lives. There is real, tangible empirical proof that your donation to a GiveWell cause does real, empirical good. There is plenty of uncertainty in these cause areas, but it is relatively bounded by the data available.
Longtermism, on the other hand, is inherently built on shakier grounds, because you are speculating on unbounded problems that could have wildly different estimates depending on your own personal biases. Rationalists think you can overcome this by thinking really hard about the problems and extrapolating from current experience into the far future, or into things that don't exist yet like AGI.
You can probably tell that I'm an empiricist, and I find that the so-called "rationalists" have laid their foundations on a pile of shaky and questionable assumptions that I don't agree with. That doesn't mean I don't care about the long term; for example, climate change risk is very well studied.
It's concerning to me that the probability of "early rogue AI will inevitably succeed in defeating us" is not only taken to be near 100%, it's not even stated as a premise! Regardless of what you think of that position (I'm preparing a post on why I think the probability is actually quite low), this is not a part of the equation you can just ignore.
Another quibble is that "alignment problem" and "existential risk" are taken to be synonymous. It's quite possible for the former to be real but not the latter. (I.e., you think the AI will do things we don't want it to do, but you don't think those things will necessarily involve human extinction.)
Taking as a given that EA is an imperfect movement (like every other movement), it's worth considering whether external criticism should be taken on board, rather than PR-managed. For example, the accusations of cultishness may be exaggerated, but I think there is a grain of truth there, in terms of the amount of unnecessary jargon, odd rituals (like pseudo-Bayesian updating), and extreme overconfidence in very shaky assumptions.
Just a sanity check: are there other people here who feel like the Sequences, while being fairly good pop-science write-ups, are also massively overrated and full of flaws? Especially when it comes to scientific fields; for example, it feels like the caricature of scientists in "Defy the Data" was written without once talking to one.