Ah, thanks for rephrasing that. To make sure I’ve got this right: there’s a window between something being ‘easy to solve’ and ‘impossible to solve’ that a cause has to fall within to be worth funding. If it were ‘easy to solve’ it would be solved in the natural course of things, but if it were ‘impossible to solve’ there’d be no point working on it.
When I argue that AGI safety won’t be solved in the normal course of AGI research, that is an argument that pushes it towards the ‘impossible’ side of the tractability scale. We agree up to this point, I think.
If I’ve got that right, then if I could show that it would be possible to solve AGI safety with increased funding, you would agree that it’s a worthy cause area? I suppose we should go through all the literature and judge for ourselves if progress is being made in the field. That might be a bit of a task to do here, though.
For the sake of argument, let’s say technical alignment is a totally intractable problem. What then? Give up and let extinction happen? If the problem does turn out to be impossible to solve, then no other cause area matters either, because everybody is dead. If the problem is solvable and we build a superintelligence, then still no other cause area matters, because a superintelligence would be able to solve those problems.
This is kind of why I expected your argument to be about whether a superintelligence will be built, and when. Or about why you think that safety is a more trivial problem than I do. If you’re arguing the other way -- that safety is an impossible problem -- then wouldn’t you instead argue for stopping it being built in the first place?
I don’t know how tractable technical alignment will turn out to be. There has been some progress, but my main takeaway has been “We’ve discovered that X, Y, and Z won’t work.” If there is still no solution as we get closer to AGI being developed, then at least we’ll be able to point to that failure to try and slow down dangerous projects. Maybe the only safe solution will be to somehow increase human intelligence rather than creating an independent AGI at all; I don’t know.
On the other hand, it might be totally solvable. It’s theoretical research; we won’t know until it’s done. If it is easily solved, then the problem becomes making sure that all AGI projects implement the solution, which would still be an effective cause. In either case, marginal increases in funding wouldn't be wasted.
Sorry for the delay on this reply. It’s been a very busy week.
Okay, so, to be clear -- I am making the argument that superintelligence safety is an important area that is underfunded today, and you are arguing that extinction caused by superintelligence is so unlikely that it shouldn’t be a concern. Is that accurate?
With that in mind, I’ll go through your points here one by one, and then attempt to address some of the arguments in your blog posts (though the first post was unavailable!).
The theoretical limits of computation are lower bounds; we don't know if it is possible to achieve them for any kind of computation, let alone for general computation. Moreover, having a lot of computational power probably doesn't mean that you can calculate everything. A lot of real-world problems are hard to approximate in such a way that adding more computational power doesn't meaningfully help you. For example, computing approximate Nash equilibria or finding good layouts for microchip design. It is not clear that having a lot of computing power translates into relevant superior capabilities.
I agree with you here. My reason for bringing this up in the main post was to show that superintelligence is possible under today’s understanding of physics. Raw computation is not intelligence by itself, we agree, but rather one requirement for it. I was just pointing out that the computation that could be done in a small amount of matter is much larger than the computation that is done in the brain (and that the brain’s computation is in a pattern that we call general intelligence).
There is a growing literature on making algorithms fair, accountable and transparent. This is a collaborative effort between researchers in computer science, law and many other fields. There are so many similarities between this and the professed goals of the AI Safety community that it is strange that no cross-fertilization is happening.
I didn’t mention a lot of good research relevant to safety, and progress is certainly being made in many independent directions. I agree that I would also like to see more crossover, though I really don’t know how much each area is already building on the other’s progress. I’d be surprised if it were zero. And even if it were zero, that would show poor communication rather than say anything about the concerns being wrong.
You can't just ask the AI to "be good", because the whole problem is getting the AI to do what you mean instead of what you ask. But what if you asked the AI to "make itself smart"? On the one hand, instrumental convergence implies that the AI should make itself smart. On the other hand, the AI will misunderstand what you mean, hence not making itself smart. Can you point the way out of this seeming contradiction?
I mean, there’s no rule that a superintelligence has to misunderstand you. And there’s no certainty that instrumental convergence is correct. (I wouldn’t risk my life on either statement!) It’s just that we think being smarter would help achieve most goals, so we probably should expect a superintelligence to try and make itself smarter.
The other part is that we just don’t know how to guarantee that a superintelligence will do what we mean. (If you do know how to do this, that would be a huge relief.) Even in your example of trying to get a superintelligence just to make itself smarter, I certainly wouldn’t be confident it would do it in the way I expect; I have enough trouble predicting how my programs today will run. For example, suppose I’d written a utility function for ‘smartness’ that actually just measured total bits flipped. I might not realise until afterwards, which wouldn't be good.
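To make that concrete, here's a toy sketch in Python. Everything in it is hypothetical (the action names, the numbers, and the idea that an optimiser would face such a neat menu of options); it only shows how an optimiser that faithfully maximises the proxy I actually wrote down can pick something I never intended.

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    bits_flipped: int     # what the proxy 'smartness' utility actually measures
    problems_solved: int  # what we meant by 'smartness' (hypothetical measure)


def proxy_utility(action: Action) -> int:
    """The utility we wrote down: total bits flipped."""
    return action.bits_flipped


def intended_utility(action: Action) -> int:
    """The utility we meant: useful capability gained."""
    return action.problems_solved


# Hypothetical options available to the optimiser.
actions = [
    Action("improve reasoning module", bits_flipped=10_000, problems_solved=50),
    Action("toggle a memory bank in a loop", bits_flipped=10_000_000, problems_solved=0),
]

# The optimiser maximises the proxy, not what we had in mind.
chosen = max(actions, key=proxy_utility)
best_intended = max(actions, key=intended_utility)

print(chosen.name)         # toggle a memory bank in a loop
print(best_intended.name)  # improve reasoning module
```

The point isn't that anyone would write this exact bug; it's that the gap between the two functions is only visible once you look at what the optimiser did.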
AI Safety would be a worthy cause if a superintelligence were powerful and dangerous enough to be an issue but not so powerful and dangerous as to be uncontrollable. A solution has to be necessary, but it also has to exist. Thus, there is a tension between scale and tractability here. Both Bostrom and Yudkowsky only ever address one thing at a time, never acknowledging this tension.
I might be misunderstanding you here. Are you arguing that because superintelligence does not yet exist, it is not yet worthwhile to work on safety? Or are you arguing that we can’t be confident that a solution to alignment will work without a superintelligence to test it on?
If it’s the first, I would argue that there’s a major risk that we won’t find a solution in the period between creating a superintelligence and the superintelligence having enough power to be a big problem. Unless I was super confident this period would be very long, wouldn’t it make more sense to try and find a solution as early as possible?
I’d also argue that finding a solution early would mean it could be worked into the design of a superintelligence from the start, rather than relying only on the class of solutions that can be fitted onto something that’s already been built.
If it’s the second, I agree: it would be a much easier problem to solve if we had a ‘mini’-superintelligence to practice on, for sure. Figuring out how to do this is a part of safety research! How can we limit a superintelligence’s capabilities so it stays in this state? How can we predict what will happen as a weak superintelligence grows into a strong one? We still need to figure out how to do that as well, hence my call for research funding.
Most estimates on take-off speed start counting from the point that the AI is superintelligent. Why wait until then? A computer can be reset, so if you had a primitive AGI specimen you'd have unlimited tries to spot problems and make it behave.
I am not sure this is true. I’ve always read takeoff speed estimates as counting from the moment of human-level general intelligence, though I know many people imagine a human-level AGI as having access to current narrow superintelligence (as in, max[human, current computer] abilities at each task). Maybe that’s it.
Regardless, as above, I hope we get that chance, though from the little research that has been done it looks like this might not be as safe as it sounds. We would have to be very good at determining the capability of an AGI, be confident that no other project is moving forward faster than us, and be confident that its behaviour will remain the same as its intelligence increases, which might be the trickiest one. For example, a near-human AGI might be able to predict that doing what humans want early on would make it more likely to achieve its goal later on, no matter what that goal actually is. So we haven’t avoided catastrophe; we’ve only added an instrumental goal of ‘behaving the way humans want me to until I have enough power to disregard them without being shut down’. Still, this is an open area of research and I hope it gets more funding and attention.
I'd say that a 0.0001% chance of a superintelligence catastrophe is a huge over-estimate. Hence, AI Safety would be an ineffective cause area if you hold a person-affecting view. If you don't, then at least this opens the way for the kind of counterarguments used against Pascal's Mugging.
I'll get into your arguments for that figure below, but I want to clarify here that my estimate of superintelligence being built this century is in the double digits, percentage-wise, and that if it's built before we solve alignment it is almost certain to be dangerous. I'm not relying on very low probabilities of drastic outcomes, so Pascal's Mugging doesn't apply.
Onwards to some limited responses to your blog posts. I wasn’t entirely sure if I understood your argument properly, so I’m going to try and list the main points here and see if you agree.
1. You argue that if the probability of an AI-related extinction event were large, and if a single AI-related extinction event could affect any lifeform in the universe ever, one should have already happened somewhere and we shouldn’t exist.
2. You argue that current safety research is ineffective -- we’d be able to work more effectively and cheaply if we waited until we were closer to developing superintelligence.
3. You believe that if a superintelligence was going to be built in the near future, and if it was going to be dangerous, it would probably result in a smaller scale catastrophe that would give us plenty of warning that a bigger catastrophe was coming.
4. You believe that there are numerous psychological reasons people are inclined to believe superintelligence is likely and dangerous, and you are more skeptical of the claims because of that.
5. You argue that left to its own devices, regular commercial or academic research will be able to solve the problem.
If there’s a major point I’ve missed here, or if I’ve phrased these badly, do correct me! Anyway, let’s go through them.
If the probability of broadcasting radio into space were large, we should have already detected alien radio. (Radio would also spread at the speed of light in all directions, and be distinct from natural events.) I don’t believe this is strong evidence against the hypothesis that superintelligence (or radio) is possible and dangerous, though I suppose it’s evidence that there are no other advanced civilisations within our past light cone.
It is hard to say how effective current safety research is, for sure. If anything, the limited progress should make us think this problem is very hard, and make us much less confident about being able to solve it in a short period of time in the future. That’s particularly true since some aspects of safety get harder to implement the longer we wait: building culture and institutions that consider the issue when setting up their AGI projects, for instance.
3. You believe that if a superintelligence was going to be built in the near future, and if it was going to be dangerous, it would probably result in a smaller scale catastrophe that would give us plenty of warning to do safety research to prevent a larger one at that point.
If the time period between a small scale catastrophe and a large one is small, we shouldn’t be confident that we can solve safety in time -- especially if you are right about a small scale catastrophe being evidence we are nearing superintelligence.
Additionally, if there exist large scale failure modes that are wholly different to any small scale failure mode, we shouldn’t expect learning from small scale catastrophes to help us prevent larger ones.
Alternatively, we might even make large scale failures harder to detect by patching small failures -- for example, we might think we’ve prevented a superintelligence from trying to escape onto the internet, but we’ve really just made escaping so hard that only a strong superintelligence could manage it.
Humanity’s general lack of concern about climate change or nuclear weapons (before they were caused or created) would indicate to me that the psychological trends go in the other direction, at least for most people. Regardless, I would certainly agree with being really skeptical about extraordinary claims.
I would argue that it’s an extraordinary claim both ways. Either superintelligence is not that hard to build, or there is something so incredibly complicated and special about biological general intelligence that even with billions of dollars of funding per year for a hundred years we won’t manage to replicate it, even as we replicate other aspects of biological intelligence (like vision, or motor control).
You might argue, fairly, that this is more likely, but do you really believe it is billions of times as likely?
I’m not sure if your main disagreement is with superintelligence being built at all or with it being dangerous, so let’s look at that quickly too. If we are skeptical of superintelligence being dangerous because it seems extraordinary, we should also be skeptical of the extraordinary claim that a superintelligence would be safe and good by default. (If it is not safe by default, then we have already discovered how difficult it is to specify safe behaviour.)
As for regular commercial or academic research solving the problem on its own: I really hope so.
Commercially, building a superintelligence (or rather, every step towards superintelligence) would be extremely profitable. But since safety research would take some of your best minds away from building it, the incentives are in the wrong direction. Whoever spends the least on safety has the largest proportion of their resources to spend on development.
As far as regular academic research goes, it’s more hopeful, but the number of people working on safety in traditional academia is very low. How confident can we be that this low output would be enough to solve the problem before a superintelligence is built, especially given how difficult we’ve found it so far, and considering how many ambitious researchers are working on building a superintelligence as soon as possible? Perhaps money would be best spent persuading those researchers to consider safety; I don’t know.
To conclude, I want to lay out what would change my mind:
1. If progress on computer hardware and software seemed very likely to halt (or slow dramatically) in the near future.
2. If our current understanding of neuroscience turned out to be wrong, and we could show that simulating the brain’s general-purpose computation required far more computation than the brain’s cells themselves perform. Perhaps the brain uses hard-to-compute actions on the level of atoms or smaller, rather than something that could be done in abstract models of cells.
3. If somebody was able to disprove (or provide very strong evidence against) the orthogonality thesis and the instrumental convergence thesis.
4. If no project was working on building superintelligence.
Otherwise, it seems very much like we could have the capability of simulating and optimising a general intelligence in the near future, and that this could be very dangerous.
Yes, apologies for the delay, it's been a hectic week! Will hopefully post tomorrow.
Thanks for your response! I just wanted to let you know I'm taking the time to read your links and write out a well thought out reply, which might take another evening or two.