A response to Matthews on AI Risk


Dylan Matthews has written lots of useful exploratory material about EA: of my favourite journalistic articles about effective altruism, about half are his. So I was surprised to see that after attending the recent EA Global conference, Matthews wrote that he left worried, largely because of the treatment of AI risk, a topic that seems important to me. Matthews's writing, though as clear as ever, had some problems with its facts and background research that I think compromise major parts of his argument.

Matthews's critique of AI risk mitigation was mixed in with a wide range of his experiences and impressions from the event, but his criticism was still more substantial than most, and it is already gaining thousands of social media shares. His main points, it seems to me, were that AI risk reduction efforts are:

  • self-serving
  • self-defeating
  • based on Pascal's Mugging, a riddle in expected value thinking. 

Let's take these in the reverse order.

Conference-goers told Matthews that reducing AI risks is enormously important even if the probability of successfully mitigating those risks is arbitrarily small. Matthews reasonably identified this as an argument from arbitrarily small probabilities of astronomical gains, known as Pascal's Mugging. He attributes discussion of the idea to Bostrom, and uses it to challenge the notion of funding an AI risk charity like MIRI. Pascal's Mugging is an interesting and contentious issue in decision theory. But does Matthews realise that this line of reasoning was thoroughly disavowed by MIRI years ago? Had Matthews read Bostrom's piece about Pascal's Mugging to the end, and followed Bostrom's link, he would have realised that the idea originated with MIRI's founder, Eliezer Yudkowsky. In Eliezer's original piece, non-credible offers of astronomical utility were presented not as something to act on, but as an unresolved puzzle.

Matthews says that to justify working to reduce existential risk, you have to distinguish between a probability of success of 10^-15 and one of 10^-50, and then throws his hands in the air, exclaiming that surely no one could achieve such precision. This would be fine if Matthews had presented arguments that the chance of doing useful AI safety research was even less than one in a hundred.

But Matthews's reservations about AI safety efforts were only a paragraph long, and did not have such force. First, he professed some uncertainty about whether AI is possible. However, this uncertainty should not seriously reduce the odds of reducing AI risk. The median AI researcher estimates even odds of human-level AI between 2035 and 2050, so the prospect that AI is possible and achievable within decades is large enough to worry about. Second, he doubts whether intelligence is sufficient to give a computer dominion over humans. But intelligence is exactly what has always given humans dominion over animals. A superintelligent AI could, at the very least, covertly gain extreme financial power (as trading algorithms already do), hack hardware (as academics do) and control military devices (as drone software does). Third, he asks whether artificial intelligences might function merely as tools rather than as agents. But an ultra-powerful AI tool would still allow one human to wield power over all others, a problem that would still require some combination of technical and other risk-reduction research. Fourth, he asks how we ought to define friendliness in the context of machines. But this is a question that has previously interested MIRI researchers, and it will probably return to the agenda as work on the underlying mathematical problems makes progress easier.

All told, Matthews has made a weak case for uncertainty about the impact of AI and of AI safety research, but then supposes that we therefore can't tell a probability of success of 10^-15 from one of 10^-50. If we're the kind of people who want to quantify our uncertainty, then this is a complete non sequitur. If we're uncertain about Matthews's propositions, we ought to place our guesses somewhere closer to 50%. To do otherwise would be to mistake deep uncertainty for deep scepticism. And if the prospects are decent, then, as Rob Wiblin, a speaker at the conference, has previously explained, Pascal's Mugging is not needed:

"While there are legitimate question marks over whether existential risk reduction really does offer a very high expected value, and we should correct for ‘regression to the mean’, cognitive biases and so on, I don’t think we have any reason to discard these calculations altogether. The impulse to do so seems mostly driven by a desire to avoid the weirdness of the conclusion, rather than actually having a sound reason to doubt it.

A similar activity which nobody objects to on such theoretical grounds is voting, or political campaigning. Considering the difference in vote totals and the number of active campaigners, the probability that someone volunteering for a US presidential campaign will swing the outcome seems somewhere between 1 in 100,000 and 1 in 10,000,000. The US political system throws up significantly different candidates for a position with a great deal of power over global problems. If a campaigner does swing the outcome, they can therefore have a very large and positive impact on the world, at least in subjective expected value terms.

While people may doubt the expected value of joining such a campaign on the grounds that the difference between the candidates isn’t big enough, or the probability of changing the outcome too small, I have never heard anyone say that the ‘low probability, high payoff’ combination means that we must dismiss it out of hand."

Since there is a wide range of proposed actions for reducing risks from artificial intelligence, several more of which I mention below, it would take extensive argument to show that the probability of success for every one of them was much lower than that of swinging an election. So there seems to be no need to invoke riddles in decision theory to decide whether AI safety research is worth doing.

Claiming that AI risk-reduction research would be self-defeating, Matthews says: "It's hard to think of ways to tackle this problem today other than doing more AI research, which itself might increase the likelihood of the very apocalypse this camp frets over". But this sells the efforts of AI risk reducers well short. First, some of their efforts are political, such as rallying researchers and reporting to politicians. Second, some involve strategy and technological forecasting. More broadly, achieving differential progress, in which safety technology advances faster than raw intelligence, has been the central aim of AI risk reduction for years. It remains a key fixture: see Russell's recent talk, where he advocated promoting inverse reinforcement learning while de-emphasising the fine-tuning of deep neural networks. But even if Matthews disagreed with Russell's assessment, that would be a disagreement with one specific plan for AI risk reduction, not with the validity of the enterprise as a whole. There is a wide range of other approaches to the safety problem, such as rallying risk-aware researchers and politicians, and building clear strategies and timelines, that seem even more unambiguously good. It would be odd, to say the least, if every one of these turned out to increase the risk of apocalypse, and if no new safe courses of action could be discovered either.

Last, Matthews argues that talk of AI risk reduction could be self-serving or biased: "At the risk of overgeneralizing, the computer science majors have convinced each other that the best way to save the world is to do computer science research." He later returns to the issue: "The movement has a very real demographic problem, which contributes to very real intellectual blinders of the kind that give rise to the AI obsession." The problem here is that AI risk reducers can't win. If they're not computer scientists, they're decried as uninformed non-experts, and if they are computer scientists, they're accused of promoting and serving themselves. In reality, they're a healthy mixture. At MIRI, Eliezer originally wanted to build AI, and has had to switch to working on AI safety. Bostrom, who began as a philosopher, has ended up writing about AI because it seems not only interesting but also an important problem. Interestingly, whereas Eliezer is criticised for his overly enthusiastic writing and his fondness for science fiction, Bostrom has avoided science fiction entirely in order to avoid bias. Russell began as an AI professor, and it was only when he took a sabbatical that he realised that, despite Eliezer's grating writing style, Eliezer was onto something. These stories seem to describe people overcoming bias rather than succumbing to it.

Let's take on one final argument of Matthews's that also sums up the whole situation. According to Matthews, those concerned about AI risk presuppose that unborn people count equally with people alive now. To begin with, that's a stronger claim than Eliezer makes. Eliezer has argued that future people need only be valuable to within some reasonable factor of present people to be overwhelmingly important. If our unborn children, great-grandchildren and so on for a dozen generations were each even 10% as important as us, then together they would be more important than the people currently living, since twelve generations at 10% each add up to 120% of the present generation (assuming generations of roughly equal size). If the population grows in that time, or we give some moral weight to generations beyond that, or we weight our descendants closer to ourselves, then their value increases far beyond ours. Even if only the 8 billion people currently alive counted, losing them to a disaster, AI or otherwise, would be terrible. Such disasters also seem neglected, as their comparison and prioritisation lack an established field in which to receive proper academic attention. If future generations count as well, then the loss may be far worse, and the question is just how much.
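
The "dozen generations at 10%" arithmetic can be checked with a few lines of Python. This is a minimal sketch, not Eliezer's own model. It makes two hypothetical assumptions: each future generation is as large as the present one unless a growth multiplier is given, and every future person counts 10% as much as a present person. The function name `weighted_future_value` and its `growth` parameter are illustrative inventions.

```python
# Minimal sketch of the "dozen generations at 10%" arithmetic.
# Hypothetical assumption: each future generation is the same size as today's
# unless a per-generation growth multiplier is supplied.

PRESENT_POPULATION = 8e9  # people alive now, the figure used in the text
WEIGHT = 0.10             # each future person counts 10% as much as a present person
GENERATIONS = 12          # "a dozen generations"


def weighted_future_value(population: float, weight: float,
                          generations: int, growth: float = 1.0) -> float:
    """Total moral weight of future generations, in present-person units.

    `growth` is a hypothetical per-generation population multiplier;
    1.0 means every generation is the same size as the present one.
    """
    total = 0.0
    size = population
    for _ in range(generations):
        size *= growth           # population of the next generation
        total += weight * size   # that generation's weighted contribution
    return total


if __name__ == "__main__":
    future = weighted_future_value(PRESENT_POPULATION, WEIGHT, GENERATIONS)
    print(f"future / present = {future / PRESENT_POPULATION:.2f}")
```

Run as a script, this prints `future / present = 1.20`. Setting `growth` above 1.0, or raising `GENERATIONS`, pushes the ratio further above 1. That matches the paragraph's point about population growth and later generations.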

If Matthews just wanted to say that it's a bit awkward that people are still citing Pascal's Mugging arguments, that would be fine. But if he was writing a piece whose main focus was his reservations about AI risk, and which would be widely distributed as a critique of AI risk reduction, then he should have stress-tested those reservations against the people working for the organisations being criticised, who were eminently accessible to him at the recent conference. Unfortunately, it's not straightforward to undo the impact of a poorly thought-through and highly shareable opinion piece. At any rate, Matthews can be one of the first to read this counter-critique, and I'm happy to correct any errors.

In conclusion, covering AI risk is hard. AI risk reduction efforts involve a mixture of computer science experts and others, and would be criticised whatever their composition. Even if one gives presently alive people some priority over our descendants, existential risks are important, and there is no reason to be so sceptical about the chances of success that Pascal's Mugging must be invoked to support AI risk reduction.