Below I describe three faulty thoughts that made me overconfident that AI-related x-risk is real. After I identified these thoughts, I grew a lot more sceptical of AI Safety as an effective cause area. This post consists of edited parts of some blog posts on my own website.
All sentences are wrong, but some are useful. I think that a certain emotional salience makes me talk about AI in a way that is more wrong than necessary. For example, a self-driving car and a pre-driven car are the exact same thing, but I can feel myself thinking about the two in completely different ways.
A self-driving car is easy to imagine: it is smart and autonomous, and you can trust the car like you trust a cab driver. It can make mistakes but probably has good intent. When it encounters an unfamiliar situation, it can think about the correct way to proceed. It behaves in accordance with the goal its creator set and tends to make smart decisions. If anything goes wrong, the car is at fault.
A pre-driven car is hard to imagine: it has to have a bunch of rules coded into it by the manufacturer, and you can trust the car like you trust a bridge; it does exactly what it was built to do, but unforeseen circumstances will lead to inelegant failure. Its behaviour depends deeply on how the programmers modelled the task of driving, and that behaviour is brittle. When something goes wrong, the company and engineers are at fault.
This is to say, the language I use to talk about autonomous systems influences how I think about them, and what outcomes I consider more or less likely. If you want to understand present-day algorithms, the "pre-driven car" model of thinking works a lot better than the "self-driving car" model of thinking. The present and past are the only tools we have to think about the future, so I expect the "pre-driven car" model to make more accurate predictions. I try to move my own language away from words like "artificial intelligence" and "self-driving cars" towards words like "classification algorithms" and "pre-driven cars".
One can apply these substitutions to any sentence in which a computer is ascribed agency. In the best case, “The neural network learned to recognize objects in images” becomes “The fitted model classifies images in close correspondence with the human-given labels”. (In reality, even that description might be too generous.)
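To make the “fitted model” framing concrete, here is a toy sketch of what such a model amounts to. The points and labels below are made up, and real image classifiers are vastly more elaborate; the point is only that the result is a fitted function that reproduces human-given labels for nearby inputs, nothing more agentive than that.

```python
# A minimal fitted model: a 1-nearest-neighbour classifier. It "recognizes"
# nothing; it copies the human-given label of the closest training example.
def fit_nearest_neighbour(examples, labels):
    """Fit a classifier to (point, label) pairs supplied by humans."""
    def classify(point):
        distances = [sum((a - b) ** 2 for a, b in zip(point, example))
                     for example in examples]
        return labels[distances.index(min(distances))]
    return classify

# Hypothetical data: humans decided which points count as "cat" or "dog".
examples = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.8)]
labels = ["cat", "cat", "dog", "dog"]

model = fit_nearest_neighbour(examples, labels)
print(model((0.05, 0.1)))  # near the "cat" cluster -> "cat"
print(model((5.1, 5.1)))   # near the "dog" cluster -> "dog"
```

Nothing in the fitted function knows what a cat is; it classifies in close correspondence with the labels it was given, which is the whole claim.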
It helps to keep in mind the human component. “The YouTube autoplay algorithm shows you exactly those videos that make you spend more time on the platform” is accurate in some sense, but it completely glosses over the ways in which the algorithm does not do that. When you listen to music using YouTube’s autoplay, it isn’t hard to notice that suggestions tend to point backwards in time compared to the upload date of the video you’re watching right now, and that, apart from preventing repeats, autoplay is pretty Markovian (that is mathspeak for the algorithm not doing anything clever based on your viewing history, just “this video is best followed by that video”). Both of those properties are clearly a result of the way in which YouTube’s engineers modelled the problem they were trying to solve. I would describe YouTube’s suggestions as “The YouTube autoplay algorithm shows you videos that most people watched and liked after watching the current video”.
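That Markovian description can be sketched directly in code. The session data below is made up, and YouTube’s real system is of course far more complicated; this only shows what a “this video is best followed by that video” rule looks like when the suggestion depends on nothing but the current video.

```python
from collections import Counter, defaultdict

# A toy Markovian autoplay rule: count which video people watched after
# each video, then always suggest the most common follower.
def build_autoplay(sessions):
    followers = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            followers[current][following] += 1

    def suggest(current_video):
        # The suggestion uses no viewing history at all, only the
        # aggregate counts for the current video.
        candidates = followers[current_video]
        return candidates.most_common(1)[0][0] if candidates else None

    return suggest

# Hypothetical watch sessions.
sessions = [
    ["intro", "tutorial", "deep_dive"],
    ["intro", "tutorial", "faq"],
    ["review", "tutorial", "deep_dive"],
]
suggest = build_autoplay(sessions)
print(suggest("tutorial"))  # "deep_dive": the most common follower
```

Note how little room there is for the system to be “clever about you”: whoever you are, after "tutorial" it suggests the same thing.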
When you rewrite AI-related statements, they tend to become wordier. That is exactly what you would expect, since the common words got selected for ease of use, but it does make accurate conversations unwieldy.
I have not yet found any argument in favour of AI Risk being real that remained convincing after the above translation.
A sense of meaning
My background makes me prone to overrate how important AI Safety is.
My fields of expertise and enjoyment are mathematics and computer science. These skills are useful for the economy and in high demand. The general public is in awe of mathematics and thinks highly of anyone who can do it well. Computer science is the closest thing we have to literal magic.
Wealth, fun, respect, power. The only thing left for me to desire is cosmic significance, which is exactly the sales pitch of existential risk cause areas. It would be nice if AI-related existential risk were real; for my labour to potentially make the difference between a meaningless, lifeless universe and a universe filled with happiness. It would give objective significance to my life.
This is fertile ground for motivated reasoning.
When EAs describe how much utility could fit in the universe, the reference class for numbers is “how many X fits in the universe”, where X ranges over things like atoms, people, planets, stars. These numbers are huge, typically expressed by quantities of the form 10^n for large n.
When we describe how likely certain events are, the tempting reference class is “statements of probability”, which are typically expressed as percentages with only a handful of digits. It seems absurd to assign AI-risk less than 0.0000000000000000000000000000001% probability because that would be a lot of zeros.
The combination of these vastly different expressions of scale with anchoring means we should expect people to over-estimate the probability of unlikely risks, and hence to over-estimate the expected utility of x-risk prevention measures.
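The scale mismatch can be made concrete with a small calculation. The numbers here are entirely hypothetical, chosen only to illustrate the mechanism: when the utility at stake is astronomical, the probability estimate you anchor on dominates everything else in the expected-value calculation.

```python
# Illustrative, made-up numbers: the utility gets as many zeros as you
# like, while the anchored probability estimate gets only a few.
utility = 10 ** 50  # e.g. "future lives at stake" (not a claim, just scale)

# "Surely the probability is at least 1%?" -- few zeros, easy to anchor on.
ev_anchored = utility // 10 ** 2

# A probability allowed as many orders of magnitude as the utility itself.
ev_sceptical = utility // 10 ** 30

print(ev_anchored)   # 10^48
print(ev_sceptical)  # 10^20: still enormous, but 28 orders of magnitude less
```

Both expected values look impressive in isolation, which is exactly why the choice of probability reference class matters so much.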
I think there is a good argument to be made that the probability of AI Safety work being effective is many orders of magnitude smaller than commonly assumed. I wrote a very rough draft trying to defend a concrete estimate in this blog post, but it's very much a proof-of-concept written for myself, so the writing isn't very polished.
I used to think that working in AI Safety would be a good fit for me, but I stopped thinking that after I noticed that most of my belief in AI risk was caused by biased thinking: self-aggrandizing motivated reasoning, misleading language, and anchoring on unjustified probability estimates.
If people here would appreciate it, I would be happy to write one or more posts on object-level arguments as to why I am now sceptical of AI risk. Let me know in the comments.