Note: I mostly wrote this post after Eliezer Yudkowsky’s “Death with Dignity” essay appeared on LessWrong. Since then, Jotto has written a post that overlaps a bit with this one, which sparked an extended discussion in the comments. You may want to look at that discussion as well. See also, here, for another relevant discussion thread.
EDIT: See here for some post-discussion reflections on what I think this post got right and wrong.
Most people, when forming their own views on risks from misaligned AI, have some inclination to defer to others who they respect or think of as experts.
This is a reasonable thing to do, especially if you don’t yet know much about AI or haven’t yet spent much time scrutinizing the arguments. If someone you respect has spent years thinking about the subject, and believes the risk of catastrophe is very high, then you probably should take that information into account when forming your own views.
It’s understandable, then, if Eliezer Yudkowsky’s recent writing on AI risk helps to really freak some people out. Yudkowsky has probably spent more time thinking about AI risk than anyone else. Along with Nick Bostrom, he is the person most responsible for developing and popularizing these concerns. Yudkowsky has now begun to publicly express the view that misaligned AI has a virtually 100% chance of killing everyone on Earth - such that all we can hope to do is “die with dignity.”
The purpose of this post is, simply, to argue that people should be wary of deferring too much to Eliezer Yudkowsky, specifically, when it comes to estimating AI risk. In particular, I think, they shouldn’t defer to him more than they would defer to anyone else who is smart and has spent a large amount of time thinking about AI risk.
The post highlights what I regard as some negative aspects of Yudkowsky’s track record, when it comes to technological risk forecasting. I think these examples suggest that (a) his track record is at best fairly mixed and (b) he has some tendency toward expressing dramatic views with excessive confidence. As a result, I don’t personally see a strong justification for giving his current confident and dramatic views about AI risk a great deal of weight.
I agree it’s highly worthwhile to read and reflect on Yudkowsky’s arguments. I also agree that potential risks from misaligned AI deserve extremely serious attention - and are even, plausibly, more deserving of attention than any other existential risk. I also think it's important to note that many experts beyond Yudkowsky are very concerned about risks from misaligned AI. I just don’t think people should infer too much from the fact that Yudkowsky, specifically, believes we’re doomed.
Why write this post?
Before diving in, it may be worth saying a little more about why I hope this post might be useful. (Feel free to skip ahead if you're not interested in this section.)
In brief, it matters what the existential risk community believes about the risk from misaligned AI. I think that excessively high credences in doom can lead to:
- poor prioritization decisions (underprioritizing other risks, including other potential existential risks from AI)
- poor community health (anxiety and alienation)
- poor reputation (seeming irrational, cultish, or potentially even threatening), which in turn can lead to poor recruitment or retention of people working on important problems
My own impression is that, although it's sensible to take potential risks from misaligned AI very seriously, a decent number of people are now more freaked out than they need to be. And I think that excessive deference to some highly visible intellectuals in this space, like Yudkowsky, may be playing an important role - either directly or through deference cascades. I'm especially concerned about new community members, who may be particularly inclined to defer to well-known figures and who may have particularly limited awareness of the diversity of views in this space. I've recently encountered some anecdotes I found worrisome.
Nothing I write in this post implies that people shouldn't freak out, of course, since I'm mostly not engaging with the substance of the relevant arguments (although I have done this elsewhere, for instance here, here, and here). If people are going to freak out about AI risk, then I at least want to help make sure that they’re freaking out for sufficiently good reasons.
Yudkowsky’s track record: some cherry-picked examples
Here, I’ve collected a number of examples of Yudkowsky making (in my view) dramatic and overconfident predictions concerning risks from technology.
Note that this isn’t an attempt to provide a balanced overview of Yudkowsky’s technological predictions over the years. I’m specifically highlighting a number of predictions that I think are underappreciated and suggest a particular kind of bias.
Doing a more comprehensive overview, which doesn’t involve specifically selecting poor predictions, would surely give a more positive impression. Hopefully this biased sample is meaningful enough, however, to support the claim that Yudkowsky’s track record is at least pretty mixed.
Also, a quick caveat: Unfortunately, but understandably, Yudkowsky didn’t have time to review this post and correct any inaccuracies. In various places, I’m summarizing or giving impressions of lengthy pieces I haven’t fully read, or haven't fully read in well more than a year, so there's a decent chance that I’ve accidentally mischaracterized some of his views or arguments. Concretely: I think there’s something on the order of a 50% chance I’ll ultimately feel I should correct something below.
Fairly clearcut examples
1. Predicting near-term extinction from nanotech
At least up until 1999, admittedly when he was still only about 20 years old, Yudkowsky argued that transformative nanotechnology would probably emerge suddenly and soon (“no later than 2010”) and result in human extinction by default. My understanding is that this viewpoint was a substantial part of the justification for founding the institute that would become MIRI; the institute was initially focused on building AGI, since developing aligned superintelligence quickly enough was understood to be the only way to manage nanotech risk:
On the nanotechnology side, we possess machines capable of producing arbitrary DNA sequences, and we know how to turn arbitrary DNA sequences into arbitrary proteins (6). We have machines - Atomic Force Probes - that can put single atoms anywhere we like, and which have recently  been demonstrated to be capable of forming atomic bonds. Hundredth-nanometer precision positioning, atomic-scale tweezers... the news just keeps on piling up…. If we had a time machine, 100K of information from the future could specify a protein that built a device that would give us nanotechnology overnight….
If you project on a graph the minimum size of the materials we can manipulate, it reaches the atomic level - nanotechnology - in I forget how many years (the page vanished), but I think around 2035. This, of course, was before the time of the Scanning Tunnelling Microscope and "IBM" spelled out in xenon atoms. For that matter, we now have the artificial atom ("You can make any kind of artificial atom - long, thin atoms and big, round atoms."), which has in a sense obsoleted merely molecular nanotechnology - the surest sign that nanotech is just around the corner. I believe Drexler is now giving the ballpark figure of 2013. My own guess would be no later than 2010…
Above all, I would really, really like the Singularity to arrive before nanotechnology, given the virtual certainty of deliberate misuse - misuse of a purely material (and thus, amoral) ultratechnology, one powerful enough to destroy the planet. We cannot just sit back and wait….
Mitchell Porter calls it "The race between superweapons and superintelligence." Human civilization will continue to change until we either create superintelligence, or wipe ourselves out. Those are the two stable states, the two "attractors". It doesn't matter how long it takes, or how many cycles of nanowar-and-regrowth occur before Transcendence or final extinction. If the system keeps changing, over a thousand years, or a million years, or a billion years, it will eventually wind up in one attractor or the other. But my best guess is that the issue will be settled now.
I should, once again, emphasize that Yudkowsky was around twenty when he did the final updates on this essay. In that sense, it might be unfair to bring this very old example up.
Nonetheless, I do think this case can be treated as informative, since: the belief was so analogous to his current belief about AI (a high outlier credence in near-term doom from an emerging technology), since he had thought a lot about the subject and was already highly engaged in the relevant intellectual community, since it's not clear when he dropped the belief, and since twenty isn't (in my view) actually all that young. I do know a lot of people in their early twenties; I think their current work and styles of thought are likely to be predictive of their work and styles of thought in the future, even though I do of course expect the quality to go up over time.
2. Predicting that his team had a substantial chance of building AGI before 2010
In 2001, and possibly later, Yudkowsky apparently believed that his small team would be able to develop a “final stage AI” that would “reach transhumanity sometime between 2005 and 2020, probably around 2008 or 2010.”
In the first half of the 2000s, he produced a fair amount of technical and conceptual work related to this goal. It hasn't ultimately had much clear usefulness for AI development and, partly on this basis, my impression is that it has not held up well - even though he was very confident in the value of this work at the time.
The key points here are that:
- Yudkowsky has previously held short AI timeline views that turned out to be wrong
- Yudkowsky has previously held really confident inside views about the path to AGI that (at least seemingly) turned out to be wrong
- More generally, Yudkowsky may have a track record of overestimating or overstating the quality of his insights into AI
Although I haven’t evaluated the work, my impression is that Yudkowsky was a key part of a Singularity Institute effort to develop a new programming language to use to create “seed AI.” He (or whoever was writing the description of the project) seems to have been substantially overconfident about its usefulness. From the section of the documentation titled “Foreword: Earth Needs Flare” (2001):
A new programming language has to be really good to survive. A new language needs to represent a quantum leap just to be in the game. Well, we're going to be up-front about this: Flare is really good. There are concepts in Flare that have never been seen before. We expect to be able to solve problems in Flare that cannot realistically be solved in any other language. We expect that people who learn to read Flare will think about programming differently and solve problems in new ways, even if they never write a single line of Flare…. Flare was created under the auspices of the Singularity Institute for Artificial Intelligence, an organization created with the mission of building a computer program far before its time - a true Artificial Intelligence. Flare, the programming language they asked for to help achieve that goal, is not that far out of time, but it's still a special language.
Coding a Transhuman AI
I haven’t read it, to my discredit, but “Coding a Transhuman AI 2.2” is another piece of technical writing by Yudkowsky that one could look at. The document is described as “the first serious attempt to design an AI which has the potential to become smarter than human,” and aims to “describe the principles, paradigms, cognitive architecture, and cognitive components needed to build a complete mind possessed of general intelligence.”
From a skim, I suspect there’s a good chance it hasn’t held up well - since I’m not aware of any promising later work that builds on it and since it doesn’t seem to have been written with the ML paradigm in mind - but I can’t currently give an informed take.
Levels of Organization in General Intelligence
A later piece of work which I also haven’t properly read is “Levels of Organization in General Intelligence.” At least by 2005, going off of Yudkowsky’s post “So You Want to be a Seed AI Programmer,” it seems like he thought a variation of the framework in this paper would make it possible for a very small team at the Singularity Institute to create AGI:
There's a tradeoff between the depth of AI theory, the amount of time it takes to implement the project, the number of people required, and how smart those people need to be. The AI theory we're planning to use - not LOGI, LOGI's successor - will save time and it means that the project may be able to get by with fewer people. But those few people will have to be brilliant…. The theory of AI is a lot easier than the practice, so if you can learn the practice at all, you should be able to pick up the theory on pretty much the first try. The current theory of AI I'm using is considerably deeper than what's currently online in Levels of Organization in General Intelligence - so if you'll be able to master the new theory at all, you shouldn't have had trouble with LOGI. I know people who did comprehend LOGI on the first try; who can complete patterns and jump ahead in explanations and get everything right, who can rapidly fill in gaps from just a few hints, who still don't have the level of ability needed to work on an AI project.
Somewhat disputable examples
I think of the previous two examples as predictions that resolved negatively. I'll now cover a few predictions that we don't yet know are wrong (e.g. predictions about the role of compute in developing AGI), but that I think we now have reason to regard as significantly overconfident.
3. Having high confidence that AI progress would be extremely discontinuous and localized and not require much compute
In his 2008 "FOOM debate" with Robin Hanson, Yudkowsky confidently staked out very extreme positions about what future AI progress would look like - without (in my view) offering strong justifications. The past decade of AI progress has also provided further evidence against the correctness of his core predictions.
A quote from the debate, describing the median development scenario he was imagining at the time:
When we try to visualize how all this is likely to go down, we tend to visualize a scenario that someone else once termed “a brain in a box in a basement.” I love that phrase, so I stole it. In other words, we tend to visualize that there’s this AI programming team, a lot like the sort of wannabe AI programming teams you see nowadays, trying to create artificial general intelligence, like the artificial general intelligence projects you see nowadays. They manage to acquire some new deep insights which, combined with published insights in the general scientific community, let them go down into their basement and work on it for a while and create an AI which is smart enough to reprogram itself, and then you get an intelligence explosion…. (p. 436)
The idea (as I understand it) was that AI progress would have very little impact on the world, then a small team of people with a very small amount of computing power would have some key insight, then they’d write some code for an AI system, then that system would rewrite its own code, and then it would shortly after take over the world.
When pressed by his debate partner, regarding the magnitude of the technological jump he was forecasting, Yudkowsky suggested that economic output could at least plausibly rise by twenty orders-of-magnitude within not much more than a week - once the AI system has developed relevant nanotechnologies (p. 400). To give a sense of how extreme that is: If you extrapolate twenty-orders-of-magnitude-per-week over the course of a year - although, of course, no one expected this rate to be maintained for anywhere close to a year - it is equivalent to an annual economic growth rate of roughly (10^1000)%.
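As a rough check on the extrapolation above, here is a minimal Python sketch. It assumes a 52-week year and a constant weekly rate, and the function name is my own label rather than anything from the debate. Under those assumptions the annual growth factor is 10^1040, or about 10^1042 when expressed as a percentage, so the (10^1000)% figure is best read as a round order-of-magnitude number.

```python
def annual_growth_exponents(orders_per_week: int, weeks_per_year: int = 52):
    """Return base-10 exponents for the compounded annual growth.

    Assumption: the same number of orders of magnitude is gained every week.
    The annual growth factor is 10 ** factor_exp. The percentage growth is
    (factor - 1) * 100, which for a factor this large is approximately
    10 ** (factor_exp + 2).
    """
    factor_exp = orders_per_week * weeks_per_year
    percent_exp = factor_exp + 2
    return factor_exp, percent_exp


# Twenty orders of magnitude per week, as in the scenario discussed above.
print(annual_growth_exponents(20))
# Ten orders of magnitude per week, a deliberately more conservative variant.
print(annual_growth_exponents(10))
```

Even the more conservative ten-orders-per-week variant gives an exponent in the hundreds, so the qualitative point does not depend on the exact inputs.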
I think it’s pretty clear that this viewpoint was heavily influenced by the reigning AI paradigm at the time, which was closer to traditional programming than machine learning. The emphasis on “coding” (as opposed to training) as the means of improvement, the assumption that large amounts of compute are unnecessary, etc. seem to follow from this. A large part of the debate was Yudkowsky arguing against Hanson, who thought that Yudkowsky was underrating the importance of compute and “content” (i.e. data) as drivers of AI progress. Although Hanson very clearly wasn’t envisioning something like deep learning either, his side of the argument seems to fit better with what AI progress has looked like over the past decade. In particular, huge amounts of compute and data have clearly been central to recent AI progress and are currently commonly thought to be central - or, at least, necessary - for future progress.
In my view, the pro-FOOM essays in the debate also just offered very weak justifications for thinking that a small number of insights could allow a small programming team, with a small amount of computing power, to abruptly jump the economic growth rate up by several orders of magnitude. The main reasons that stood out to me, from the debate, are these:
- It requires less than a gigabyte to store someone’s genetic information on a computer (p. 444).
- The brain “just doesn’t look all that complicated” in comparison to human-made pieces of technology such as computer operating systems (p. 444), on the basis of the principles that have been worked out by neuroscientists and cognitive scientists.
- There is a large gap between the accomplishments of humans and chimpanzees, which Yudkowsky attributes to a small architectural improvement: “If we look at the world today, we find that taking a little bit out of the architecture produces something that is just not in the running as an ally or a competitor when it comes to doing cognitive labor….[T]here are no branches of science where chimpanzees do better because they have mostly the same architecture and more relevant content” (p. 448).
- Although natural selection can be conceptualized as implementing a simple algorithm, it was nonetheless capable of creating the human mind.
I think that Yudkowsky's prediction - that a small amount of code, run using only a small amount of computing power, was likely to abruptly jump economic output upward by more than a dozen orders-of-magnitude - was extreme enough to require very strong justifications. My view is that his justifications simply weren't that strong. Given the way AI progress has looked over the past decade, his prediction also seems very likely to resolve negatively.
4. Treating early AI risk arguments as close to decisive
In my view, the arguments for AI risk that Yudkowsky had developed by the early 2010s had a lot of very important gaps. They were suggestive of a real risk, but were still far from worked out enough to justify very high credences in extinction from misaligned AI. Nonetheless, Yudkowsky recalls his credence in doom was "around the 50% range" at the time, and his public writing tended to suggest that he saw the arguments as very tight and decisive.
These slides summarize what I see as gaps in the AI risk argument that appear in Yudkowsky’s essays/papers and in Superintelligence, which presents somewhat fleshed out and tweaked versions of Yudkowsky’s arguments. This podcast episode covers most of the same points. (Note that almost none of these objections I walk through are entirely original to me.)
You can judge for yourself whether these criticisms of his arguments are fair. If they seem unfair to you, then, of course, you should disregard this as an illustration of an overconfident prediction. One additional piece of evidence, though, is that his arguments focused on a fairly specific catastrophe scenario that most researchers now assign less weight to than they did when they first entered the field.
For instance, the classic arguments treated an extremely sudden "AI takeoff" as a central premise. Arguably, fast takeoff was the central premise, since presentations of the risk often began by establishing that there is likely to be a fast take-off (and thus an opportunity for a decisive strategic advantage) and then built the remainder of the argument on top of this foundation. However, many people in the field have now moved away from finding sudden take-off arguments compelling (e.g. for the kinds of reasons discussed here and here).
My point, here, is not necessarily that Yudkowsky was wrong, but rather that he held a much higher credence in existential risk from AI than his arguments justified at the time. The arguments had pretty crucial gaps that still needed to be resolved, but, I believe, his public writing tended to suggest that these arguments were tight and sufficient to justify very high credences in doom.
5. Treating "coherence arguments" as forceful
In the mid-2010s, some arguments for AI risk began to lean heavily on “coherence arguments” (i.e. arguments that draw implications from the von Neumann-Morgenstern utility theorem) to support the case for AI risk. See, for instance, this introduction to AI risk from 2016, by Yudkowsky, which places a coherence argument front and center as a foundation for the rest of the presentation. I think it's probably fair to guess that the introduction-to-AI-risk talk that Yudkowsky was giving in 2016 contained what he regarded as the strongest concise arguments available.
However, later analysis has suggested that coherence arguments have either no or very limited implications for how we should expect future AI systems to behave. See Rohin Shah’s (I think correct) objection to the use of “coherence arguments” to support AI risk concerns. See also similar objections by Richard Ngo and Eric Drexler (Section 6.4).
Unfortunately, this is another case where the significance of this example depends on how much validity you assign to a given critique. In my view, the critique is strong. However, I'm unsure what portion of alignment researchers currently agree with me. I do know of at least one prominent researcher who was convinced by it; people also don't seem to make coherence arguments very often anymore, which perhaps suggests that the critiques have gotten traction. However, if you have the time and energy, you should reflect on the critiques for yourself.
If the critique is valid, then this would be another example of Yudkowsky significantly overestimating the strength of an argument for AI risk.
[[EDIT: See here for a useful clarification by Rohin.]]
A somewhat meta example
6. Not acknowledging his mixed track record
So far as I know, although I certainly haven't read all of his writing, Yudkowsky has never (at least publicly) seemed to take into account the mixed track record outlined above - including the relatively unambiguous misses.
He has written about mistakes from early on in his intellectual life (particularly pre-2003) and has, on this basis, even made a blanket statement disavowing his pre-2003 work. However, based on my memory and a quick re-read/re-skim, this writing is an exploration of why it took him a long time to become extremely concerned about existential risks from misaligned AI. For instance, the main issue it discusses with his plans to build AGI is that these plans didn't take into account the difficulty and importance of ensuring alignment. This writing isn't, I think, an exploration or acknowledgement of the kinds of mistakes I've listed in this post.
The fact he seemingly hasn’t taken these mistakes into account - and, if anything, tends to write in a way that suggests he holds a very high opinion of his technological forecasting track record - leads me to trust his current judgments less than I otherwise would.
To be clear, Yudkowsky isn’t asking other people to defer to him. He’s spent a huge amount of time outlining his views (allowing people to evaluate them on their merits) and has often expressed concerns about excessive epistemic deference. ↩︎
A better, but still far-from-optimal approach to deference might be to give a lot of weight to the "average" view within the pool of smart people who have spent a reasonable amount of time thinking about AI risk. This still isn't great, though, since different people do deserve different amounts of weight, and since there's at least some reason to think that selection effects might bias this pool toward overestimating the level of risk. ↩︎
It might be worth emphasizing that I’m not making any claim about the relative quality of my own track record. ↩︎
To say something concrete about my current views on misalignment risk: I'm currently inclined to assign a low-to-mid-single-digits probability to existential risk from misaligned AI this century, with a lot of volatility in my views. This is of course, in some sense, still extremely high! ↩︎
I think that expressing extremely high credences in existential risk (without sufficiently strong and clear justification) can also lead some people to simply dismiss the concerns. It is often easier to be taken seriously, when talking about strange and extreme things, if you express significant uncertainty. Importantly, I don't think this means that people should ever misrepresent their levels of concern about existential risks; dishonesty seems like a really bad and corrosive policy. Still, this is one extra reason to think that it can be important to avoid overestimating risks. ↩︎
Yudkowsky is obviously a pretty polarizing figure. I'd also say that some people are probably too dismissive of him, for example because they assign too much significance to his lack of traditional credentials. But it also seems clear that many people are inclined to give Yudkowsky's views a great deal of weight. I've even encountered the idea that Yudkowsky is virtually the only person capable of thinking about alignment risk clearly. ↩︎
I think that cherry-picking examples from someone's forecasting track record is normally bad to do, even if you flag that you're engaged in cherry-picking. However, I do think (or at least hope) that it's fair in cases where someone already has a very high level of respect and frequently draws attention to their own successful predictions. ↩︎
I don't mean to suggest that the specific twenty orders-of-magnitude of growth figure was the result of deep reflection or was Yudkowsky's median estimate. Here is the specific quote, in response to Hanson raising the twenty orders-of-magnitude-in-a-week number: "Twenty orders of magnitude in a week doesn’t sound right, unless you’re talking about the tail end after the AI gets nanotechnology. Figure more like some number of years to push the AI up to a critical point, two to six orders of magnitude improvement from there to nanotech, then some more orders of magnitude after that." I think that my general point, that this is a very extreme prediction, stays the same even if we lower the number to ten orders-of-magnitude and assume that there will be a bit of a lag between the 'critical point' and the development of the relevant nanotechnology. ↩︎
As an example of a failed prediction or piece of analysis on the other side of the FOOM debate, Hanson praised the CYC project - which lies far afield of the current deep learning paradigm and now looks like a clear dead end. ↩︎
Yudkowsky also provides a number of arguments in favor of the view that the human mind can be massively improved upon. I think these arguments are mostly right. However, I think, they don't have any very strong implications for the question of whether AI progress will be compute-intensive, sudden, or localized. ↩︎
To probe just the relevance of this one piece of evidence, specifically, let’s suppose that it’s appropriate to use the length of a person’s genome in bits of information as an upper bound on the minimum amount of code required to produce a system that shares their cognitive abilities (excluding code associated with digital environments). This would imply that it is in principle possible to train an ML model that can do anything a given person can do, using something on the order of 10 million lines of code. But even if we accept this hypothesis - which seems quite plausible to me - it doesn’t seem to me like this implies much about the relative contributions of architecture and compute to AI progress or the extent to which progress in architecture design is driven by “deep insights.” For example, why couldn’t it be true that it is possible to develop a human-equivalent system using fewer than 10 million lines of code and also true that computing power (rather than insight) is the main bottleneck to developing such a system? ↩︎
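The rough arithmetic behind the genome-size comparison in the footnote above can be sketched as follows. All of the constants are illustrative assumptions of mine, not figures from the debate: an approximate haploid genome length of 3.1 billion base pairs, an uncompressed encoding of 2 bits per base, and a hypothetical average of 80 bytes per line of code. Under those assumptions the genome fits in under a gigabyte and corresponds to something on the order of 10 million lines.

```python
# All constants below are illustrative assumptions, not figures from the debate.
BASE_PAIRS = 3_100_000_000  # approximate length of a haploid human genome
BITS_PER_BASE = 2           # four possible bases, so 2 bits each, uncompressed
BYTES_PER_LINE = 80         # hypothetical average length of a line of code


def genome_bytes(base_pairs: int = BASE_PAIRS,
                 bits_per_base: int = BITS_PER_BASE) -> int:
    """Size of the uncompressed genome in bytes."""
    return base_pairs * bits_per_base // 8


def equivalent_lines(total_bytes: int,
                     bytes_per_line: int = BYTES_PER_LINE) -> int:
    """Number of code lines of the assumed length that fit in total_bytes."""
    return total_bytes // bytes_per_line


size = genome_bytes()
print(size)                    # 775000000 bytes, i.e. under one gigabyte
print(equivalent_lines(size))  # 9687500 lines, on the order of 10 million
```

Changing the assumed line length by a factor of two in either direction still leaves the result within the same order of magnitude.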
Two caveats regarding my discussion of the FOOM debate:
First, I should emphasize that, although I think Yudkowsky’s arguments were weak when it came to the central hypothesis being debated, his views were in some other regards more reasonable than his debate partner’s. See here for comments by Paul Christiano on how well various views Yudkowsky expressed in the FOOM debate have held up.
Second, it's been a few years since I've read the FOOM debate - and there's a lot in there (the book version of it is 741 pages long) - so I wouldn't be surprised if my high-level characterization of Yudkowsky's arguments is importantly misleading. My characterization here is based on some rough notes I took the last time I read it. ↩︎
For example, it may be possible to construct very strong arguments for AI risk that don't rely on the fast take-off assumption. However, in practice, I think it's fair to say that the classic arguments did rely on this assumption. If the assumption wasn't actually very justified, then, I think, it seems to follow that having a very high credence in AI risk also wasn't justified at the time. ↩︎
Here’s another example of an argument that’s risen to prominence in the past few years, and plays an important role in some presentations of AI risk, that I now suspect simply might not work. This argument shows up, for example, in Yudkowsky’s recent post “AGI Ruin: A List of Lethalities,” at the top of the section outlining “central difficulties.” ↩︎
EDIT: I've now written up my own account of how we should do epistemic deference in general, which fleshes out more clearly a bunch of the intuitions I outline in this comment thread.
I think that a bunch of people are overindexing on Yudkowsky's views; I've nevertheless downvoted this post because it seems like it's making claims that are significantly too strong, based on a methodology that I strongly disendorse. I'd much prefer a version of this post which, rather than essentially saying "pay less attention to Yudkowsky", is more nuanced about how to update based on his previous contributions; I've tried to do that in this comment, for example. (More generally, rather than reading this post, I recommend people read this one by Paul Christiano, which outlines specific agreements and disagreements. Note that the list of agreements there, which I expect that many other alignment researchers also buy into, serves as a significant testament to Yudkowsky's track record.)
The part of this post which seems most wild to me is the leap from "mixed track record" to...
I disagree that the sentence is false for the interpretation I have in mind.
I think it's really important to separate out the question "Is Yudkowsky an unusually innovative thinker?" and the question "Is Yudkowsky someone whose credences you should give an unusual amount of weight to?"
I read your comment as arguing for the former,...
I phrased my reply strongly (e.g. telling people to read the other post instead of this one) because deference epistemology is intrinsically closely linked to status interactions, and you need to be pretty careful in order to make this kind of post not end up being, in effect, a one-dimensional "downweight this person". I don't think this post was anywhere near careful enough to avoid that effect. That seems particularly bad because I think most EAs should significantly upweight Yudkowsky's views if they're doing any kind of reasonable, careful deference, because most EAs significantly underweight how heavy-tailed the production of innovative ideas actually is (e.g. because of hindsight bias, it's hard to realise how much worse than Eliezer we would have been at inventing the arguments for AI risk, and how many dumb things we would have said in his position).
By contrast, I think your post is implicitly using a model where we have a few existing, well-identified questions, and the most important thing is to just get to the best credences on those questions, and we should do so partly by just updating in the direction of experts. But I think this model of deference is rarely relevant... (read more)
This seems like an overly research-centric position.
When your job is to come up with novel relevant stuff in a domain, then I agree that it's mostly about "which ideas and arguments to take seriously" rather than specific credences.
When your job is to make decisions right now, the specific credences matter. Some examples:
I think that there are very few decisions which are both a) that low-dimensional and b) actually sensitive to the relevant range of credences that we're talking about.
Like, suppose you think that Eliezer's credences on his biggest claims are literally 2x higher than they should be, even for claims where he's 90% confident. This is a huge hit in terms of Bayes points; if that's how you determine deference, and you believe he's 2x off, then plausibly that implies you should defer to him less than you do to the median EA. But when it comes to grantmaking, for example, a cost-effectiveness factor of 2x is negligible given the other uncertainties involved - this should very rarely move you from a yes to no, or vice versa. (edit: I should restrict the scope here to grantmaking in complex, high-uncertainty domains like AI alignment).
Then you might say: well, okay, we're not just making binary decisions, we're making complex decisions where we're choosing between lots of different options. But the more complex the decisions you're making, the less you should care about whether somebody's credences on a few key claims are accurate, and the more you should care about whether they're identify... (read more)
Such differences are crucial for many of the most important grant areas IME, because they are areas where you are trading off multiple high-stakes concerns. E.g. in nuclear policy all the strategies on offer have arguments that they might lead to nuclear war or worse war. On AI alignment there are multiple such tradeoffs and people embracing strategies to push the same variable in opposite directions with high stakes on both sides.
I think differences between Eliezer + my views often make way more than a 2x difference to the bottom line. I'm not sure why you're only considering probabilities on specific claims; when I think of "deferring" I also imagine deferring on estimates of usefulness of various actions, which can much more easily have OOMs of difference.
(Fwiw I also think Eliezer is way more than 2x too high for probabilities on many claims, though I don't think that matters much for my point.)
Taking my examples:
Since Eliezer thinks something like 99.99% chance of doom from AI, that reduces cost effectiveness of all x-risk-targeted biosecurity work by a factor of 10,000x (since only in 1 in 10,000 worlds does the reduced bio x-risk matter at all), whereas if you have < 50% of doom from AI (as I do) then that's a discount factor of < 2x on x-risk-targeted biosecurity work. So that's almost 4 OOMs of difference.
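The arithmetic in this example can be checked with a short sketch. It assumes, as the comment does, that x-risk-targeted biosecurity work only pays off in worlds where AI does not cause doom; the function name is illustrative and the 50% figure is the upper bound the commenter gives for his own view.

```python
import math


def xrisk_discount_factor(p_ai_doom: float) -> float:
    """Factor by which the value of non-AI x-risk work is divided,
    assuming it only matters in worlds without AI doom."""
    return 1.0 / (1.0 - p_ai_doom)


# Credences taken from the comment above.
high_doom_discount = xrisk_discount_factor(0.9999)  # roughly 10,000x
low_doom_discount = xrisk_discount_factor(0.5)      # 2x (upper bound for "< 50%")

# Gap between the two discount factors, in orders of magnitude.
ooms_of_difference = math.log10(high_doom_discount / low_doom_discount)

print(f"{high_doom_discount:,.0f}x vs {low_doom_discount:.0f}x "
      f"({ooms_of_difference:.1f} OOMs apart)")
```

With exactly 50% the gap comes out a little under 4 orders of magnitude, and it widens as the lower credence falls further below 50%.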
Eli...
We both agree that you shouldn't defer to Eliezer's literal credences, because we both think he's systematically overconfident. The debate is between two responses to that:
a) Give him less deference weight than the cautious, sober, AI safety people who make few novel claims but are better-calibrated (which is what Ben advocates).
b) Try to adjust for his overconfidence and then give significant deference weight to a version of his worldview that isn't overconfident.
I say you should do the latter, because you should be deferring to coherent worldviews (which are rare) rather than deferring on a question-by-question basis. This becomes more and more true the more complex the decisions you have to make. Even for your (pretty simple) examples, the type of deference you seem to be advocating doesn't make much sense.
It doesn't make sense to defer to Eliezer's estimate of the relative importance of AI without also accounting for his estimate of the relative tractability of funding AI, which I infer he thinks is very low.... (read more)
Musing out loud: I don't know of any complete model of deference which doesn't run into weird issues, like the conclusion that you should never trust yourself. But suppose you have some kind of epistemic parliament where you give your own views some number of votes, and assign the rest of the votes to other people in proportion to how defer-worthy they seem. Then you need to make a bunch of decisions, and your epistemic parliament keeps voting on what will best achieve your (fixed) goals.
If you do naive question-by-question majority voting on each question simultaneously then you can end up with an arbitrarily incoherent policy - i.e. a set of decisions that's inconsistent with each other. And if you make the decisions in some order, with the constraint that they each have to be consistent with all prior decisions, then the ordering of the decisions can become arbitrarily important.
Instead, you want your parliament to negotiate some more coherent joint policy to follow. And I expect that in this joint policy, each worldview gets its way on the questions that are most important to it, and cedes responsibility on the questions that are least important. So Eliezer's worldview doesn't end up reallocating all the biosecurity money, but it does get a share of curriculum time (at least for the most promising potential researchers). But in general how to conduct those negotiations is an unsolved problem (and pretty plausibly unsolvable).
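The incoherence of naive question-by-question majority voting described above can be illustrated with a toy example. This is only a sketch under invented assumptions: three hypothetical worldviews with one vote each, two premises `P` and `Q`, and an action that each worldview endorses only if it accepts both premises.

```python
# Three hypothetical worldviews, one vote each (an assumption for illustration).
worldviews = {
    "A": {"P": True, "Q": True},
    "B": {"P": True, "Q": False},
    "C": {"P": False, "Q": True},
}

# Each worldview is internally coherent: it endorses acting iff it accepts P and Q.
for beliefs in worldviews.values():
    beliefs["act"] = beliefs["P"] and beliefs["Q"]


def majority(question: str) -> bool:
    """Return True if a strict majority of worldviews answers True."""
    yes_votes = sum(beliefs[question] for beliefs in worldviews.values())
    return yes_votes > len(worldviews) / 2


# Vote on every question separately and simultaneously.
parliament = {q: majority(q) for q in ("P", "Q", "act")}

# The aggregate accepts P and Q yet rejects the action that follows from them.
is_coherent = parliament["act"] == (parliament["P"] and parliament["Q"])

print(parliament, "coherent:", is_coherent)
```

Every individual worldview here is consistent, but the aggregate is not, which is one reason to negotiate over whole policies rather than vote question by question.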
IMO the crux is that I disagree with both of these. Instead I think you should use each worldview to calculate a policy, and then generate some kind of compromise between those policies. My arguments above were aiming to establish that this strategy is not very sensitive to exactly how much you defer to Eliezer, because there just aren't very many good worldviews going around - hence why I assign maybe 15 or 20% (inside view) credence to his worldview (updated from 10% above after reflection). (I think my all-things-considered view is similar, actually, because deference to him cancels out against deference to all the people who think he's totally wrong.)
Again, the difference is in large part determined by whether you think you're in a low-dimensional space (here are our two actions, which one should we take?) versus a high-dimensional space (millions of actions available to us, how do we narrow it down?) In a high-dimensional space the tradeoffs between the best ways to generate utility according to Eliezer... (read more)
Meta: I'm currently writing up a post with a fully-fleshed-out account of deference. If you'd like to drop this thread and engage with that when it comes out (or drop this thread without engaging with that), feel free; I expect it to be easier to debate when I've described the position I'm defending in more detail.
I always agreed with this claim; my point was that the type of deference which is important for people making decisions now should not be very sensitive to the "specific credences" of the people you're deferring to. You were arguing above that the difference between your and Eliezer's views makes much more than a 2x difference; do you now agree that, on my account of deference, a big change in the deference-weight you assign to Eliezer plausibly leads to a much smaller change in your policy from the perspective of other worldviews, because the Eliezer-worldview trades off influence ... (read more)
Beat me to it & said it better than I could.
My now-obsolete draft comment was going to say:
It seems to me that between about 2004 and 2014, Yudkowsky was the best person in the world to listen to on the subject of AGI and AI risks. That is, deferring to Yudkowsky would have been a better choice than deferring to literally anyone else in the world. Moreover, after about 2014 Yudkowsky would probably have been in the top 10; if you are going to choose 10 people to split your deference between (which I do not recommend, I recommend thinking for oneself), Yudkowsky should be one of those people and had you dropped Yudkowsky from the list in 2014 you would have missed out on some important stuff. Would you agree with this?
On the positive side, I'd be interested to see a top ten list from you of people you think should be deferred to as much or more than Yudkowsky on matters of AGI and AI risks.*
*What do I mean by this? Idk, here's a partial operationalization: Timelines, takeoff speeds, technical AI alignment, and p(doom).
[ETA: lest people write me off as a Yudkowsky fanboy, I wish to emphasize that I too think people are overindexing on Yudkowsky's views, I too think there ar... (read more)
That seems like a considerable overstatement to me. I think it would be bad if the forum rules said an article like this couldn't be posted.
If anything, I think that prohibiting posts like this from being published would have a more detrimental effect on community culture.
Of course, people are welcome to criticise Ben's post - which some in fact do. That's a very different category from prohibition.
Yeah, that sounds perfectly plausible to me.
“A bit confused” wasn’t meant to be any sort of rhetorical pretend understatement or something. I really just felt a slight surprise that caused me to check whether the forum rules contain something about ad hom, and found that they don’t. It may well be the right call on balance. I trust the forum team on that.
I really appreciate the time people have taken to engage with this post (and actually hope the attention cost hasn’t been too significant). I decided to write some post-discussion reflections on what I think this post got right and wrong.
The reflections became unreasonably long - and almost certainly should be edited down - but I’m posting them here in a hopefully skim-friendly format. They cover what I see as some mistakes with the post, first, and then cover some views I stand by.
Things I would do differently in a second version of the post:
1. I would either drop the overall claim about how much people should defer to Yudkowsky — or defend it more explicitly
At the start of the post, I highlight the two obvious reasons to give Yudkowsky's risk estimates a lot of weight: (a) he's probably thought more about the topic than anyone else and (b) he developed many of the initial AI risk arguments. I acknowledge that many people, justifiably, treat these as important factors when (explicitly or implicitly) deciding how much to defer to Yudkowsky.
Then the post gives some evidence that, at each stage of his career, Yudkowsky has made a dramatic, seemingly overconfident prediction about tec... (read more)
I noted some places I agree with your comment here, Ben. (Along with my overall take on the OP.)
Some additional thoughts:
The “death with dignity” post came in the wake of Eliezer writing hundreds of thousands of words about why he thinks alignment is hard in the Late 2021 MIRI Conversations (in addition to the many specific views and arguments about alignment difficulty he’s written up in the preceding 15+ years). So it seems wrong to say that everyone was taking it seriously based on deference alone.
The post also has a lot of content beyond “p(doom) is high”. Indeed, I think the post’s focus (and value-add) is mostly in its discussion of rationalization, premature/excessive conditionalizing, and ethical injunctions, not in the bare assertion that p(doom) is high. Eliezer was already saying pretty similar stuff about p(doom) back in September.... (read more)
I just wanted to state agreement that it seems a large number of people largely misread Death with Dignity, at least according to what seems to me the most plausible intended message: mainly about the ethical injunctions (which are very important as a finitely-rational and prone-to-rationalisation being), as Yudkowsky has written of in the past.
The additional detail of 'and by the way this is a bad situation and we are doing badly' is basically modal Yudkowsky schtick and I'm somewhat surprised it updated anyone's beliefs (about Yudkowsky's beliefs, and therefore their all-things-considered-including-deference beliefs).
I think if he had been a little more audience-aware he might have written it differently. Then again maybe not, if the net effect is more attention and investment in AI safety - and more recent posts and comments suggest he's more willing than before to use certain persuasive techniques to spur action (which seems potentially misguided to me, though understandable).
I think "deference alone" is a stronger claim than the one we should worry about. People might read the arguments on either side (or disproportionately Eliezer's arguments), but then defer largely to Eliezer's weighing of arguments because of his status/position, confidence, references to having complicated internal models (that he often doesn't explain or link explanations to), or emotive writing style.
What share of people with views similar to Eliezer's do you expect to have read these conversations? They're very long, not well organized, and have no summaries/takeaways. The format seems pretty bad if you value your time.
I think the AGI Ruin: A List of Lethalities post was formatted pretty accessibly, but that came after death with dignity.... (read more)
I appreciate this update!
I am confused about you bringing in the claim of "at each stage of his career", given that the only two examples you cited that seemed to provide much evidence here were from the same (and very early) stage of his career. Of course, you might have other points of evidence that point in this direction, but I did want to provide some additional pushback on the "at each stage of his career" point, which I think you didn't really provide evidence for.
I do think finding evidence for each stage of his career would of course be time-consuming, and I understand that you didn't really want to go through all of that, but it seemed good to point out explicitly.... (read more)
I really appreciated this update. Mostly it checks out to me, but I wanted to push back on this:
It seems to me that a good part of the beliefs I care about assessing are the beliefs about what is important. When someone has a track record of doing things with big positive impact, that's some real evidence that they have truth-tracking beliefs about what's important. In the hypothetical where Yudkowsky never published his work, I don't get the update that he thought these were important things to publish, so he doesn't get credit for being right about that.
Thanks for writing this update. I think my number one takeaway here is something like: when writing a piece with the aim of changing community dynamics, it's important to be very clear about motivations and context. E.g. I think a version of the piece which said "I think people are overreacting to Death with Dignity, here are my specific models of where Yudkowsky tends to be overconfident, here are the reasons why I think people aren't taking those into account as much as they should" would have been much more useful and much less controversial than the current piece, which (as I interpret it) essentially pushes a general "take Yudkowsky less seriously" meme (and is thereby intrinsically political/statusy).
Indeed, Broome co-supervised the doctoral theses of both Toby Ord and Will MacAskill. And Broome was, in fact, the person who advised Will to get in touch with Toby, before the two had met.
Speaking for myself, I was interested in a lot of the same things in the LW cluster (Bayes, approaches to uncertainty, human biases, utilitarianism, philosophy, avoiding the news) before I came across LessWrong or EA. The feeling is much more like "I found people who can describe these ideas well" than "oh these are interesting and novel ideas to me." (I had the same realization when I learned about utilitarianism...much more of a feeling that "this is the articulation of clearly correct ideas, believing otherwise seems dumb").
That said, some of the ideas on LW that seemed more original to me (AI risk, logical decision theory stuff, heroic responsibility in an inadequate world), do seem both substantively true and extremely important, and it took me a lot of time to be convinced of this.
(There are also other ideas that I'm less sure about, like cryonics and MW).
I'm a bit confused about a specific small part:
I imagine that for many people, including me (including you?), once we work on [what we believe to be] preventing the world from ending, we would only move to another job if it was also preventing the world from ending, probably in an even more important way.
In other words, I think "working at a 2nd x-risk job and believing it is very important" is mainly predicted by "working at a 1st x-risk job and believing it is very important", much more than by personality traits.
This is almost testable, given we have lots of people working on x-risk today and believing it is very important. But maybe you can easily put your finger on what I'm missing?
OTOH, I am (or I guess was?) a professional physicist, and when I read Rationality A-Z, I found that Yudkowsky was always reaching exactly the same conclusions as me whenever he talked about physics, including areas where (IMO) the physics literature itself is a mess—not only interpretations of QM, but also how to think about entropy & the 2nd law of thermodynamics, and, umm, I thought there was a third thing too but I forget.
That increased my respect for him quite a bit.
And who the heck am I? Granted, I can’t out-credential Scott Aaronson in QM. But FWIW, hmm let’s see, I had the highest physics GPA in my Harvard undergrad class and got the highest preliminary-exam score in my UC Berkeley physics grad school class, and I’ve played a major role in designing I think 5 different atomic interferometers (including an atomic clock) for various different applications, and in particular I was always in charge of all the QM calculations related to estimating their performance, and also I once did a semester-long (unpublished) research project on quantum computing with superconducting qubits, and also I have made lots of neat wikipedia QM diagrams and explanations including a pedag... (read more)
Hmm, I’m a bit confused where you’re coming from.
Suppose that the majority of eminent mathematicians believe 5+5=10, but a significant minority believes 5+5=11. Also, out of the people in the 5+5=10 camp, some say “5+5=10 and anyone who says otherwise is just totally wrong”, whereas others say “I happen to believe that the balance of evidence is that 5+5=10, but my esteemed colleagues are reasonable people and have come to a different conclusion, so we 5+5=10 advocates should approach the issue with appropriate humility, not overconfidence.”
In this case, the fact of the matter is that 5+5=10. So in terms of who gets the most credit added to their track-record, the ranking is:
Agree so far?
(See also: Bayes’s theorem, Brier score, etc.)
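As a rough illustration of how a scoring rule like the Brier score treats the three groups in this hypothetical, here is a minimal sketch. The probabilities assigned to each group are assumptions chosen for illustration; they are not stated in the comment.

```python
def brier_score(p_true: float, outcome: bool) -> float:
    """Brier score for one binary forecast (lower is better)."""
    return (p_true - (1.0 if outcome else 0.0)) ** 2


# Assumed credences in the claim "5+5=10" (illustrative numbers only).
credences = {
    "confident 5+5=10 camp": 0.99,
    "humble 5+5=10 camp": 0.70,
    "5+5=11 camp": 0.20,
}

# The claim is in fact true.
scores = {name: brier_score(p, outcome=True) for name, p in credences.items()}

# Rank from best (lowest score) to worst.
ranking = sorted(scores, key=scores.get)

for name in ranking:
    print(f"{name}: {scores[name]:.4f}")
```

Under these assumed numbers, the confident and correct camp scores best, the humble and correct camp second, and the mistaken camp last; any proper scoring rule gives the same ordering when credences are ordered this way.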
Back to the issue here. Yudkowsky is claiming “MWI, and anyone who says otherwis... (read more)
It seems that half of these examples are from 15+ years ago, from a period for which Eliezer has explicitly disavowed his opinions (and the ones that are not strike me as most likely correct, like treating coherence arguments as forceful and that AI progress is likely to be discontinuous and localized and to require relatively little compute).
Let's go example-by-example:
1. Predicting near-term extinction from nanotech
This critique strikes me as about as sensible as digging up someone's old high-school essays and critiquing their stance on communism or the criminal justice system. I want to remind any reader that this is an opinion from 1999, when Eliezer was barely 20 years old. I am confident I can find crazier and worse opinions for every single leadership figure in Effective Altruism, if I am willing to go back to what they thought while they were in high-school. To give some character, here are some things I believed in my early high-school years:
- The economy was going to collapse because the U.S. was establishing a global surveillance state
- Nuclear power plants are extremely dangerous and any one of them is quite likely to explode in a given year
- We could have e
Just to note that the boldfaced part has no relevance in this context. The post is not attributing these views to present-day Yudkowsky. Rather, it is arguing that Yudkowsky's track record is less flattering than some people appear to believe. You can disavow an opinion that you once held, but this disavowal doesn't erase a bad prediction from your track record.
Hmm, I think that part definitely has relevance. Clearly we would trust Eliezer less if his response to that past writing was "I just got unlucky in my prediction, I still endorse the epistemological principles that gave rise to this prediction, and would make the same prediction, given the same evidence, today".
If someone visibly learns from forecasting mistakes they make, that should clearly update us positively on them not repeating the same mistakes.
I suppose one of my main questions is whether he has visibly learned from the mistakes, in this case.
For example, I wasn't able to find a post or comment to the effect of "When I was younger, I spent years of my life motivated by the belief that near-term extinction from nanotech was looming. I turned out to be wrong. Here's what I learned from that experience and how I've applied it to my forecasts of near-term existential risk from AI." Or a post or comment acknowledging his previous over-optimistic AI timelines and what he learned from them, when formulating his current seemingly short AI timelines.
(I genuinely could be missing these, since he has so much public writing.)
Eliezer writes a bit about his early AI timeline and nanotechnology opinions here, though it sure is a somewhat obscure reference that takes a bunch of context to parse:...
Did Yudkowsky actually write these sentences?
If Yudkowsky thinks, as this suggests, that people in EA think or do things because he tells them to - this alone means it's valuable to question whether people give him the right credibility.
I don't see how he has encouraged people to pay attention to forecasting track records. People who have encouraged that norm make public bets or go on public forecasting platforms and make predictions about questions that can resolve in the short term. Bryan Caplan does this; I think Greg Lewis and David Manheim are superforecasters.
I thought the upshot of this piece and the Jotto post was that Yudkowsky is in fact very dismissive of people who make public forecasts. "I consider naming particular years to be a cognitively harmful sort of activity; I have refrained from trying to translate my brain's native intuitions about this into probabilities, for fear that my verbalized probabilities will be stupider than my intuitions if I try to put weight on them." This seems like the opposite of encouraging people to pay attention to forecasting but is rather dismissing the whole enterprise of forecasting.
I wanted to make sure I'm not missing something, since this casts him in a negative light IMO.
There's a difference between saying, for example, "You can't expect me to have done X then - nobody was doing it, and I haven't even written about it yet, nor was I aware of anyone else doing so" - and saying "... nobody was doing it because I haven't told them to."
This isn't about credit. It's about self-perception and social dynamics.
More than Philip Tetlock (author of Superforecasting)?
Does that particular quote from Yudkowsky not strike you as slightly arrogant?
FWIW I think "it was 20 years ago" is a good reason not to take these failed predictions too seriously, and "he has disavowed these predictions after seeing they were false" is a bad reason to take them unseriously.
On 1 (the nanotech case):
I think your comment might give the misimpression that I don't discuss this fact in the post or explain why I include the case. What I write is:
An additional reason why I think it's worth distinguishing between his...
One quick response, since it was easy (might respond more later):
I do think takeoff speeds between 1 week and 10 years are a core premise of the classic arguments. I do think the situation looks very different if we spend 5+ years in the human domain, but I don't think there are many who believe that that is going to happen.
I don't think the distinction between 1 week and 1 year is that relevant to the core argument for AI Risk, since it seems in either case more than enough cause for likely doom, and that premise seems very likely to be true to me. I do think Eliezer believes things more on the order of 1 week than 1 year, but I don't think the basic argument structure is that different in either case (though I do agree that the 1 year opens us up to some more potential mitigating strategies).
My impression is that the post is a somewhat unfortunate attempt to "patch" the situation in which many generically too trusting people updated a lot on AGI Ruin: A List of Lethalities and Death with Dignity and subsequent deference/update cascades.
In my view the deeper problem here is that, instead of engaging with disagreements about model internals, many of these people do some sort of "averaging conclusions" move, based on signals like seniority, karma, vibes, etc.
Many of these signals are currently wildly off from truth-tracking, so you get attempts to push the conclusion-updates directly.
This is really minor and nitpicky, and I agree with many of your overall points, but I don't think equating "barely 20" with "early high-school" is fair. The former is a normal age to be a third-year university student in the US, and plenty of college-age EAs are taken quite seriously by the rest of us.
Oh, hmm, I think this is just me messing up the differences between the U.S. and German education systems (I was 18 and 19 in high-school, and enrolled in college when I was 20).
I think the first quote on nanotechnology was actually written in 1996 originally (though was maybe updated in 1999). Which would put Eliezer at ~17 years old when he wrote that.
The second quote was I think written in more like 2000, which would put him more in the early college years, and I agree that it seems good to clarify that.
To clarify, what I said was:
Then I listed a bunch of ways in which the world looks more like Robin's predictions, particularly regarding continuity and locality. I said Robin's predictions about AI timelines in particular looked bad. This isn't closely related to the topic of your section 3, where I mostly agree with the OP.
Not sure why this is on EAF rather than LW or maybe AF, but anyway. I find this interesting to look at because I have been following Eliezer's work since approximately 2003 on SL4, and so I remember this firsthand, as it were. I disagree with several of the evaluations here (but of course agree with several of the others - I found the premise of Flare to be ludicrous at the time, and thankfully, AFAICT, pretty much zero effort went into that vaporware*):
calling LOGI and related articles 'wrong' because that's not how DL looks right now is itself wrong. Yudkowsky has never said that DL or evolutionary approaches couldn't work, or that all future AI work would look like the Bayesian program and logical approach he favored; he's said (consistently since at least SL4 that I've observed) that they would be extremely dangerous when they worked, and extremely hard to make safe to the high probability that we need them to when deployed to the real world indefinitely and unboundedly and self-modifyingly, and that rigorous program-proof approaches which can make formal logical guarantees of 100% safety are what are necessary and must deal with the issues and concepts discussed in LOGI. I
It's not accurate that the key ideas of Superintelligence came to Bostrom from Eliezer, who originated them. Rather, at least some of the main ideas came to Eliezer from Nick. For instance, in one message from Nick to Eliezer on the Extropians mailing list, dated to Dec 6th 1998, inline quotations show Eliezer arguing that it would be good to allow a superintelligent AI system to choose its own morality. Nick responds that it's possible for an AI system to be highly intelligent without being motivated to act morally. In other words, Nick explains to Eliezer an early version of the orthogonality thesis.
Nick was not lagging behind Eliezer on evaluating the ideal timing of a singularity, either - the same thread reveals that they both had some grasp of the issue. Nick said that the fact that 150,000 people die per day must be contextualised against "the total number of sentiences that have died or may come to live", foreshadowing his piece on Astronomical Waste, that would be published five years later. Eliezer said that having waited billions of years, the probability of a... (read more)
I think chapter 4, The Kinetics of an Intelligence Explosion, has a lot of terms and arguments from EY's posts in the FOOM Debate. (I've been surprised by this in the past, thinking Bostrom invented the terms, then finding things like resource overhangs getting explicitly defined in the FOOM Debate.)
Thanks for the comment! A lot of this is useful.
I mainly have the impression that LOGI and related articles were probably "wrong" because, so far as I've seen, nothing significant has been built on top of them in the intervening decade-and-half (even though LOGI's successor was seemingly predicted to make it possible for a small group to build AGI). It doesn't seem like there's any sign that these articles were the start of a promising path to AGI that was simply slower than the deep learning path.
I have had the impression, though, that Yudkowsky also thought that logical/Bayesian approaches were in general more powerful/likely-to-enable-near-term-AGI (not just less safe) than DL. It's totally possible this is a misimpression - and I'd be inclined to trust your impression over mine, since you've read more of his old writing than I have. (I'd also be interested if you happen to have any links handy.) But I'm not... (read more)
A general reflection: I wonder if one at least minor contributing factor to disagreement, around whether this post is worthwhile, is different understandings about who the relevant audience is.
I mostly have in mind people who have read and engaged a little bit with AI risk debates, but not yet in a very deep way, and would overall be disinclined to form strong independent views on the basis of (e.g.) simply reading Yudkowsky's and Christiano's most recent posts. I think the info I've included in this post could be pretty relevant to these people, since in practice they're often going to rely a lot -- consciously or unconsciously; directly or indirectly -- on cues about how much weight to give different prominent figures' views. I also think that the majority of members of the existential risk community are in this reference class.
I think the info in this post isn't nearly as relevant to people who've consumed and reflected on the relevant debates very deeply. The more you've engaged with and reflected on an issue, the less you should be inclined to defer -- and therefore the less relevant track records become.
(The limited target audience might be something I don't do a good enough job communicating in the post.)
I think that insofar as people are deferring on matters of AGI risk etc., Yudkowsky is in the top 10 people in the world to defer to based on his track record, and arguably top 1. Nobody who has been talking about these topics for 20+ years has a similarly good track record. If you restrict attention to the last 10 years, then Bostrom does and Carl Shulman and maybe some other people too (Gwern?), and if you restrict attention to the last 5 years then arguably about a dozen people have a somewhat better track record than him.
(To my knowledge. I think I'm probably missing a handful of people who I don't know as much about because their writings aren't as prominent in the stuff I've read, sorry!)
He's like Szilard. Szilard wasn't right about everything (e.g. he predicted there would be a war and the Nazis would win) but he was right about a bunch of things including that there would be a bomb, that this put all of humanity in danger, etc. and importantly he was the first to do so by several years.
I think if I were to write a post cautioning people against deferring to Yudkowsky, I wouldn't talk about his excellent track record but rather about his arrogance, inability to clearly...
Fwiw I'd say this somewhat differently.
I object to a specific way in which one could use coherence arguments to support AI risk: namely, "AI is intelligent --> AI satisfies coherence arguments better than we do --> AI looks as though it is maximizing a utility function from our perspective --> Convergent instrumental subgoals --> Doom".
As far as I know, anyone who has spent ~an hour reading my post and thinking about it basically agrees with that particular narrow point.
This doesn't rule out other ways that one could use coherence arguments to support AI risk, such as "coherence arguments show that achieving stuff can typically be factored into beliefs about the world and goals that you want to achieve; since we'll be building AIs to achieve stuff, it seems likely they'll work by having separated beliefs and goals; if they have bad goals, then we die because of convergent instrumental subgoals". I'm more sympathetic to this argument (though not nearly as much as Eliezer appears to be).
I agree that the intro talk that you link to would likely cause people to think...
Thank you for writing this, Ben. I think the examples are helpful and I plan to read more about several of them.
With that in mind, I'm confused about how to interpret your post and how much to update on Eliezer. Specifically, I find it pretty hard to assess how much I should update (if at all) given the "cherry-picking" methodology:
If you were to apply this to any EA thought leader (or non-EA thought leader, for that matter), I strongly suspect you'd find a lot of clearcut and disputable examples of them being wrong on important things.
As a toy analogy, imagine that Alice is widely considered to be extremely moral. I hire an investigator to find as many examples of Alice doing Bad Things as possible. I then publish my list of Bad Things that Alice has done. And I tell people "look-- Alice...
I think the effect should depend on your existing view. If you've always engaged directly with Yudkowsky's arguments and chose the ones that convinced you, there's nothing to learn. If you thought he was a unique genius and always assumed you weren't convinced of things because he understood things you didn't know about, and believed him anyway, maybe it's time to dial it back. If you'd always assumed he's wrong about literally everything, it should be telling for you that OP had to go 15 years back for good examples.
Writing this comment actually helped me understand how to respond to the OP myself.
The negative reactions to this post are disheartening. I have a degree of affectionate fondness for the parodic levels of overthinking that characterize the EA community, but here you really see the downsides of that overthinking concretely.
Of course it is meaningful that Eliezer Yudkowsky has made a bunch of terrible predictions in the past that closely echo predictions he continues to make in slightly different form today. Of course it is relevant that he has neither owned up to those earlier terrible predictions nor explained how he has learned from those mistakes. Of course we should be more skeptical of similar claims he makes in the future. Of course we should pay more attention to broader consensus or aggregate predictions in the field than to outlier predictions.
This is sensible advice in any complex domain, and saying that we should "evaluate every argument in isolation on its merits" is a type of special pleading or sophistry. Sometimes (often!) the obvious conclusions are the correct ones: even extraordinarily clever people are often wrong; extreme claims that other knowledgeable experts disagree with are often wrong; and people who make extreme claims that prove to...
I assume you're mainly talking about young-Eliezer worrying about near-term risk from molecular nanotechnology, and current-Eliezer worrying about near-term risk from AGI?
I think age-17 Eliezer was correct to think widespread access to nanotech would be extremely dangerous. See my comment. If you or Ben disagree, why do you disagree?
Age-20 Eliezer was obviously wrong about the timing for nanotech, and this is obviously Bayesian evidence for 'Eliezer may have overly-aggressive tech timelines in general'.
I don't think this is generally true -- e.g., if you took a survey of EAs worried about AI risk in 2010 or in 2014, I suspect Eliezer would have longer AI timelines than others at the time. (E.g., he expected it to take longer to solve Go than Carl Shulman did.) When I joined MIRI, the standard way we summarized MIRI's view was roughly 'We think AI risk is high, but not because we think AGI is imminent; rather, our worry is that alignment is likely to take a long time, and that civilization may need ...
I'm not sure I can argue for this, but it feels weird and off-putting to me that all this energy is being spent discussing how good a track-record one guy has, especially one guy with a very charismatic and assertive writing-style, and a history of attempting to provide very general guidance for how to think across all topics (though I guess any philosophical theory of rationality does the last thing.) It just feels like a bad sign to me, though that could just be for dubious social reasons.
The question of how much to defer to E.Y. isn't answered just by things like "he has possibly the best track record in the world on this issue." If he's out of step with other experts, and by a long way, we need to have reason to think he outperforms the aggregate of experts before we weight him more than the aggregate and it's entirely normal, I'd have thought, for the aggregate to significantly outperform the single best individual. (I'm not making as strong a claim as that the best individual outperforming the aggregate is super-unusual and unlikely.) Of course if you think he's nearly as good as the aggregate, then you should still move a decent amount in his directi
Some off-topic comments, not specific to you or Yudkowsky:
It seems to me (but I could be mistaken) like I see the phrase "has thought a lot about X" fairly often in EA contexts, where it is taken to imply being very well-informed about X. I don't think this is good reasoning. Thinking about something is probably required for understanding it well, but is certainly not enough.
When an idea or theory is very fringe, there's a strong selection effect for people in the relevant intellectual community. This means even their average views are sometimes not good evidence for something. For example, to answer a question about the probability of doom from AI in this century, are alignment researchers a good reference class? They all naturally believe AI is an existential risk to begin with. I'm not sure I have the solution, since "AI researchers in general" isn't a good reference class either - many might not have given any thought to whether AI is dangerous.
I like that you admit that your examples are cherry-picked. But I'm actually curious what a non-cherry-picked track record would show. Can people point to Yudkowsky's successes? What did he predict better than other people? What project did MIRI generate that either solved clearly interesting technical problems or got significant publicity in academic/AI circles outside of rationalism/EA? Maybe instead of a comment here this should be a short-form question on the forum.
While he's not single-handedly responsible, he led the movement to take AI risk seriously at a time when approximately no one was talking about it, which has now attracted the interest of top academics. This isn't a complete track record, but it's still a very important data-point. It's a bit like if he were the first person to say that we should take nuclear war seriously, and then five years later people are starting to build nuclear bombs and academics realize that nuclear war is very plausible.
I work at MIRI, but as usual, this comment is me speaking for myself, and I haven’t heard from Eliezer or anyone else on whether they'd agree with the following.
My general thoughts:
- The primary things I like about this post are that (1) it focuses on specific points of disagreement, encouraging us to then hash out a bunch of object-level questions; and (2) it might help wake some people from their dream if they hero-worship Eliezer, or if they generally think that leaders in this space can do no wrong.
- By "hero-worshipping" I mean a cognitive algorithm, not a set of empirical conclusions. I'm generally opposed to faux egalitarianism and the Modest-Epistemology reasoning discussed in Inadequate Equilibria: if your generalized anti-hero-worship defenses force the conclusion that there just aren't big gaps in skills or knowledge (or that skills and knowledge always correspond to mainstream prestige and authority), then your defenses are ruling out reality a priori. In saying "people need to hero-worship Eliezer less", I'm opposing a certain kind of reasoning process and mindset, not a specific factual belief like "Eliezer is the clearest thinker about AI risk".
In a sense, I want to prom
I didn't see the "my own guess" part in the linked document (or the archived version), but it seems to be visible here; it was probably edited between 2001 and 2004. I mention it in case others are confused after trying to find the quote in context.
I read this post kind of quickly, so apologies if I'm misunderstanding. It seems to me that this post's claim is basically:
I think this is dismissing a different (and much more likely IMO) possibility, which is that Eliezer's arguments were good, and people updated based on the strength of the arguments.
(Even if his recent posts didn't contain novel arguments, the arguments still could have been novel to many readers.)
Perhaps also relevant, though it isn’t forecasting, is Eliezer’s weak (in my opinion) attempted takedown of Ajeya Cotra’s bioanchors report on AI timelines. Here’s Eliezer’s bioanchors takedown attempt, here’s Holden Karnofsky’s response to Eliezer, and here’s Scott Alexander’s response.
I'm confused by the fact Eliezer's post was posted on April Fool's day. To what extent does that contribute to conscious exaggeration on his part?
As someone not active in the field of AI risk, and having always used epistemic deference quite heavily, this feels very helpful. I hope it doesn't end up reducing society's efforts to stop AI from taking over the world some day.
On the contrary, my best guess is that the “dying with dignity” style dooming is harming the community’s ability to tackle AI risk as effectively as it otherwise could.
I agree with many of the comments here that this is overall a bit unfair, and there are good reasons to take Yudkowsky seriously even if you don't automatically accept his self-expressed level of confidence.
My main criticism of Yudkowsky is that he has many innovative/somewhat compelling ideas, but even with many years and a research institution their evolution has been unsatisfying. Many of them are still imprecise, and some of those that are precise(ish) are not satisfactory (e.g. the orthogonality thesis, mesa-optimizers). Furthermore, he still doesn't seem very interested in improving this situation.
Almost all of this seems reasonable. But:
I don't think we should update based on this, or e.g. on the fact that we didn't go extinct due to nanotechnology, because of anthropics / observer selection. (We should only update based on whether we think the reasons for those beliefs were bad.)
Suppose you've been captured by some terrorists and you're tied up with your friend Eli. There is a device on the other side of the room that you can't quite make out. Your friend Eli says that he can tell (he's 99% sure) it is a bomb and that it is rigged to go off randomly. Every minute, he's confident there's a 50-50 chance it will explode, killing both of you. You wait a minute and it doesn't explode. You wait 10. You wait 12 hours. Nothing. He starts eying the light fixture, and says he's pretty sure there's a bomb there too. You believe him?
No, my survival for 12 hours is evidence against Eli being correct about the bomb.
So: oops, I think.
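To make the arithmetic behind that concession concrete, here is a minimal sketch of the update, assuming survival is treated as ordinary evidence (the reading the reply above adopts, setting the anthropic objection aside). The function name `posterior_bomb` and its parameters are hypothetical, and the numbers (a 99% prior, a 50% chance of explosion per minute) are taken from the thought experiment; the assumption that survival is certain when there is no bomb is a simplification.

```python
def posterior_bomb(prior: float, p_explode_per_minute: float, minutes_survived: int) -> float:
    """Posterior probability that the device is a bomb, given survival so far."""
    # Chance of seeing no explosion for this many minutes if the bomb is real.
    likelihood_if_bomb = (1.0 - p_explode_per_minute) ** minutes_survived
    # Simplifying assumption: with no bomb, survival is certain.
    likelihood_if_no_bomb = 1.0
    numerator = prior * likelihood_if_bomb
    # Bayes' rule over the two hypotheses (bomb / no bomb).
    return numerator / (numerator + (1.0 - prior) * likelihood_if_no_bomb)


if __name__ == "__main__":
    # One minute, ten minutes, and twelve hours (720 minutes) of survival.
    for minutes in (0, 1, 10, 720):
        print(minutes, posterior_bomb(0.99, 0.5, minutes))
```

Under these assumptions the posterior barely moves after one minute (about 98%), falls to roughly 9% after ten minutes, and is effectively zero after twelve hours.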
I think posts like this would do better to open with "but consider forming your own opinions rather than relying on experts".
I prefer to just analyse and refute his concrete arguments on the object level.
I'm not a fan of engaging the person of the arguer instead of their arguments.
Granted, I don't practice epistemic deference in regards to AI risk (so I'm not the target audience here), but I'm really not a fan of this kind of post. It rubs me the wrong way.
Challenging someone's overall credibility instead of their concrete arguments feels like bad form and [logical rudeness](https://www.lesswrong.com/posts/srge9MCLHSiwzaX6r/logical-rudeness).
I wish EAs did not engage in such be...
I agree that work analyzing specific arguments is, overall, more useful than work analyzing individual people's track records. Personally, partly for that reason, I've actually done a decent amount of public argument analysis (e.g. here, here, and most recently here) but never written a post like this before.
Still, I think, people do in practice tend to engage in epistemic deference. (I think that even people who don't consciously practice epistemic deference tend to be influenced by the views of people they respect.) I also think that people should practice some level of epistemic deference, particularly if they're new to an area. So - in that sense - I think this kind of track record analysis is still worth doing, even if it's overall less useful than argument analysis.
(I hadn't seen this reply when I made my other reply).
What do you think of legitimising behaviour that calls out the credibility of other community members in the future?
I am worried about displacing the concrete object level arguments as the sole domain of engagement. A culture in which arguments cannot be allowed to stand by themselves. In which people have to be concerned about prior credibility, track record and legitimacy when formulating their arguments...
It feels like a worse epistemic culture.
I think I roughly agree with you on this point, although I would guess I have at least a somewhat weaker version of your view. If discourse about people's track records or reliability starts taking up (e.g.) more than a fifth of the space that object-level argument does, within the most engaged core of people, then I do think that will tend to suggest an unhealthy or at least not-very-intellectually-productive community.
One caveat: For less engaged people, I do actually think it can make sense to spend most of your time thinking about questions around deference. If I'm only going to spend ten hours thinking about nanotechnology risk, for example, then I might actually want to spend most of this time trying to get a sense of what different people believe and how much weight I should give their views; I'm probably not going to be able to make a ton of headway getting a good gears-level-understanding of the relevant issues, particularly as someone without a chemistry or engineering background.
> I do not want an epistemic culture that finds it acceptable to challenge an individual's overall credibility in lieu of directly engaging with their arguments.
I think it's fair to talk about a person's lifetime performance when we are talking about forecasting. When we don't have the expertise ourselves, all we have to go on is what little we understand and the track records of the experts we defer to. Many people defer to Eliezer so I think it's a service to lay out his track record so that we can know how meaningful his levels of confidence and special insights into this kind of problem are.
Here is a schematic (link below) that I starte...