Holden Karnofsky describes the claims of his “Most Important Century” series as “wild” and “wacky”, but at the same time purports to be in the mindset of “critically examining” such “strange possibilities” with “as much rigour as possible”. The emphasis is mine, but for what is supposedly an important piece of writing in a field that has a large part of its roots in academic analytic philosophy, it is almost ridiculous to suggest that this examination has been carried out with 'as much rigour as possible'. My main reactions - which I will expand on in this essay - are that Karnofsky’s writing is in fact distinctly lacking in rigour; that his claims are too vague or even seem to shift around; and that his writing style - often informal, or sensationalist - aggravates the lack of clarity while simultaneously putting the goal of persuasion above that of truth-seeking. I also suggest that his emphasis on the wildness and wackiness of his own "thesis" is tantamount to an admission of bias on his part in favour of surprising or unconventional claims.
I will start with some introductory remarks about the nature of my criticisms and of such criticism in general. Then I will spend some time trying to point to various instances of imprecision, bias, or confusion. And I will end by asking whether any of this even matters or what kind of lessons we should be drawing from it all.
Notes: Throughout, I will quote from the whole series of blog posts by treating them as a single source rather than referencing them separately. Note that the series appears in a single pdf here (so one can always Ctrl/Cmd+F to jump to the part I am quoting).
It is plausible that some of this post comes across quite harshly, but none of it is intended to constitute a personal attack on Holden Karnofsky or an accusation of dishonesty. Where I have made errors or have misrepresented others, I welcome any and all corrections. I also generally welcome feedback on the writing and presentation of my own thoughts, either privately or in the comments.
Acknowledgements: I started this essay a while ago, and during its preparation I have been supported at various points by FHI, SERI MATS, BERI and Open Philanthropy. The development of this work benefitted significantly from numerous conversations with Jennifer Lin.
1. Broad Remarks About My Criticisms
If you felt and still feel convinced by Karnofsky's writings, then upon hearing about my reservations, your instinct may be to respond with reasonable-seeming questions like: 'So where exactly does he disagree with Karnofsky?' or 'What are some specific things that he thinks Karnofsky gets wrong?'. You may well want to look for wherever it is that I have carefully categorized my criticisms, to scroll through to find all of my individual object-level disagreements so that you can see if you know the counterarguments that mean that I am wrong. And so it may be frustrating that I will often sound like I am trying to weasel out of having to answer these questions head-on, or that I do not put much weight on the fact that I have not laid out my criticisms in that way.
Firstly, I think that the main issues to do with clarity and precision that I will highlight occur at a fundamental level. It is not that they are 'more important' than individual, specific, object-level disagreements, but I claim that Karnofsky does a sufficiently poor job of explaining his main claims, the structure of his arguments, the dependencies between his propositions, and in separating his claims from the verifications of those claims, that it actually prevents detailed, in-depth discussions of object-level disagreements from making much sense. I also contend that this in itself is a rhetorical technique (and that Karnofsky is not the only person in the EA ecosystem who employs it). The principal example here is that without a clear notion of 'the importance' of a given century or clear criteria for what would in theory make a century 'the most important' (stated in a way that is independent of specific facts about this century), it is impossible for anyone to compare the importance of two different centuries or to evaluate whether or not this century meets the criteria. Thus it is impossible for a critic to precisely explain why Karnofsky's 'arguments' fail to show that this century meets the criteria.
Secondly, I'd invite you to consider a point made by Philip Trammell in his blog post But Have They Engaged with the Arguments? He considers the situation in which a claim is argued for via a long series of fuzzy inferences, each step of which seems plausible by itself. And he asks us to suppose that most people who try to understand the full argument will ‘drop out’ and reject it at some random step along the chain. Then:
Believers will then be in the extremely secure-feeling position of knowing not only that most people who engage with the arguments are believers, but even that, for any particular skeptic, her particular reason for skepticism seems false to almost everyone who knows its counterargument.
This suggests that when a lot of people disagree with the 'Believers', we should (perhaps begrudgingly, in practice) give weight to the popular disagreement, even when each person’s particular disagreement sounds incorrect to those who know all the counterarguments. The kicker - to paraphrase Trammell - is that although
They haven't engaged with the arguments, … there is information to be extracted from the very fact that they haven't bothered engaging with them.
I believe I am one of many such people in the present context, i.e. although I have at least taken the time to write this essay, it may seem to a true believer that I am not 'engaging with arguments' enough. But there is yet a wider context into which this fits, which forms my third point: To form detailed, specific criticisms of something that one finds to be vague and disorganized, one typically has to actually add clarity first. For example, by picking a precise characterization for a term that was left undefined, as MacAskill had to do in his Are we living at the hinge of history? essay when responding to Parfit, or by rearranging a set of points that were made haphazardly into an argument that is linear enough to be cleanly attacked. But I have not set out to do these things. In particular, if one finds the claims to be generally unconvincing and can't see a good route to bolstering them with better versions of the arguments, then it is hard to find motivation to do this.
Readers will no doubt notice that the previous two points apply to the AI x-risk argument itself (i.e. independently of most important century-type claims) and indeed, yes, I do sympathize with skeptics who are constantly told they are not engaging with the arguments. It often seems like they are being subjected to a device wherein their criticism is held to a higher standard of clarity and rigour than the original arguments.
In fact, I think much of this is part of a wider issue in which vague and confusing ‘research’ (in the EA/LessWrong/Alignment Forum space) often does not adequately receive the signal that it is in fact vague and confusing: It takes a lot of effort to produce high-quality criticisms of low-quality work; and if the work is so low-quality, then why would you want to bother? This lack of pushback can then be interpreted as a positive signal, i.e. a set of ideas or a research agenda can gain credibility from the fact that it is being written about regularly without being publicly criticized, when actually part of the reason it isn’t being criticized enough is that its confusing or vague nature is putting people off bothering to try.
All this having been said, let's now start to turn to my more specific points.
2. Precision of the main claim
Philosophers or mathematicians will often devote many paragraphs to carefully explaining a main claim (what it is, what it isn’t, what it does or doesn’t imply, what is stronger, what is weaker, giving examples etc.), but in my opinion, a sufficient unpacking of Karnofsky's central claim is absent. The version that appears early on in the summary is just: “we could be in the most important century of all time for humanity”. The later post Some additional detail on what I mean by "most important century" then adds two "different senses" of what the phrase means. The following remarks apply to each of these.
The foremost thing that is needed in order to consider any version of such a claim is this: What are the criteria by which a given century can be shown to be the most important? One obvious way of doing this would be to define some quantity called 'the importance' of a given century and then to argue that this century has the greatest importance. To do this, we'd want to know: What measure of importance is being used to compare centuries and how can we estimate it? Or perhaps one cannot actually compare some quantity called 'importance', but there is still some clear criterion which, if satisfied by a given century, would be sufficient to show that it was 'the most important century'. Neither of these things, nor any equivalently useful definition or operationalization, is given.
Next: What exactly do we mean by “could”? Note that Karnofsky has elsewhere espoused the idea of assigning probabilities to claims as part of the “Bayesian mindset”, so could "could" refer to a quantifiable level of certainty (that he has happened to omit)? Probabilities do in fact appear elsewhere in later posts, e.g. he writes "a 15-30% chance that this is the "most important century" in one sense or another". But note that this is quite different from stating upfront what he claims the probability to be in the context of some framework and then demonstrating the truth of that claim (not to mention the addition of the qualification "in one sense or another"). Instead, the numbers are dropped in later, unsystematically, and we are left to suppose that however convinced we feel in the end is what he must have originally meant by “could”.
There are various rhetorical mechanisms that are facilitated by this lack of precision. Firstly, vague claims blur together many different precise claims. One can often phrase a vague claim in such a way that people more easily agree with the vague version than they would with many of the individual, more precise versions of the claim that have been lumped together. So, a wide set of people will comfortably feel that they more or less agree with the general sentiment of 'this could be the most important century' and will nod along, but it seems reasonable to suppose that any given detailed and specific version of the thesis that one might commit to would garner much more disagreement.
Secondly, the lack of precision allows the claim to take on different forms at different times, even in one reading, so that the claim can locally fit the arguments at hand and so that after the fact, it feels like it fits whatever it is that you've been convinced of. When you first see the main claim, you may not think too hard about the "could" or the fact that you don't have a precise notion of "most important" etc., but as you give the piece a charitable reading, your interpretation of the appropriate notion of 'importance' can readily shift and mould to fit the present argument or whatever subset of the arguments you actually find convincing.
Thirdly, in Some additional detail on what I mean by "most important century", Karnofsky states that the first possible meaning of the phrase is:
Meaning #1: Most important century of all time for humanity, due to the transition to a state in which humans as we know them are no longer the main force in world events.
We can guess that it might mean something like 'a century is the most important century if it is the case that in that century, there is a transition to a state in which....'. But the use of "due to" is a bit confusing. One reading is that he has started arguing for the claim while in the midst of defining his terms. And indeed, when he tries to expand on the meaning, he starts off by saying "Here the idea is that: During this century civilization could... " and "This century is our chance to shape just how this happens." I don't want to get too bogged down in trying to go through all of it line by line, but to summarize this point: There is no clean separation of the definition that underpins the main claim from the arguments in favour of that claim. The explanation of the criteria by which a century can be judged to be the most important is blurred together with the specific points about this century that will be used. Compare with: '4 is the most important number because, as I will demonstrate, it is equal to 2+2... '.
3. Expecting the Unexpected.
In Reasons and Persons (1984), Parfit wrote that "the next few centuries will be the most important in human history" and (according to MacAskill's essay Are we living at the hinge of history?) said much more recently, in 2015: "I think that we are living now at the most critical part of human history... we may be living in the most critical part of the history of the universe".
We do not need to assume that Karnofsky has a clear notion of importance or one that matches Parfit's in order to bring one of MacAskill's main criticisms to bear. The point is that any reasonable operationalization of the main claim must contend with the very low base rate. We will not go into technical detail here (see MacAskill's essay for more discussion) but, for example, two reasonable ways of setting priors are using the self-sampling assumption or using a uniform prior of importance over centuries. In both cases, the prior probability that the claim is true is very low. Karnofsky does in fact write that he
has talked about civilization lasting for billions of years... so the prior probability of "most important century" is less than 1/10,000,000.
And to be fair to him, he adds that:
This argument feels like it is pretty close to capturing my biggest source of past hesitation about the "most important century" hypothesis.
But immediately afterwards, he falls back on a dubious sort of argument that we will now discuss in more depth:
However, I think there are plenty of markers that this is not an average century, even before we consider specific arguments about AI.
The emphasis is mine.
To dig into this a bit, consider also the following quotation:
When someone forecasts transformative AI in the 21st century… a common intuitive response is something like: "It's really out-there and wild to claim that transformative AI is coming this century. So your arguments had better be really good."
I think this is a very reasonable first reaction to forecasts about transformative AI (and it matches my own initial reaction). But… I ultimately don't agree with the reaction.
Note that this is different from saying 'If you disagree with the claim, then I disagree with you (because I agree with the claim)'. He is saying that he doesn’t agree with the reaction that the arguments in favour of his claim need to be really good. And why would he think this? Presumably it is because he believes that the prior probability that transformative AI is coming this century is not low to begin with. Indeed, he goes on:
- I think there are a number of reasons to think that transformative AI - or something equally momentous - is somewhat likely this century, even before we examine details of AI research, AI progress, etc.
Again we see such phrasing - "even before we examine details of AI..." - that seems to suggest that these "reasons" are not so much part of the object-level arguments and evidence in favour of the claim, but are part of an overarching framing in which the prior probability for the claim is not too small. He continues:
- I also think that on the kinds of multi-decade timelines I'm talking about, we should generally be quite open to very wacky, disruptive, even revolutionary changes. With this backdrop, I think that specific well-researched estimates of when transformative AI is coming can be credible, even if they involve a lot of guesswork and aren't rock-solid.
The emphasis is mine. What is expressed here is in the same vein of either shifting the priors or somehow getting round the fact that they might be low. It ventures into a questionable argument that seems to say: Given how uncertain and difficult-to-predict everything is, maybe we should just generally be more open to unlikely things than we normally would? Maybe we should think of less-than-solid "guesswork" as more credible than we usually would?
I must point out that there is actually a sense in which something like this can be technically true: A specific extreme outcome may be more likely in some specific higher-variance worlds than in a given low-variance world. But without much more detailed knowledge or assumptions about what 'distributions' one is dealing with, one cannot pull off the kind of argument he is attempting here, which is to appeal to uncertainty in order to back up predictions. And when talking about something like a date for the arrival of transformative AI, since such estimates can 'go either way', the high-variance "backdrop" being referred to makes it more important to have strong arguments aimed specifically at bounding the estimate from above.
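The technical point can be made concrete with a toy sketch (all numbers invented for illustration; this is not a model anyone has proposed). Treat the arrival date of transformative AI as a normally distributed guess, and compare a "calm" low-variance world with a "wild" high-variance one:

```python
# Toy illustration (hypothetical numbers): raising the variance of a forecast
# inflates BOTH tails of the distribution, not just the "it happens soon" tail.
from statistics import NormalDist

low_var = NormalDist(mu=200, sigma=50)    # a "calm" world: arrival ~200 years out
high_var = NormalDist(mu=200, sigma=150)  # a "wild" world: same centre, more spread

# Probability the date falls within the next 100 years (the early tail):
p_soon_low = low_var.cdf(100)
p_soon_high = high_var.cdf(100)

# Probability the date falls more than 300 years out (the late tail):
p_late_low = 1 - low_var.cdf(300)
p_late_high = 1 - high_var.cdf(300)

print(f"P(within 100y): calm {p_soon_low:.3f}, wild {p_soon_high:.3f}")
print(f"P(beyond 300y): calm {p_late_low:.3f}, wild {p_late_high:.3f}")
```

In this toy model the "wild" world does make 'soon' roughly ten times more likely, but it makes 'very late' more likely by exactly the same factor. Appealing to wildness alone does nothing to privilege the early tail over the late one; that is why a bound from above needs its own argument.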
He writes elsewhere, speaking about his main claims:
These claims seem too "wild" to take seriously. But there are a lot of reasons to think that we live in a wild time, and should be ready for anything
I know what you're thinking: "The odds that we could live in such a significant time seem infinitesimal; the odds that Holden is having delusions of grandeur (on behalf of all of Earth, but still) seem far higher."
This is exactly the kind of thought that kept me skeptical for many years of the arguments I'll be laying out in the rest of this series... [but]... Grappling directly with how "wild" our situation seems to ~undeniably be has been key for me.
There are further reasons to think this particular century is unusual. For example...
- The current economic growth rate can't be sustained for more than another 80 centuries or so.
Again, it is worth us pointing out that arguments of a not-dissimilar form can be valid and useful: For example, perhaps you want to argue for claim A and although P(A) is small, you know that P(A|B) is not small. So perhaps you are pointing out that we are a world where we already know that B holds - i.e. essentially that we can take P(B) = 1 - in which case P(A) is not the relevant quantity; but P(A|B) is.
However, once again we notice that that isn't what Karnofsky is doing. He is not making arguments of this form. For example, his point about the “current economic growth rate" is not developed into an argument that a century with a fast growth rate is necessarily one with an increased probability of the development of transformative AI. Time and again, this aspect of his overall argument seems only to say that the general situation we find ourselves in is so ‘special’ - "a wild time", "not... average", "a significant time" - that this permits things that were generally unlikely by default to now be much more likely "even before" considering the details of AI development. I can just about imagine the possibility of a very carefully argued-for framing in which the prior probability of the main claim is not small, but we do not find that here. This particular point of his ends up being not even wrong; it is simply the absence of a solid and relevant argument.
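To spell out the valid argument form with some invented numbers (every quantity below is hypothetical; A and B are placeholders, not claims Karnofsky makes): observing an unusual marker B only rescues a tiny prior P(A) to the extent that the likelihood ratio P(B|A)/P(B) is large, and it is exactly that ratio which is never argued for.

```python
# Hypothetical numbers illustrating the valid argument form:
# A = "transformative AI arrives this century", B = "an unusual marker holds"
# (e.g. the growth-rate observation). Observing B makes P(A|B), not the
# unconditional P(A), the relevant quantity -- but only via the likelihood ratio.
p_A = 1e-7          # tiny prior, e.g. roughly uniform over ~10 million centuries
p_B = 1e-3          # the marker is rarely observed across centuries
p_B_given_A = 0.9   # suppose the marker almost always accompanies A

# Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B

print(f"P(A) = {p_A}, P(A|B) = {p_A_given_B}")
```

Even under these generous made-up numbers, the update multiplies the prior by P(B|A)/P(B) = 900, taking it from one in ten million to roughly one in eleven thousand. An argument of this shape would have to establish that ratio explicitly; merely gesturing at B being 'special' establishes nothing.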
But notice that it does, however, work in favour of the persuasiveness of the writing. When one gives the piece a charitable reading and tries to parse all of the colour and added detail that could make this century special or "wild" (PASTA, economic productivity explosion, seeding a galaxy-wide civilization, digital people, misaligned AI, the prospect of having huge impact on future people), one feels broadly like one is reading about things that seem to be characteristic or representative of a very important century. And one's uncertainty about how the whole argument actually fits together doesn't really surface unless one is challenged. But when we stop to think about it: Presumably certain collections of claims have to all hold simultaneously, or certain chains of implications have to hold, in order for the overall argument to work? And so - as Yudkowsky explained here - there's a form of the conjunction fallacy that creeps in here as a rhetorical device: "Adding detail can make a scenario sound more plausible, even though the event necessarily becomes less probable."
4. 60% of The Time, It Works Every Time.
When introducing a graph that shows the size of the global economy of the past 75 years or so, Karnofsky writes "When you think about the past and the future, you're probably thinking about something kind of like this:" and then after displaying the graph he follows up with:
I live in a different headspace, one with a more turbulent past and a more uncertain future.
Elsewhere later, he writes:
When some people imagine the future, they picture the kind of thing you see in sci-fi films. But… The future I picture is enormously bigger, faster, weirder,
He regularly chooses to emphasize how strange, weird or "wacky" his thesis is or that it has a "sci-fi feel”.
It sure sounds more sexy and exciting to share Karnofsky’s headspace than to remain in a sober, skeptical headspace. Would it not just be so much... well, cooler... if this were all true, rather than if we had to play the poindexter and say 'hmm, after careful consideration, this just seems too unlikely'? And do we not think, perhaps, that young, idealistic EAs, often arriving in this community as part of a secular search for meaning and purpose, find themselves motivated to share his views not by the weight of evidence and quality of argument in favour of the claims, but in order to get in on this exciting feeling of living in the most important time? - To 'get a piece of the action'?
But yet another purpose is served by this sort of language. In fact, with different wording, I'm sure many will recognize the format: "What I'm about to say might sound crazy, but...". It's a common rhetorical device, a type of expectation management intended to disarm our natural reaction of skepticism. By being reminded of something to the effect of 'You cannot reject the claim just because it sounds crazy', you are primed to be more receptive to the future arguments. I contend that if one wants to prioritize the rigorous consideration of a claim, then one has something of a duty to omit this kind of language. Truth-seeking is not the same as persuasion. A critical weighing of the arguments needs to be done dispassionately.
On a similar theme, one final specific point I want to draw attention to is seen in All Possible Views About Humanity's Future Are Wild, when Karnofsky writes:
Let's say you agree with me about where humanity could eventually be headed - that we will eventually have the technology to create robust, stable settlements throughout our galaxy and beyond. But you think it will take far longer than I'm saying.
You don't think any of this is happening this century - you think, instead...it will take 100,000 years.
He goes on to say that:
In the scheme of things, this "conservative" view and my view are the same.
Really? What does he mean that these things are the same "in the scheme of things"? I thought that the precise "scheme of things" was that this century was the most important? And it sort of sounded like the development of technology to create settlements throughout the galaxy was somehow part of the argument. He ends the post with:
the choices made in the next 100,000 years - or even this century - could determine whether that galaxy-scale civilization comes to exist, and what values it has, across billions of stars and billions of years to come.
He genuinely appears not to be narrowing down this particular point beyond a period of 1,000 centuries. This of course makes us wonder: Is there a sub-claim about technologies that lead to galaxy-scale civilizations that forms a necessary step in the overall argument? If there is, then we should be understandably confused as to how it can not really matter in which of the next 1,000 centuries this technology emerges. If there is no such necessary sub-claim, then why are we spending so much time discussing and analyzing it, and what then is the real structure of the argument?
This oddity is part of the fact that, apparently, Karnofsky doesn't place much weight on whether the titular claim is even true. At the end of the Some additional detail on what I mean by "most important century” post, he writes that if he is correct about the general picture of the near future that he has been describing but "wrong about the 'most important century' for some reason" then he'd "still think this series's general idea was importantly right".
Not only does this exemplify some of the rhetorical devices alluded to earlier ('If you think there are specific versions of the claim that are wrong, don't worry, just make it a bit more vague or change the claim a little bit until it seems right'), but could he really be ending the series by more or less retroactively absolving himself from ever having to have verified the main claim? Or is he indeed admitting that seeking out the truth or otherwise of the main claim was never even really his point? And if we weren't even trying to figure out if the main claim was true, then what were we doing? Perhaps he is sticking with the phrase partly because he's in the mindset of a fundraiser and overly accustomed to presenting an eye-catching or - dare we say - exaggerated version of a cause's importance, urgency, and worth in order to 'sell' the idea of funding it. Indeed he does also write that he chose the phrase 'most important century'
as a wake-up call about how high the stakes seem to be.
and even that his "main intent" is just:
to call attention to the "Holy !@#$" feeling of possibly developing something like PASTA this century...
But he shows no signs of abandoning the phrase: As recently as a couple of months ago, he was writing about what we can do to "help with the most important century" or make it "go well", e.g. How major governments can help with the most important century. So I cannot help but feel that the quotations above act as a way of avoiding criticism. He is going to keep using the phrase with a straight face, but if you try to pin him down on it in order to criticize it, the comeback is that he never really meant it in the first place (chill out, it's only meant "holistically").
5. Concluding Remarks
My first real contact with this community - i.e. the EA, rationalist, and Alignment Forum communities - was when I started on the Research Scholars Program at the Future of Humanity Institute a little more than two years ago. As part of one of the introductory meetings for the incoming cohort, and in order to stimulate a discussion session, we'd been granted permission to read draft versions of some of Karnofsky's most important century posts. One of my reactions, I remember well. It was a feeling that has resurfaced on multiple occasions since, and in multiple different contexts, during my immersion into this community: One of incredulity that I was surrounded by ostensibly very smart people, in this case people who had come from traditional degrees and were now technically members of the Oxford philosophy department, who seemed to be totally buying into an unusual worldview that was being argued for in only fairly lax, informal ways.
What I was witnessing then, and have witnessed many times since, was a level of deference that is not fully explained by an appeal to the expertise or track record of those or that which was being deferred to. In fact, I would describe it less as deference and more as a kind of susceptibility towards big, exciting, and unlikely claims that has its roots in the social and cultural forces of the community. Note here that the unlikeliness is part of the appeal: It sure makes you feel clever and important to be part of a club that (you believe) has a track record for uncovering urgent truths about the world that many other smart people have failed to see. But we must be wary of simply building this into our identity, of internalizing it to the point that we are primed to accept certain kinds of arguments that are made by the right kind of people in the right sort of way.
But that is where my fixation with these posts stems from. For me, they became emblematic of the culture shock that I experienced and were the context for my first big disappointment with the intellectual and critical standards within the EA/LW/AF ecosystem. I emphasize 'critical' as well as 'intellectual' because it isn't that I'd come across just any old content that I disagreed with or thought was low-quality; it was that this was content from a respected and powerful figure in the field, and (perhaps through no fault of his own) too many others seemed to have given him too much benefit of the doubt and swallowed the message without doing him the honour of providing decent criticism.
Later, I saw people recommend the series of posts as a standard way for someone who is curious about AI Safety to get more of an idea about the way the community approaches the issue. Coming from an academic background myself, I would think: If I wanted to come across as a convincing, credible authority on this subject, I would be embarrassed to recommend this series of posts or to cite it as one of the sources of my own knowledge on the subject. And I continue to be frustrated by the fact that many of those doing the recommending or defending can be quite so oblivious to how off-putting this style of writing can be to a skeptical outsider, particularly when part of the assumption going in is that this subject lies between two paradigms of rather extreme intellectual achievement: One that exemplifies rigorous, analytic, thought - academic analytic philosophy - and one that exemplifies the bleeding edge of technical sophistication - the AI research labs of the Bay Area. We must not use the fact that our scholarship doesn't fully belong to either world as an excuse; we ought to be seeking the best of both worlds, with all that comes with it, including the holding of our work, even when or especially when it is speculative and unusual, to standards that are recognizable from both vantage points.