Ben Garfinkel

Ben Garfinkel - Researcher at Future of Humanity Institute


Comments

How likely is World War III?

Let’s call the hypothesis that the base rate of major wars hasn’t changed the constant risk hypothesis. The best presentation of this view is in Only the Dead, a book by an IR professor with the glorious name of Bear Braumoeller. He argues that there is no clear trend in the average incidence of several measures of conflict—including uses of force, militarized disputes, all interstate wars, and wars between “politically-relevant dyads”—between 1800 and today.

A quick note on Braumoeller's analysis:

He's relying on the Correlates of War (COW) dataset, which is extremely commonly used but (in my opinion) more problematic than his book indicates. As a result, I don't think we should take his main finding too seriously.

The COW dataset is meant to record all "militarized disputes" between states since 1816. However, it uses a really strange standard for what counts as a "state." If I remember correctly, up until WW1, a political entity only qualifies as a "state" if it has a sufficiently high-level diplomatic presence in England or France. As a result, in 1816, there are supposedly only two non-European states: Turkey and the US. If I remember correctly, even an obvious state like China doesn't get classified as a "state" until after the Opium Wars. The dataset only really becomes properly global sometime in the 20th century.

This means that Braumoeller is actually comparing (A) the rate of intra-European conflict in the first half of the 19th century and (B) the global rate of interstate conflict in the late 20th century.

This 19th-century-Europe-vs.-20th-century-world comparison is interesting, and suggestive, but isn't necessarily all that informative. Europe was almost certainly, by far, the most conflict-free part of the world at the start of the 19th century -- so I strongly expect that the actual global rate of conflict in the early 19th century was much higher.

It's also important that the COW dataset begins in 1816, at the very start of a few-decade period that was -- at the time -- marvelled over as the most peaceful in all of European history. This period was immediately preceded by two decades of intense warfare involving essentially all the states in Europe.

So, in summary: I think Braumoeller's analysis would probably show a long-run drop in the rate of conflict if the COW dataset was either properly global or went back slightly further in time. (Which is good news!)


EDIT: Here's a bit more detail on the claim that the COW dataset can't tell us very much about long-run trends in the global rate of interstate conflict.

From the COW documentation, these are the criteria for state membership:

The Correlates of War project includes a state in the international system from 1816-2016 for the following criteria. Prior to 1920, the entity must have had a population greater than 500,000 and have had diplomatic missions at or above the rank of charge d’affaires with Britain and France. After 1920, the entity must be a member of the League of Nations or the United Nations, or have a population greater than 500,000 and receive diplomatic missions from two major powers.

As a result, the dataset starts out assuming that only 23 states existed in 1816. For reference, they're: Austria-Hungary, Baden, Bavaria, Denmark, France, Germany, Hesse Electoral, Hesse Grand Ducal, Italy, Netherlands, Papal States, Portugal, Russia, Saxony, Two Sicilies, Spain, Sweden, Switzerland, Tuscany, United Kingdom, USA, Wuerttemburg, and Turkey.

An alternative dataset, the International System(s) Dataset, instead produces an estimate of 135 states by relaxing the criteria to (a) estimated population over 100,000, (b) "autonomy over a specific territory", and (c) "sovereignty that is either uncontested or acknowledged by the relevant international actors."

So - at least by these alternative standards - the COW dataset starts out considering only a very small portion (<20%) of the international system. We also have reason to believe that this portion of the international system was really unusually peaceful internally, rather than serving as a representative sample.
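As a quick check on the "<20%" figure, the ratio can be computed directly from the two counts above. Note that this is a share of polities only; it says nothing about what share of the world's population or territory the COW dataset covers. The variable names are just illustrative.

```python
# The 23 entities COW treats as states in 1816 (as listed above).
COW_STATES_1816 = [
    "Austria-Hungary", "Baden", "Bavaria", "Denmark", "France", "Germany",
    "Hesse Electoral", "Hesse Grand Ducal", "Italy", "Netherlands",
    "Papal States", "Portugal", "Russia", "Saxony", "Two Sicilies", "Spain",
    "Sweden", "Switzerland", "Tuscany", "United Kingdom", "USA",
    "Wuerttemburg", "Turkey",
]

# The International System(s) Dataset estimate cited above, under its
# looser membership criteria.
ISD_STATE_COUNT = 135

# Fraction of ISD-recognised polities that COW includes (a count of
# polities, not a population- or area-weighted share).
coverage = len(COW_STATES_1816) / ISD_STATE_COUNT
print(f"{len(COW_STATES_1816)} / {ISD_STATE_COUNT} = {coverage:.1%}")
```

So the COW starting sample is roughly one polity in six.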

Democratising Risk - or how EA deals with critics

I'm not familiar with Zoe's work, and would love to hear from anyone who has worked with them in the past. After seeing the red flags mentioned above, and being stuck with only Zoe's word for their claims, anything from a named community member along the lines of "this person has done good research/has been intellectually honest" would be a big update for me…. [The post] strikes me as being motivated not by a desire to increase community understanding of an important issue, but rather to generate sympathy for the authors and support for their position by appealing to justice and fairness norms. The other explanation is that this was a very stressful experience, and the author was simply venting their frustrations.

(Hopefully I'm not overstepping; I’m just reading this thread now and thought someone ought to reply.)

I’ve worked with Zoe and am happy to vouch for her intentions here; I’m sure others would be as well. I served as her advisor at FHI for a bit more than a year, and have now known her for a few years. Although I didn’t review this paper, and don’t have any detailed or first-hand knowledge of the reviewer discussions, I have also talked to her about this paper a few different times while she’s been working on it with Luke.

I’m very confident that this post reflects genuine concern/frustration; it would be a mistake to dismiss it as (e.g.) a strategy to attract funding or bias readers toward accepting the paper’s arguments. In general, I’m confident that Zoe genuinely cares about the health of the EA and existential risk communities and that her critiques have come from this perspective.

Why AI alignment could be hard with modern deep learning

FWIW, I haven't had this impression.

Single data point: In the most recent survey on community opinion on AI risk, I was in at least the 75th percentile for pessimism (for roughly the same reasons Lukas suggests below). But I'm also seemingly unusually optimistic about alignment risk.

I haven't found that this is a really unusual combo: I think I know at least a few other people who are unusually pessimistic about 'AI going well,' but also at least moderately optimistic about alignment.

(Caveat that my apparently higher level of pessimism could also be explained by me having a more inclusive conception of "existential risk" than other survey participants.)

All Possible Views About Humanity's Future Are Wild

Thanks for the clarification! I still feel a bit fuzzy on this line of thought, but hopefully understand a bit better now.

At least on my read, the post seems to discuss a couple different forms of wildness: let’s call them “temporal wildness” (we currently live at an unusually notable time) and “structural wildness” (the world is intuitively wild; the human trajectory is intuitively wild).[1]

I think I still don’t see the relevance of “structural wildness” for evaluating fishiness arguments. As a silly example: Quantum mechanics is pretty intuitively wild, but the fact that we live in a world where QM is true doesn’t seem to substantially undermine fishiness arguments.

I think I do see, though, how claims about temporal wildness might be relevant. I wonder if this kind of argument feels approximately right to you (or to Holden):

Step 1: A priori, it’s unlikely that we would live even within 10000 years of the most consequential century in human history. However, despite this low prior, we have obviously strong reasons to think it’s at least plausible that we live this close to the HoH. Therefore, let’s say, a reasonable person should assign at least a 20% credence to the (wild) hypothesis: “The HoH will happen within the next 10000 years.”

Step 2: If we suppose that the HoH will happen within the next 10000 years, then a reasonable conditional credence that this century is the HoH should probably be something like 1/100. Therefore, it seems, our ‘new prior’ that this century is the HoH should be at least .2*.01 = .002. This is substantially higher than (e.g.) the more non-informative prior that Will's paper starts with.
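Step 2 can be written out as a single chain. One assumption here is mine rather than stated above: that the conditional 1/100 comes from spreading credence evenly over the roughly 100 centuries in a 10000-year window. Writing H for “this century is the HoH” and W for “the HoH happens within the next 10000 years,” the first line is an equality because H implies W:

```latex
\begin{aligned}
\Pr(H) &= \Pr(W)\,\Pr(H \mid W) \\
       &\geq 0.2 \times \tfrac{1}{100} \\
       &= 0.002
\end{aligned}
```

With Pr(H | W) held at 1/100, the bound scales linearly with the Step 1 credence: lowering it from 20% to 10% would halve the new prior to 0.001.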

Fishiness arguments can obviously still be applied to the hypothesis presented in Step 1, in the usual way. But maybe the difference, here, is that the standard arguments/evidence that lend credibility to the more conservative hypothesis “The HoH will happen within the next 10000 years” are just pretty obviously robust, which makes it easier to overcome a low prior. Then, once we’ve established the plausibility of the more conservative hypothesis, we can sort of back-chain and use it to bump up our prior in the Strong HoH Hypothesis.


  1. I suppose it also evokes an epistemic notion of wildness, when it describes certain confidence levels as “wild,” but I take it that “wild” here is mostly just a way of saying “irrational”?

All Possible Views About Humanity's Future Are Wild

To say a bit more here, on the epistemic relevance of wildness:

I take it that one of the main purposes of this post is to push back against “fishiness arguments,” like the argument that Will makes in “Are We Living at the Hinge of History?”

The basic idea, of course, is that it’s a priori very unlikely that any given person would find themselves living at the hinge of history (and correctly recognise this). Due to the fallibility of human reasoning and due to various possible sources of bias, however, it’s not as unlikely that a given person would mistakenly conclude that they live at the HoH. Therefore, if someone comes to believe that they probably live at the HoH, we should think there’s a sizeable chance they’ve simply made a mistake.

As this line of argument is expressed in the post:

I know what you're thinking: "The odds that we could live in such a significant time seem infinitesimal; the odds that Holden is having delusions of grandeur (on behalf of all of Earth, but still) seem far higher."

The three critical probabilities here are:

  • Pr(Someone makes an epistemic mistake when thinking about their place in history)
  • Pr(Someone believes they live at the HoH|They haven’t made an epistemic mistake)
  • Pr(Someone believes they live at the HoH|They’ve made an epistemic mistake)

The first describes the robustness of our reasoning. The second describes the prior probability that we would live at the HoH (and be able to recognise this fact if reasoning well). The third describes the level of bias in our reasoning, toward the HoH hypothesis, when we make mistakes.
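A minimal sketch of how these three probabilities combine, via Bayes' rule, into the quantity the fishiness argument turns on: the probability that someone who believes they live at the HoH has made an epistemic mistake. The function name is hypothetical, and the inputs in the usage line are purely illustrative placeholders, not estimates anyone has made.

```python
def pr_mistake_given_belief(p_mistake: float,
                            p_belief_if_sound: float,
                            p_belief_if_mistaken: float) -> float:
    """Pr(mistake | believes they live at the HoH), by Bayes' rule.

    p_mistake:            Pr(mistake), the robustness term
    p_belief_if_sound:    Pr(believes HoH | no mistake), the prior term
    p_belief_if_mistaken: Pr(believes HoH | mistake), the bias term
    """
    # Joint probability of (mistake, believes HoH).
    joint_mistaken = p_belief_if_mistaken * p_mistake
    # Joint probability of (no mistake, believes HoH).
    joint_sound = p_belief_if_sound * (1.0 - p_mistake)
    return joint_mistaken / (joint_mistaken + joint_sound)


# Purely illustrative inputs (hypothetical, not taken from the post).
posterior = pr_mistake_given_belief(p_mistake=0.1,
                                    p_belief_if_sound=0.001,
                                    p_belief_if_mistaken=0.05)
print(f"Pr(mistake | believes HoH) = {posterior:.3f}")
```

The posterior depends only on the ratio of the two joint terms, so a consideration like "all possible futures are wild" can affect the argument only by shifting one of the three inputs.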

I agree that all possible futures are “wild,” in some sense, but I don’t think this point necessarily bears much on the magnitudes of any of these probabilities.

For example, it would be sort of “wild” if long-distance space travel turns out to be impossible and our solar system turns out to be the only solar system to ever harbour life. It would also be “wild” if long-distance space travel starts to happen 100,000 years from now. But — at least at a glance — I don’t see how this wildness should inform our estimates for the three key probabilities.

One possible argument here, focusing on the bias factor, is something like: “We shouldn’t expect intellectuals to be significantly biased toward the conclusion that they live at the HoH, because the HoH Hypothesis isn’t substantially more appealing, salient, etc., than other beliefs they could have about the future.”

But I don’t think this argument would be right. For example: I think the hypothesis “the HoH will happen within my lifetime” and the hypothesis “the HoH will happen between 100,000 and 200,000 years from now” are pretty psychologically different.

To sum up: At least on a first pass, I don't see why the point "all possible futures are wild" undermines the fishiness argument raised at the top of the post.

All Possible Views About Humanity's Future Are Wild

Some possible futures do feel relatively more “wild” to me, too, even if all of them are wild to a significant degree. If we suppose that wildness is actually pretty epistemically relevant (I’m not sure it is), then it could still matter a lot if some future is 10x wilder than another.

For example, take a prediction like this:

Humanity will build self-replicating robots and shoot them out into space at close to the speed of light; as they expand outward, they will construct giant spherical structures around all of the galaxy’s stars to extract tremendous volumes of energy; this energy will be used to power octillions of digital minds with unfathomable experiences; this process will start in the next thirty years, by which point we’ll already have transcended our bodies to reside on computers as brain emulation software.

A prediction like “none of the above happens; humanity hangs around and then dies out sometime in the next million years” definitely also feels wild in its own way. So does the prediction “all of the above happens, starting a few hundred years from now.” But both of these predictions still feel much less wild than the first one.

I suppose whether they actually are much less “wild” depends on one’s metric of wildness. I’m not sure how to think about that metric, though. If wildness is epistemically relevant, then presumably some forms of wildness are more epistemically relevant than others.

Taboo "Outside View"

I suspect you are more broadly underestimating the extent to which people used "insect-level intelligence" as a generic stand-in for "pretty dumb," though I haven't looked at the discussion in Mind Children and Moravec may be making a stronger claim.

I think that's good push-back and a fair suggestion: I'm not sure how seriously the statement in Nick's paper was meant to be taken. I hadn't considered that it might be almost entirely a quip. (I may ask him about this.)

Moravec's discussion in Mind Children is similarly brief: He presents a graph of the computing power of different animals' brains and states that "lab computers are roughly equal in power to the nervous systems of insects." He also characterizes current AI behaviors as "insectlike" and writes: "I believe that robots with human intelligence will be common within fifty years. By comparison, the best of today's machines have minds more like those of insects than humans. Yet this performance itself represents a giant leap forward in just a few decades." I don't think he's just being quippy, but there's also no suggestion that he means anything very rigorous/specific by his suggestion.

Rodney Brooks, I think, did mean for his comparisons to insect intelligence to be taken very seriously. The idea of his "nouvelle AI program" was to create AI systems that match insect intelligence, then use that as a jumping-off point for trying to produce human-like intelligence. I think walking and obstacle navigation, with several legs, was used as the main dimension of comparison. The Brooks case is a little different, though, since (IIRC) he only claimed that his robots exhibited important aspects of insect intelligence or fell just short of insect intelligence, rather than directly claiming that they actually matched insect intelligence. On the other hand, he apparently felt he had gotten close enough to transition to the stage of the project that was meant to go from insect-level stuff to human-level stuff.

A plausible reaction to these cases, then, might be:

OK, Rodney Brooks did make a similar comparison, and was a major figure at the time, but his stuff was pretty transparently flawed. Moravec's and Bostrom's comments were at best fairly off-hand, suggesting casual impressions more than they suggest outcomes of rigorous analysis. The more recent "insect-level intelligence" claim is pretty different, since it's built on top of much more detailed analysis than anything Moravec/Bostrom did, and it's less obviously flawed than Brooks' analysis. The likelihood that it reflects an erroneous impression is, therefore, a lot lower. The previous cases shouldn't actually do much to raise our suspicion levels.

I think there's something to this reaction, particularly if there's now more rigorous work being done to operationalize and test the "insect-level intelligence" claim. I hadn't yet seen the recent post you linked to, which, at first glance, seems like a good and clear piece of work. The more rigorous the work done to flesh out the argument, the less I'm inclined to treat the Bostrom/Moravec/Brooks cases as part of an epistemically relevant reference class.

My impression a few years ago was that the claim wasn't yet backed by any really clear/careful analysis. At least, the version that filtered down to me seemed to be substantially based on fuzzy analogies between RL agent behavior and insect behavior, without anyone yet knowing much about insect behavior. (Although maybe this was a misimpression.) So I probably do stand by the reference class being relevant back then.

Overall, to sum up, my position here is something like: "The Bostrom/Moravec/Brooks cases do suggest that it might be easy to see roughly insect-level intelligence, if that's what you expect to see and you're relying on fuzzy impressions, paying special attention to stuff AI systems can already do, or not really operationalizing your claims. This should make us more suspicious of modern claims that we've recently achieved 'insect-level intelligence,' unless they're accompanied by transparent and pretty obviously robust reasoning. Insofar as this work is being done, though, the Bostrom/Moravec/Brooks cases become weaker grounds for suspicion."

Taboo "Outside View"

As a last thought here (no need to respond), I thought it might be useful to give one example of a concrete case where: (a) Tetlock’s work seems relevant, and I find the terms “inside view” and “outside view” natural to use, even though the case is relatively different from the ones Tetlock has studied; and (b) I think many people in the community have tended to underweight an “outside view.”

A few years ago, I pretty frequently encountered the claim that recently developed AI systems exhibited roughly “insect-level intelligence.” This claim was typically used to support an argument for short timelines, since the claim was also made that we now had roughly insect-level compute. If insect-level intelligence has arrived around the same time as insect-level compute, then, it seems to follow, we shouldn’t be at all surprised if we get ‘human-level intelligence’ at roughly the point where we get human-level compute. And human-level compute might be achieved pretty soon.

For a couple of reasons, I think some people updated their timelines too strongly in response to this argument. First, it seemed like there are probably a lot of opportunities to make mistakes when constructing the argument: it’s not clear how “insect-level intelligence” or “human-level intelligence” should be conceptualised, it’s not clear how best to map AI behaviour onto insect behaviour, etc. The argument also hadn't yet been vetted closely or expressed very precisely, which seemed to increase the possibility of not-yet-appreciated issues.

Second, we know that there are previous examples of smart people looking at AI behaviour and forming the impression that it suggests “insect-level intelligence.” For example, in Nick Bostrom’s paper “How Long Before Superintelligence?” (1998) he suggested that “approximately insect-level intelligence” was achieved sometime in the 70s, as a result of insect-level computing power being achieved in the 70s. In Moravec’s book Mind Children (1990), he also suggested that insect-level intelligence and insect-level compute had both recently been achieved. Rodney Brooks also had this whole research program, in the 90s, that was based around going from “insect-level intelligence” to “human-level intelligence.”

I think many people didn’t give enough weight to the reference class “instances of smart people looking at AI systems and forming the impression that they exhibit insect-level intelligence” and gave too much weight to the more deductive/model-y argument that had been constructed.

This case is obviously pretty different than the sorts of cases that Tetlock’s studies focused on, but I do still feel like the studies have some relevance. I think Tetlock’s work should, in a pretty broad way, make people more suspicious of their own ability to perform linear/model-heavy reasoning about complex phenomena, without getting tripped up or fooling themselves. It should also make people somewhat more inclined to take reference classes seriously, even when the reference classes are fairly different from the sorts of reference classes good forecasters used in Tetlock’s studies. I do also think that the terms “inside view” and “outside view” apply relatively neatly, in this case, and are nice bits of shorthand, although, admittedly, it’s far from necessary to use them.

This is the sort of case I have in the back of my mind.

(There are also, of course, cases that point in the opposite direction, where many people seemingly gave too much weight to something they classified as an "outside view." Early under-reaction to COVID is arguably one example.)

Taboo "Outside View"

Thank you (and sorry for my delayed response)!

I shudder at the prospect of having a discussion about "Outside view vs inside view: which is better? Which is overrated and which is underrated?" (and I've worried that this thread may be tending in that direction) but I would really look forward to having a discussion about "let's look at Daniel's list of techniques and talk about which ones are overrated and underrated and in what circumstances each is appropriate."

I also shudder a bit at that prospect.

I am sometimes happy making pretty broad and sloppy statements. For example: "People making political predictions typically don't make enough use of 'outside view' perspectives" feels fine to me, as a claim, despite some ambiguity around the edges. (Which perspectives should they use? How exactly should they use them? Etc.)

But if you want to dig in deep, for example when evaluating the rationality of a particular prediction, you should definitely shift toward making more specific and precise statements. For example, if someone has based their own AI timelines on Katja's expert survey, and they wanted to defend their view by simply invoking the principle "outside views are better than inside views," I think this would probably be a horrible conversation. A good conversation would focus specifically on the conditions under which it makes sense to defer heavily to experts, whether those conditions apply in this particular case, etc. Some general Tetlock stuff might come into the conversation, like: "Tetlock's work suggests it's easy to trip yourself up if you try to use your own detailed/causal model of the world to make predictions, so you shouldn't be so confident that your own 'inside view' prediction will be very good either." But mostly you should be more specific.

Now I'll try to say what I think your position is:

  1. If people were using "outside view" without explaining more specifically what they mean, that would be bad and it should be tabooed, but you don't see that in your experience.
  2. If the things in the first Big List were indeed super diverse and disconnected from the evidence in Tetlock's studies etc., then there would indeed be no good reason to bundle them together under one term. But in fact this isn't the case; most of the things on the list are special cases of reference-class / statistical reasoning, which is what Tetlock's studies are about. So rather than taboo "outside view" we should continue to use the term but mildly prune the list.
  3. There may be a general bias in this community towards using the things on the first Big List, but (a) in your opinion the opposite seems more true, and (b) at any rate even if this is true the right response is to argue for that directly rather than advocating the tabooing of the term.

How does that sound?

I'd say that sounds basically right!

The only thing is that I don't necessarily agree with 3a.

I think some parts of the community lean too much on things in the bag (the example you give at the top of the post is an extreme example). I also think that some parts of the community lean too little on things in the bag, in part because (in my view) they're overconfident in their own abilities to reason causally/deductively in certain domains. I'm not sure which is overall more problematic, at the moment, in part because I'm not sure how people actually should be integrating different considerations in domains like AI forecasting.

There also seem to be biases that cut in both directions. I think the 'baseline bias' is pretty strongly toward causal/deductive reasoning, since it's more impressive-seeming, can suggest that you have something uniquely valuable to bring to the table (if you can draw on lots of specific knowledge or ideas that it's rare to possess), is probably typically more interesting and emotionally satisfying, and doesn't as strongly force you to confront or admit the limits of your predictive powers. The EA community has definitely introduced an (unusual?) bias in the opposite direction, by giving a lot of social credit to people who show certain signs of 'epistemic virtue.' I guess the pro-causal/deductive bias often feels more salient to me, but I don't really want to make any confident claim here that it actually is more powerful.

Ben Garfinkel's Shortform

I'm not sure if you think this is an interesting point to notice that's useful for building a world-model, and/or a reason to be skeptical of technical alignment work. I'd agree with the former but disagree with the latter.

Mostly the former!

I think the point may have implications for how much we should prioritize alignment research, relative to other kinds of work, but this depends on what the previous version of someone's world model was.

For example, if someone has assumed that solving the 'alignment problem' is close to sufficient to ensure that humanity has "control" of its future, then absorbing this point (if it's correct) might cause them to update downward on the expected impact of technical alignment research. Research focused on coordination-related issues (e.g. cooperative AI stuff) might increase in value, at least in relative terms.
