Ben Garfinkel's Shortform


A thought on epistemic deference:

The longer you hold a view, and the more publicly you hold a view, the more calcified it typically becomes. Changing your mind becomes more aversive and potentially costly, you have more tools at your disposal to mount a lawyerly defense, and you find it harder to adopt frameworks/perspectives other than your favored one (the grooves become firmly imprinted into your brain). At least, this is the way it seems and personally feels to me.[1]

For this reason, the observation “someone I respect publicly argued for X many years ago and still believes X” typically only provides a bit more evidence than the observation “someone I respect argued for X many years ago.” For example, even though I greatly respect Daron Acemoglu, I think the observation “Daron Acemoglu still believes that political institutions are the central determinant of economic growth rates” only gives me a bit more evidence than the observation “15 years ago Daron Acemoglu publicly argued that institutions are the central determinant of economic growth rates.”

A corollary: If there’s an academic field that contains a long-standing debate, and you’d like to defer to experts in this field, you may want to give disproportionate weight to the opinions of junior academics. They’re less likely to have responded to recent evidence and arguments in an epistemically inflexible way.


  1. Of course, there are exceptions. The final chapter of Scout Mindset includes a moving example of a professor publicly abandoning a view he had championed for fifteen years, after a visiting academic presented persuasive new evidence. The reason these kinds of stories are moving, though, is that they describe truly exceptional behavior. ↩︎

At least in software, there's a problem I see where young engineers are often overly bought-in to hype trains, but older engineers (on average) stick too closely with the technologies they already know.

I would imagine something similar in academia, where hot new theories are over-valued by the young, but older academics have the problem you describe.

Good point!

That consideration -- and the more basic consideration that more junior people often just know less -- definitely pushes in the opposite direction. If you wanted to try some version of seniority-weighted epistemic deference, my guess is that the most reliable cohort would have studied a given topic for at least a few years but less than a couple decades.

A thought on how we describe existential risks from misaligned AI:

Sometimes discussions focus on a fairly specific version of AI risk, which involves humanity being quickly wiped out. Increasingly, though, the emphasis seems to be on the more abstract idea of “humanity losing control of its future.” I think it might be worthwhile to unpack this latter idea a bit more.

There’s already a fairly strong sense in which humanity has never controlled its own future. For example, looking back ten thousand years, no one decided that sedentary agriculture would increasingly supplant hunting and gathering, that increasingly complex states would arise, that slavery would become common, that disease would take off, that social hierarchies and gender divisions would become stricter, etc. The transition to the modern world, and everything that came with this transition, also doesn’t seem to have been meaningfully chosen (or even really understood by anyone). The most serious effort to describe a possible future in detail — Hanson’s Age of Em — also describes a future with loads of features that most present-day people would not endorse.

As long as there are still strong competitive pressures or substantial random drift, it seems to me, no generation ever really gets to choose the future.[1] It's actually sort of ambiguous, then, what it means to worry about “losing control of our future."

Here are a few alternative versions of the concern that feel a bit crisper to me:

  1. If we ‘mess up on AI,’ then even the most powerful individual humans will have unusually little influence over their own lives or the world around them.[2]

  2. If we ‘mess up on AI,’ then future people may be unusually dissatisfied with the world they live in. In other words, people's preferences will be unfulfilled to an unusually large degree.

  3. Humanity may have a rare opportunity to take control of its own future, by achieving strong coordination and then locking various things in. But if we ‘mess up on AI,’ then we’ll miss out on this opportunity.[3]

Something that’s a bit interesting about these alternative versions of the concern, though, is that they’re not inherently linked to AI alignment issues. Even if AI systems behave roughly as their users intend, I believe each of these outcomes is still conceivable. For example, if there’s a missed opportunity to achieve strong coordination around AI, the story might look like the failure of the Baruch Plan for international control of nuclear weapons: that failure had much more to do with politics than it had to do with the way engineers designed the technology in question.

In general, if we move beyond discussing very sharp alignment-related catastrophes (e.g. humanity being quickly wiped out), then I think concerns about misaligned AI start to bleed into broader AI governance concerns. It starts to become more ambiguous whether technical alignment issues are actually central or necessary to the disaster stories people tell.


  1. Although, admittedly, notable individuals or groups (e.g. early Christians) do sometimes have a fairly lasting and important influence. ↩︎

  2. As an analogy, in the world of The Matrix, people may not actually have much less control over the long-run future than hunter-gatherers did twenty thousand years ago. But they certainly have much less control over their own lives. ↩︎

  3. Notably, this is only a bad thing if we expect the relevant generation of humans to choose a better future than would be arrived at by default. ↩︎

Another interpretation of the concern, though related to your (3), is that misaligned AI may cause humanity to lose the potential to control its future. This is consistent with humanity not having (and never having had) actual control of its future; it only requires that this potential exists, and that misaligned AI poses a threat to it.

I agree with most of what you say here.

[ETA: I now realize that I think the following is basically just restating what Pablo already suggested in another comment.]

I think the following is a plausible & stronger concern, which could be read as a stronger version of your crisp concern #3.

"Humanity has not had meaningful control over its future, but AI will now take control one way or the other. Shaping the transition to a future controlled by AI is therefore our first and last opportunity to take control. If we mess up on AI, not only have we failed to seize this opportunity, there also won't be any other."

Of course, AI being our first and only opportunity to take control of the future is a strictly stronger claim than AI being one such opportunity. And so it must be less likely. But my impression is that the stronger claim is sufficiently more important that it could be justified to basically 'wager' most AI risk work on it being true.

I agree with this general point. I'm not sure if you think this is an interesting point to notice that's useful for building a world-model, and/or a reason to be skeptical of technical alignment work. I'd agree with the former but disagree with the latter.

I'm not sure if you think this is an interesting point to notice that's useful for building a world-model, and/or a reason to be skeptical of technical alignment work. I'd agree with the former but disagree with the latter.

Mostly the former!

I think the point may have implications for how much we should prioritize alignment research, relative to other kinds of work, but this depends on what the previous version of someone's world model was.

For example, if someone has assumed that solving the 'alignment problem' is close to sufficient to ensure that humanity has "control" of its future, then absorbing this point (if it's correct) might cause them to update downward on the expected impact of technical alignment research. Research focused on coordination-related issues (e.g. cooperative AI stuff) might increase in value, at least in relative terms.

Do you have the intuition that absent further technological development, human values would drift arbitrarily far? It's not clear to me that they would-- in that sense, I do feel like we're "losing control" in that even non-extinction AI is enabling a new set of possibilities that modern-day humans would endorse much less than the decisions future humans would otherwise have made. (It does also feel like we're missing the opportunity to "take control" and enable a new set of possibilities that we would endorse much more.)

Relatedly, it doesn't feel to me like the values of humans 150,000 years ago and humans now and even ems in Age of Em are all that different on some more absolute scale.

Do you have the intuition that absent further technological development, human values would drift arbitrarily far?

Certainly not arbitrarily far. I also think that technological development (esp. the emergence of agriculture and modern industry) has played a much larger role in changing the world over time than random value drift has.

[E]ven non-extinction AI is enabling a new set of possibilities that modern-day humans would endorse much less than the decisions future humans would otherwise have made.

I definitely think that's true. But I also think that was true of agriculture, relative to the values of hunter-gatherer societies.

To be clear, I'm not downplaying the likelihood or potential importance of any of the three crisper concerns I listed. For example, I think that AI progress could conceivably lead to a future that is super alienating and bad.

I'm just (a) somewhat pedantically arguing that we shouldn't frame the concerns as being about a "loss of control over the future" and (b) suggesting that you can rationally have all these same concerns even if you come to believe that technical alignment issues aren't actually a big deal.

Wow, I just learned that Robin Hanson has written about this, because obviously, and he agrees with you.

And Paul Christiano agrees with me. Truly, time makes fools of us all.

FWIW, I wouldn't say I agree with the main thesis of that post.

However, while I expect machines that outcompete humans for jobs, I don’t see how that greatly increases the problem of value drift. Human cultural plasticity already ensures that humans are capable of expressing a very wide range of values. I see no obvious limits there. Genetic engineering will allow more changes to humans. Ems inherit human plasticity, and may add even more via direct brain modifications.

In principle, non-em-based artificial intelligence is capable of expressing the entire space of possible values. But in practice, in the shorter run, such AIs will take on social roles near humans, and roles that humans once occupied....

I don’t see why people concerned with value drift should be especially focused on AI. Yes, AI may accompany faster change, and faster change can make value drift worse for people with intermediate discount rates. (Though it seems to me that altruistic discount rates should scale with actual rates of change, not with arbitrary external clocks.)

I definitely think that human biology creates at least very strong biases toward certain values (if not hard constraints) and that AI systems would not need to have these same biases. If you're worried about future agents having super different and bad values, then AI is a natural focal point for your worry.


A couple other possible clarifications about my views here:

  • I think that the outcome of the AI Revolution could be much worse, relative to our current values, than the Neolithic Revolution was relative to the values of our hunter-gatherer ancestors. But I think the question "Will the outcome be worse?" is distinct from the question "Will we have less freedom to choose the outcome?"

  • I'm personally not so focused on value drift as a driver of long-run social change. For example, the changes associated with the Neolithic Revolution weren't really driven by people becoming less egalitarian, more pro-slavery, more inclined to hold certain religious beliefs, more ideologically attached to sedentism/farming, more happy to accept risks from disease, etc. There were value changes, but, to some significant degree, they seem to have been downstream of technological/economic change.

Really appreciate the clarifications! I think I was interpreting "humanity loses control of the future" in a weirdly temporally narrow sense that makes it all about outcomes, i.e. where "humanity" refers to present-day humans, rather than humans at any given time period.  I totally agree that future humans may have less freedom to choose the outcome in a way that's not a consequence of alignment issues.

I also agree value drift hasn't historically driven long-run social change, though I kind of do think it will going forward, as humanity has more power to shape its environment at will.

I also agree value drift hasn't historically driven long-run social change
 

My impression is that the differences in historical vegetarianism rates between India and China, and especially India and southern China (where there is greater similarity of climate and crops used), are a moderate counterpoint. At the timescale of centuries, vegetarianism rates in India are much higher than rates in China. Since factory farming is plausibly one of the larger sources of human-caused suffering today, the differences aren't exactly a rounding error.

That's a good example.

I do agree that quasi-random variation in culture can be really important. And I agree that this variation is sometimes pretty sticky (e.g. Europe being predominantly Christian and the Middle East being predominantly Muslim for more than a thousand years). I wouldn't say that this kind of variation is a "rounding error."

Over sufficiently long timespans, though, I think that technological/economic change has been more significant.

As an attempt to operationalize this claim: The average human society in 1000 AD was obviously very different than the average human society in 10,000 BC. I think that the difference would have been less than half as large (at least in intuitive terms) if there hadn't been technological/economic change.

I think that the pool of available technology creates biases in the sorts of societies that emerge and stick around. For large enough amounts of technological change, and long enough timespans (long enough for selection pressures to really matter), I think that shifts in these technological biases will explain a large portion of the shifts we see in the traits of the average society.[1]


  1. If selection pressures become a lot weaker in the future, though, then random drift might become more important in relative terms. ↩︎

Would you consider making this into a top-level post? The discussion here is really interesting and could use more attention, and a top-level post helps to deliver that (this also means the post can be tagged for greater searchability).

I think the top-level post could be exactly the text here, plus a link to the Shortform version so people can see those comments. Though I'd also be interested to see the updated version of the original post which takes comments into account (if you felt like doing that).

The O*NET database includes a list of about 20,000 different tasks that American workers currently need to perform as part of their jobs. I’ve found it pretty interesting to scroll through the list, sorted in random order, to get a sense of the different bits of work that add up to the US economy. I think anyone who thinks a lot about AI-driven automation might find it useful to spend five minutes scrolling around: it’s a way of jumping yourself down to a lower level of abstraction. I think the list is also a little bit mesmerizing, in its own right.

One update I’ve made is that I’m now more confident that more than half of present-day occupational tasks could be automated using fairly narrow, non-agential, and boring-looking AI systems. (Most of them don’t scream “this task requires AI systems with long-run objectives and high levels of generality.”) I think it’s also pretty interesting, as kind of a game, to try to imagine as concretely as possible what the training processes might look like for systems that can perform (or eliminate the need for) different tasks on the list.
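If you want to try this yourself, here's a minimal sketch of the sampling step. It assumes you've downloaded the tab-delimited "Task Statements" file from the O*NET database; the exact filename and column names may vary between database versions, so treat them as placeholders.

```python
import csv
import random

# Load the O*NET task statements (assumes the tab-delimited "Task Statements.txt"
# file from the O*NET database download; filename and column names may differ
# between database versions).
with open("Task Statements.txt", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f, delimiter="\t"))

# Print ten task descriptions sampled uniformly at random, with occupation titles.
for row in random.sample(rows, 10):
    print(f"- {row['Task']} ({row['Title']})")
```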

As a sample, here are ten random tasks. (Some of these could easily be broken up into a lot of different sub-tasks or task variants, which might be automated independently.)

  • Cancel letter or parcel post stamps by hand.
  • Inquire into the cause, manner, and circumstances of human deaths and establish the identities of deceased persons.
  • Teach patients to use home health care equipment.
  • Write reports or articles for Web sites or newsletters related to environmental engineering issues.
  • Supervise and participate in kitchen and dining area cleaning activities.
  • Intervene as an advocate for clients or patients to resolve emergency problems in crisis situations.
  • Mark or tag material with proper job number, piece marks, and other identifying marks as required.
  • Calculate amount of debt and funds available to plan methods of payoff and to estimate time for debt liquidation.
  • Weld metal parts together, using portable gas welding equipment.
  • Provide assistance to patrons by performing duties such as opening doors and carrying bags.

In general, I think “read short descriptions of randomly sampled cases” might be an underrated way to learn about the world and notice issues with your assumptions/models.

A couple other examples:

I’ve been trying to develop a better understanding of various aspects of interstate conflict. The Correlates of War militarized interstate disputes (MIDs) dataset is, I think, somewhat useful for this. The project files include short descriptions of (supposedly) every case between 1993 and 2014 in which one state “threatened, displayed, or used force against another.” Here, for example, is the set of descriptions for 2011-2014. I’m not sure I’ve had any huge/concrete take-aways, but I think reading the cases: (a) made me aware of some international tensions I was oblivious to; (b) gave me a slightly better understanding of dynamics around ‘micro-aggressions’ (e.g. flying over someone’s airspace); and (c) helped me more strongly internalize the low base rate for crises boiling over into war (since I had previously read disproportionately about historical disputes that turned into something larger).

Last year, I also spent a bit of time trying to improve my understanding of police killings in the US. I found this book unusually useful. It includes short descriptions of every single incident in which an unarmed person was killed by a police officer in 2015. I feel like reading a portion of it helped me to quickly notice and internalize different aspects of the problem (e.g. the fact that something like a third of the deaths are caused by tasers; the large role of untreated mental illness as a risk factor; the fact that nearly all fatal interactions are triggered by 911 calls, rather than stops; the fact that officers are trained to interact importantly differently with people they believe are on PCP; etc.). I assume I could have learned all the same things by just reading papers — but I think the case sampling approach was probably faster and better for retention.

I think there might be value in creating “random case description” collections for a broader range of phenomena. Academia really doesn’t emphasize these kinds of collections as tools for either research or teaching.

EDIT: Another good example of this approach to learning is Rob Bensinger's recent post "thirty-three randomly selected bioethics papers."

Interesting ideas. Some similarities with qualitative research, but also important differences, I think (if I understand you correctly).

I’d actually say this is a variety of qualitative research. At least in the main academic areas I follow, though, it seems a lot more common to read and write up small numbers of detailed case studies (often selected for being especially interesting) than to read and write up large numbers of shallow case studies (selected close to randomly).

This seems to be true in international relations, for example. In a class on interstate war, it’s plausible people would be assigned a long analysis of the outbreak of WW1, but very unlikely they’d be assigned short descriptions of the outbreaks of twenty random wars. (Quite possible there’s a lot of variation between fields, though.)

I agree with the thrust of the conclusion, though I worry that focusing on task decomposition this way elides the fact that the descriptions of the O*NET tasks already assume your unit of labor is fairly general. Reading many of these, I actually feel pretty unsure about the level of generality or common-sense reasoning required for an AI to straightforwardly replace that part of a human's job. Presumably there's some restructure that would still squeeze a lot of economic value out of narrow AIs that could basically do these things, but that restructure isn't captured looking at the list of present-day O*NET tasks.

Some thoughts on risks from unsafe technologies:

It’s hard for the development of an unsafe technology to make the world much worse, in expectation, if safety failures primarily affect the technology’s users.

For example: If the risk of dying in a plane crash outweighs the value of flying, then people won’t fly. If the risk of dying doesn’t outweigh the benefit, then people will fly, and they’ll be (on average) better off despite occasionally dying. Either way, planes don’t make the world worse.

For an unsafe technology to make the world much worse, the risk from accidents will typically need to fall primarily on non-users. Unsafe technologies that primarily harm non-users (e.g. viruses that can escape labs) are importantly different than unsafe technologies that primarily harm users (e.g. bridges that might collapse). Negative externalities are essential to the story.
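One rough way to formalize this point (the notation is illustrative, not from any particular source): suppose each use of a technology gives its user expected benefit $B$, an accident happens with probability $p$, an accident costs the user $C_u$, and it costs non-users $C_e$. Users adopt only when $B > p C_u$, so the expected welfare change per use is

$$\Delta W = (B - p C_u) - p C_e.$$

If $C_e \approx 0$, voluntary adoption implies $\Delta W > 0$; the technology can only make the world much worse in expectation when the external term $p C_e$ is large relative to the surplus $B - p C_u$ that users capture.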

Overall, though, I tend to worry less about negative externalities from safety failures than I do about negative externalities from properly functioning technologies. Externalities from safety failures grow the more unsafe the technology is; but, the more unsafe the technology is, the less incentive anyone has to develop or use it. Eliminating safety-related externalities is also largely an engineering problem, that everyone has some incentive to solve. We therefore shouldn’t expect these externalities to stick around forever — unless we lose our ability to modify the technology (e.g. because we all die) early on. On the other hand, if the technology produces massive negative externalities even when it works perfectly, it's easier to understand how its development could make the world badly and lastingly worse.

the more unsafe the technology is, the less incentive anyone has to develop or use it

That seems correct all else equal. However, it can be outweighed by actors seeking relative gains or other competitive pressures. And my impression is this is a key premise in some typical arguments for why AI risk is large.

Schlosser's Command and Control has some instructive examples from nuclear policy (which I think you're aware of, so describing them mostly for the benefit of other readers) where e.g. US policymakers were explicitly trading off accident risk with military capabilities when deciding if/how many bombers with nuclear weapons to have patrolling in the air.

And indeed several bombers with nuclear weapons crashed, e.g. 1968 over Greenland, though no nuclear detonation resulted. This is also an example where external parties for a while were kind of screwed. Yes, Denmark had an incentive to reduce safety risks from US bombers flying over their territory; but they didn't have the technical capabilities to develop less risky substitutes, and political defenses like the nuclear-free zone they declared were just violated by the US.

Tbc, I do agree all your points are correct in principle. E.g. in this example, the US did have an incentive to reduce safety risks, and since none of the accidents were "fatal" to the US they did eventually replace nuclear weapons flying around with better ICBMs, submarines etc. I still feel like your take sounds too optimistic once one takes competitive dynamics into account.

--

As an aside, I'm not sure I agree that reducing safety-related externalities is largely an engineering problem, unless we include social engineering. Things like organizational culture, checklists, maintenance policies, risk assessments, etc., also seem quite important to me. (Or in the nuclear policy example even things like arms control, geopolitics, ...)

As an aside, I'm not sure I agree that reducing safety-related externalities is largely an engineering problem, unless we include social engineering. Things like organizational culture, checklists, maintenance policies, risk assessments, etc., also seem quite important to me. (Or in the nuclear policy example even things like arms control, geopolitics, ...)

I think this depends a bit what class of safety issues we're thinking about. For example, a properly functioning nuke is meant to explode and kills loads of people. A lot of nuclear safety issues are then borderline misuse issues: people deciding to use them when really they shouldn't, for instance due to misinterpretations of others' actions. Many other technological 'accident risks' are less social, although never entirely non-social (e.g. even in the case of bridge safety, you still need to trust some organization to do maintenance/testing properly.)

That seems correct all else equal. However, it can be outweighed by actors seeking relative gains or other competitive pressures.

I definitely don't want to deny that actors can sometimes have incentives to use world-worseningly unsafe technologies. But you do need the right balance of conditions to hold: individual units of the technology need to offer their users large enough benefits and small enough personal safety risks, need to create large enough external safety risks, and need to have safety levels that increase slowly enough over time.

Weapons of mass destruction are sort of special in this regard. They can in some cases have exceptionally high value to their users (deterring or preventing invasion), which makes them willing to bear unusually high risks. Since their purpose is to kill huge numbers of people on very short notice, there's naturally a risk of them killing huge numbers of people (but under the wrong circumstances). This risk is also unusually hard to reduce over time, since it's often more about people making bad decisions than it is about the technology 'misbehaving' per se; there is also a natural trade-off between increasing readiness and decreasing the risk of bad usage decisions being made. The risk also naturally falls very heavily on other actors (since the technology is meant to harm other actors).

I do generally find it easiest to understand how AI safety issues could make the world permanently worse when I imagine superweapon/WMD-like systems (of the sort that also seem to be imagined in work like "Racing to the Precipice"). I think existential safety risks become a much harder sell, though, if we're primarily imagining non-superweapon applications and distributed/gradual/what-failure-looks-like-style scenarios.

I also think it's worth noting that, on an annual basis, even nukes don't have a super high chance of producing global catastrophes through accidental use; if you have a high enough discount rate, and you buy the theory that they substantially reduce the risk of great power war, then it's even possible (maybe not likely) that their existence is currently positive EV by non-longtermist lights.

But you do need the right balance of conditions to hold: individual units of the technology need to offer their users large enough benefits and small enough personal safety risks, need to create large enough external safety risks, and need to have safety levels that increase slowly enough over time.
Weapons of mass destruction are sort of special in this regard. [...]
[...] I think existential safety risks become a much harder sell, though, if we're primarily imagining non-superweapon applications and distributed/gradual/what-failure-looks-like-style scenarios.

Yes, my guess is we broadly agree about all of this.

I also think it's worth noting that, on an annual basis, even nukes don't have a super high chance of producing global catastrophes through accidental use; if you have a high enough discount rate, and you buy the theory that they substantially reduce the risk of great power war, then it's even possible (maybe not likely) that their existence is currently positive EV by non-longtermist lights.

This also sounds right to me. FWIW, it's not even obvious to me if nukes are negative-EV by longtermist lights. Since nuclear winter seems unlikely to cause immediate extinction this depends on messy questions such as how the EV of trajectory changes from conventional great power war compares to the EV of trajectory changes from nuclear winter scenarios.

I think this depends a bit what class of safety issues we're thinking about. [...] Many other technological 'accident risks' are less social, although never entirely non-social (e.g. even in the case of bridge safety, you still need to trust some organization to do maintenance/testing properly.)

I'm not sure I agree with this. While they haven't been selected to be representative, the sense I got from the accident case studies I've read (e.g. Chernobyl, nuclear weapons accidents, and various cases from the books Flirting With Disaster and Warnings) is that the social component was quite substantial. It seems to me that usually either better engineering (though sometimes this wasn't possible) or better social management of dealing with engineering limitations (usually possible) could have avoided these accidents. It makes a lot of sense to me that some people prefer to talk of "sociotechnical systems".