Ben Garfinkel

Ben Garfinkel - Researcher at Future of Humanity Institute


What are things everyone here should (maybe) read?

Fortunately, if I remember correctly, something like the distinction between the true criterion of rightness and the best practical decision procedure actually is a major theme in the Kagan book. (Although I think the distinction is probably often underemphasized.)

It is therefore kind of misleading to think of consequentialism vs. deontology vs. virtue ethics as alternative theories, which however is the way normative ethics is typically presented in the analytic tradition.

I agree there is something to this concern. But I still wouldn't go so far as to say that it's misleading to think of them as alternative theories. I do think they count as conceptually distinct (even if the boundaries are sometimes a bit muddy), and I think they do sometimes have different implications for how you should in fact make moral decisions.

Beyond the deontology/consequentialism debate, I think there are also relevant questions around demandingness (how strong are our moral obligations, if any?), the nature of well-being (e.g. hedonistic vs. preference-based vs. objective list theories), the set of things that count as morally relevant consequences (e.g. do things beyond well-being matter? should we care more about totals or averages?), and so on.

What are things everyone here should (maybe) read?

A slightly boring answer: I think most people should at least partly read something that overviews common theories and frameworks in normative ethics (and the arguments for and against them) and something that overviews core concepts and principles in economics (e.g. the idea of expected utility, the idea of an externality, supply/demand, the basics of economic growth, the basics of public choice).

In my view, normative ethics and economics together make up a really large portion of the intellectual foundation that EA is built on.

One good book that overviews normative ethics is Shelly Kagan's Normative Ethics, although I haven't read it since college (and I think it has only a tiny amount of coverage of population ethics and animal ethics). One thing I like about it is that it focuses on laying out the space of possible ethical views in a sensible way, rather than tracing the history of the field. If I remember correctly, names like Aristotle, Kant, etc. never show up. It's also written in a very conversational style.

One good introductory economics textbook is Tyler Cowen's and Alex Tabarrok's Modern Principles of Economics. I don't know how it stacks up to other intro textbooks, since it's the only one that I've read more than a little of, but it's very readable, has very little math, and emphasizes key concepts and principles. Reading just the foundational chapters in an intro textbook, then the chapters whose topics sound important, can probably get most people a decent portion of the value of reading a full textbook.

Ben Garfinkel's Shortform

That's a good example.

I do agree that quasi-random variation in culture can be really important. And I agree that this variation is sometimes pretty sticky (e.g. Europe being predominantly Christian and the Middle East being predominantly Muslim for more than a thousand years). I wouldn't say that this kind of variation is a "rounding error."

Over sufficiently long timespans, though, I think that technological/economic change has been more significant.

As an attempt to operationalize this claim: The average human society in 1000 AD was obviously very different from the average human society in 10,000 BC. I think that the difference would have been less than half as large (at least in intuitive terms) if there hadn't been technological/economic change.

I think that the pool of available technology creates biases in the sorts of societies that emerge and stick around. For large enough amounts of technological change, and long enough timespans (long enough for selection pressures to really matter), I think that shifts in these technological biases will explain a large portion of the shifts we see in the traits of the average society.[1]

  1. If selection pressures become a lot weaker in the future, though, then random drift might become more important in relative terms. ↩︎

Ben Garfinkel's Shortform

FWIW, I wouldn't say I agree with the main thesis of that post.

However, while I expect machines that outcompete humans for jobs, I don’t see how that greatly increases the problem of value drift. Human cultural plasticity already ensures that humans are capable of expressing a very wide range of values. I see no obvious limits there. Genetic engineering will allow more changes to humans. Ems inherit human plasticity, and may add even more via direct brain modifications.

In principle, non-em-based artificial intelligence is capable of expressing the entire space of possible values. But in practice, in the shorter run, such AIs will take on social roles near humans, and roles that humans once occupied....

I don’t see why people concerned with value drift should be especially focused on AI. Yes, AI may accompany faster change, and faster change can make value drift worse for people with intermediate discount rates. (Though it seems to me that altruistic discount rates should scale with actual rates of change, not with arbitrary external clocks.)

I definitely think that human biology creates at least very strong biases toward certain values (if not hard constraints) and that AI systems would not need to have these same biases. If you're worried about future agents having super different and bad values, then AI is a natural focal point for your worry.

A couple other possible clarifications about my views here:

  • I think that the outcome of the AI Revolution could be much worse, relative to our current values, than the Neolithic Revolution was relative to the values of our hunter-gatherer ancestors. But I think the question "Will the outcome be worse?" is distinct from the question "Will we have less freedom to choose the outcome?"

  • I'm personally not so focused on value drift as a driver of long-run social change. For example, the changes associated with the Neolithic Revolution weren't really driven by people becoming less egalitarian, more pro-slavery, more inclined to hold certain religious beliefs, more ideologically attached to sedentism/farming, more happy to accept risks from disease, etc. There were value changes, but, to some significant degree, they seem to have been downstream of technological/economic change.

Ben Garfinkel's Shortform

Do you have the intuition that absent further technological development, human values would drift arbitrarily far?

Certainly not arbitrarily far. I also think that technological development (esp. the emergence of agriculture and modern industry) has played a much larger role in changing the world over time than random value drift has.

[E]ven non-extinction AI is enabling a new set of possibilities that modern-day humans would endorse much less than the decisions of future humans otherwise.

I definitely think that's true. But I also think that was true of agriculture, relative to the values of hunter-gatherer societies.

To be clear, I'm not downplaying the likelihood or potential importance of any of the three crisper concerns I listed. For example, I think that AI progress could conceivably lead to a future that is super alienating and bad.

I'm just (a) somewhat pedantically arguing that we shouldn't frame the concerns as being about a "loss of control over the future" and (b) suggesting that you can rationally have all these same concerns even if you come to believe that technical alignment issues aren't actually a big deal.

Ben Garfinkel's Shortform

A thought on how we describe existential risks from misaligned AI:

Sometimes discussions focus on a fairly specific version of AI risk, which involves humanity being quickly wiped out. Increasingly, though, the emphasis seems to be on the more abstract idea of “humanity losing control of its future.” I think it might be worthwhile to unpack this latter idea a bit more.

There’s already a fairly strong sense in which humanity has never controlled its own future. For example, looking back ten thousand years, no one decided that sedentary agriculture would increasingly supplant hunting and gathering, that increasingly complex states would arise, that slavery would become common, that disease would take off, that social hierarchies and gender divisions would become stricter, etc. The transition to the modern world, and everything that came with this transition, also doesn’t seem to have been meaningfully chosen (or even really understood by anyone). The most serious effort to describe a possible future in detail, Hanson’s Age of Em, also describes a future with loads of features that most present-day people would not endorse.

As long as there are still strong competitive pressures or substantial random drift, it seems to me, no generation ever really gets to choose the future.[1] It's actually sort of ambiguous, then, what it means to worry about “losing control of our future."

Here are a few alternative versions of the concern that feel a bit crisper to me:

  1. If we ‘mess up on AI,’ then even the most powerful individual humans will have unusually little influence over their own lives or the world around them.[2]
  2. If we ‘mess up on AI,’ then future people may be unusually dissatisfied about the world they live in. In other words, people's preferences will be unfulfilled to an unusually large degree.

  3. Humanity may have a rare opportunity to take control of its own future, by achieving strong coordination and then locking various things in. But if we ‘mess up on AI,’ then we’ll miss out on this opportunity.[3]

Something that’s a bit interesting about these alternative versions of the concern, though, is that they’re not inherently linked to AI alignment issues. Even if AI systems behave roughly as their users intend, I believe each of these outcomes is still conceivable. For example, if there’s a missed opportunity to achieve strong coordination around AI, the story might look like the failure of the Baruch Plan for international control of nuclear weapons: that failure had much more to do with politics than it had to do with the way engineers designed the technology in question.

In general, if we move beyond discussing very sharp alignment-related catastrophes (e.g. humanity being quickly wiped out), then I think concerns about misaligned AI start to bleed into broader AI governance concerns. It starts to become more ambiguous whether technical alignment issues are actually central or necessary to the disaster stories people tell.

  1. Although, admittedly, notable individuals or groups (e.g. early Christians) do sometimes have a fairly lasting and important influence. ↩︎

  2. As an analogy, in the world of The Matrix, people may not actually have much less control over the long-run future than hunter-gatherers did twenty thousand years ago. But they certainly have much less control over their own lives. ↩︎

  3. Notably, this is only a bad thing if we expect the relevant generation of humans to choose a better future than would be arrived at by default. ↩︎

Ben Garfinkel's Shortform

Good point!

That consideration -- and the more basic consideration that more junior people often just know less -- definitely pushes in the opposite direction. If you wanted to try some version of seniority-weighted epistemic deference, my guess is that the most reliable cohort would have studied a given topic for at least a few years but less than a couple decades.

Ben Garfinkel's Shortform

A thought on epistemic deference:

The longer you hold a view, and the more publicly you hold a view, the more calcified it typically becomes. Changing your mind becomes more aversive and potentially costly, you have more tools at your disposal to mount a lawyerly defense, and you find it harder to adopt frameworks/perspectives other than your favored one (the grooves become firmly imprinted into your brain). At least, this is the way it seems and personally feels to me.[1]

For this reason, the observation “someone I respect publicly argued for X many years ago and still believes X” typically only provides a bit more evidence than the observation “someone I respect argued for X many years ago.” For example, even though I greatly respect Daron Acemoglu, I think the observation “Daron Acemoglu still believes that political institutions are the central determinant of economic growth rates” only gives me a bit more evidence than the observation “15 years ago Daron Acemoglu publicly argued that institutions are the central determinant of economic growth rates.”

A corollary: If there’s an academic field that contains a long-standing debate, and you’d like to defer to experts in this field, you may want to give disproportionate weight to the opinions of junior academics. They’re less likely to have responded to recent evidence and arguments in an epistemically inflexible way.
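One way to make the evidential point above concrete is with likelihood ratios. The sketch below is a hypothetical illustration, not something from the original post, and every probability in it is made up. The idea: if views calcify, an expert who publicly argued for X would probably keep believing X whether or not X is true, so the "still believes X" observation has a likelihood ratio close to 1 and barely moves the posterior. With these invented numbers, persistence by a calcified expert moves the posterior only from about 0.67 to about 0.69, while persistence by a flexible expert moves it to about 0.86.

```python
def posterior_probability(prior: float, *likelihood_ratios: float) -> float:
    """Sequential Bayesian update in odds form.

    Each ratio is P(observation | X true) / P(observation | X false),
    conditioned on all earlier observations, so multiplying them
    together is just the chain rule.
    """
    odds = prior / (1 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1 + odds)


# Hypothetical numbers, chosen only for illustration.
PRIOR = 0.5
# Observation 1: "someone I respect argued for X many years ago".
ARGUED_LR = 0.6 / 0.3
# Observation 2: "...and still believes X", for two kinds of expert.
CALCIFIED_LR = 0.95 / 0.85  # keeps the view almost regardless of the truth
FLEXIBLE_LR = 0.9 / 0.3     # readily drops the view if it turns out wrong

after_argument = posterior_probability(PRIOR, ARGUED_LR)
after_calcified = posterior_probability(PRIOR, ARGUED_LR, CALCIFIED_LR)
after_flexible = posterior_probability(PRIOR, ARGUED_LR, FLEXIBLE_LR)

print(f"argued for X:                  {after_argument:.3f}")
print(f"still believes X (calcified):  {after_calcified:.3f}")
print(f"still believes X (flexible):   {after_flexible:.3f}")
```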

  1. Of course, there are exceptions. The final chapter of Scout Mindset includes a moving example of a professor publicly abandoning a view he had championed for fifteen years, after a visiting academic presented persuasive new evidence. The reason these kinds of stories are moving, though, is that they describe truly exceptional behavior. ↩︎

Ben Garfinkel's Shortform

I’d actually say this is a variety of qualitative research. At least in the main academic areas I follow, though, it seems a lot more common to read and write up small numbers of detailed case studies (often selected for being especially interesting) than to read and write up large numbers of shallow case studies (selected close to randomly).

This seems to be true in international relations, for example. In a class on interstate war, it’s plausible people would be assigned a long analysis of the outbreak of WW1, but very unlikely they’d be assigned short descriptions of the outbreaks of twenty random wars. (Quite possible there’s a lot of variation between fields, though.)

Ben Garfinkel's Shortform

In general, I think “read short descriptions of randomly sampled cases” might be an underrated way to learn about the world and notice issues with your assumptions/models.

A couple other examples:

I’ve been trying to develop a better understanding of various aspects of interstate conflict. The Correlates of War militarized interstate disputes (MIDs) dataset is, I think, somewhat useful for this. The project files include short descriptions of (supposedly) every case between 1993 and 2014 in which one state “threatened, displayed, or used force against another.” Here, for example, is the set of descriptions for 2011-2014. I’m not sure I’ve had any huge/concrete take-aways, but I think reading the cases: (a) made me aware of some international tensions I was oblivious to; (b) gave me a slightly better understanding of dynamics around ‘micro-aggressions’ (e.g. flying over someone’s airspace); and (c) helped me more strongly internalize the low base rate for crises boiling over into war (since I disproportionately read about historical disputes that turned into something larger).
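Point (c) is a selection effect, and a small simulation can make it concrete. The sketch below is hypothetical: it does not use the Correlates of War data, and the escalation rate and "attention ratio" are invented for illustration. It compares the escalation rate you would estimate from a uniform random sample of disputes with the rate you would perceive if escalated disputes were many times more likely to be written about. With a true rate of 2% and an attention ratio of 50, roughly half of the cases you read about would be wars (in expectation about 0.02 × 50 / (0.02 × 50 + 0.98) ≈ 0.5).

```python
import random


def make_disputes(n: int, escalation_rate: float, seed: int = 0) -> list[bool]:
    """Hypothetical disputes: True means the dispute escalated into war."""
    rng = random.Random(seed)
    return [rng.random() < escalation_rate for _ in range(n)]


def random_sample_rate(disputes: list[bool], k: int, seed: int = 1) -> float:
    """Estimate the escalation rate from k cases drawn uniformly without replacement."""
    sample = random.Random(seed).sample(disputes, k)
    return sum(sample) / k


def attention_biased_rate(disputes: list[bool], k: int,
                          attention_ratio: float, seed: int = 2) -> float:
    """Estimate the rate from k cases drawn with replacement, where an escalated
    dispute is `attention_ratio` times as likely to be read about as one that fizzled."""
    weights = [attention_ratio if escalated else 1.0 for escalated in disputes]
    sample = random.Random(seed).choices(disputes, weights=weights, k=k)
    return sum(sample) / k


disputes = make_disputes(n=10_000, escalation_rate=0.02)
print("random sample:   ", random_sample_rate(disputes, k=500))
print("attention-biased:", attention_biased_rate(disputes, k=500, attention_ratio=50))
```

Sampling without replacement for the random reader and with replacement for the attention-biased reader is just a convenience here; the gap between the two estimates comes from the weights, not from that choice.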

Last year, I also spent a bit of time trying to improve my understanding of police killings in the US. I found this book unusually useful. It includes short descriptions of every single incident in which an unarmed person was killed by a police officer in 2015. I feel like reading a portion of it helped me to quickly notice and internalize different aspects of the problem (e.g. the fact that something like a third of the deaths are caused by tasers; the large role of untreated mental illness as a risk factor; the fact that nearly all fatal interactions are triggered by 911 calls, rather than stops; the fact that officers are trained to interact importantly differently with people they believe are on PCP; etc.). I assume I could have learned all the same things by just reading papers, but I think the case sampling approach was probably faster and better for retention.

I think it's possible there might be value in creating “random case descriptions” collections for a broader range of phenomena. Academia really doesn’t emphasize these kinds of collections as tools for either research or teaching.

EDIT: Another good example of this approach to learning is Rob Bensinger's recent post "thirty-three randomly selected bioethics papers."
