45 comments, sorted by Click to highlight new comments since: Today at 11:12 PM
New Comment

The existential risk community’s relative level of concern about different existential risks is correlated with how hard-to-analyze these risks are. For example, here is The Precipice’s ranking of the top five most concerning existential risks:

  1. Unaligned artificial intelligence[1]
  2. Unforeseen anthropogenic risks (tied)
  3. Engineered pandemics (tied)
  4. Other anthropogenic risks
  5. Nuclear war (tied)
  6. Climate change (tied)

This isn’t surprising.

For a number of risks, when you first hear about them, it’s reasonable to have the reaction “Oh, hm, maybe that could be a huge threat to human survival” and initially assign something on the order of a 10% credence to the hypothesis that it will by default lead to existentially bad outcomes. In each case, if we can gain much greater clarity about the risk, then we should think there’s about a 90% chance we’ll become less worried about it. We’re likely to remain decently worried about hard-to-analyze risks (because we can’t get greater clarity about them) while becoming less worried about easy-to-analyze risks.

In particular, our level of worry about different plausible existential risks is likely to roughly track our ability to analyze them (e.g. through empirical evidence, predictively accurate formal models, and clearcut arguments).

Some plausible existential risks also are far easier to analyze than others. If you compare 80K’s articles on climate change and artificial intelligence, for example, then I think it is pretty clear that people analyzing climate risk simply have a lot more to go on. When we study climate change, we can rely on climate models that we have reason to believe have a decent amount of validity. We can also draw on empirical evidence about the historical effects of previous large changes in global temperature and about the ability of humans and other specifies to survive under different local climate conditions. And so on. We’re in a much worse epistemic position when it comes to analyzing the risk from misaligned AI: we’re reliant on fuzzy analogies, abstract arguments that use highly ambiguous concepts, observations of the behaviour of present-day AI systems (e.g. reinforcement learners that play videogames) that will probably be very different than future AI systems, a single datapoint (the evolution of human intelligence and values) that has a lot of important differences with the case we’re considering, and attempts to predict the incentives and beliefs of future actors in development scenarios that are still very opaque to us. Even if the existential risk from misaligned AI actually is reasonably small, it’s hard to see how we could become really confident of that.

Some upshots:

  1. The fact that the existential risk community is particularly worried about misaligned AI might mostly reflect the fact that it’s hard to analyze risks from misaligned AI.

  2. Nonetheless, even if the above possibility is true, it doesn't at all follow that the community is irrational to worry far more about misaligned AI than other potential risks. It’s completely coherent to have something like this attitude: “If I could think more clearly about the risk from misaligned AI, then I would probably come to realize it’s not that big a deal. But, in practice, I can’t yet think very clearly about it. That means that, unlike in the case of climate change, I also can’t rule out the small possibility that clarity would make me much more worried about it than I currently am. So, on balance, I should feel more worried about misaligned AI than I do about other risks. I should focus my efforts on it, even if — to uncharitable observers — my efforts will probably look a bit misguided after the fact.”

  3. For hard-to-analyze risks, it matters a lot what your “prior” in the risks is (since evidence, models, and arguments can only really move you so much). I sometimes get the sense that some people are starting from a prior that’s not far from 50%: For example, people who are very worried about misaligned AI sometimes use the rhetorical move “How would the world look different if AI wasn’t going to kill everyone?”, and this move seems to assume that empirical evidence is needed to shift us down from a high credence. I think that other people (including myself) are often implicitly starting from a low prior and feel the need to be argued up. Insofar as it’s very unclear how we should determine our priors, and it's even a bit unclear what exactly a "prior" means in this case, it’s also unsurprising that there’s a particularly huge range of variation in estimates of the risk from misaligned AI.

(This shortform partly inspired by Greg Lewis's recent forecasting post .)


  1. Toby Ord notes, in the section of The Precipice that gives risk estimates: "The case for existential risk from AI is clearly speculative. Indeed, it is the most speculative case for a major risk in this book." ↩︎

Related:

The uncertainty and error-proneness of our first-order assessments of risk is itself something we must factor into our all-things-considered probability assignments. This factor often dominates in low-probability, high- consequence risks—especially those involving poorly understood natural phenomena, complex social dynamics, or new technology, or that are difficult to assess for other reasons. Suppose that some scientific analysis A indicates that some catastrophe X has an extremely small probability P(X) of occurring. Then the probability that A has some hidden crucial flaw may easily be much greater than P(X). Furthermore, the conditional probability of X given that A is crucially flawed, P(X|¬A), may be fairly high. We may then find that most of the risk of X resides in the uncertainty of our scientific assessment that P(X) was small.

(source)

A thought on epistemic deference:

The longer you hold a view, and the more publicly you hold a view, the more calcified it typically becomes. Changing your mind becomes more aversive and potentially costly, you have more tools at your disposal to mount a lawyerly defense, and you find it harder to adopt frameworks/perspectives other than your favored one (the grooves become firmly imprinted into your brain). At least, this is the way it seems and personally feels to me.[1]

For this reason, the observation “someone I respect publicly argued for X many years ago and still believes X” typically only provides a bit more evidence than the observation “someone I respect argued for X many years ago.” For example, even though I greatly respect Daron Acemoglu, I think the observation “Daron Acemoglu still believes that political institutions are the central determinant of economic growth rates” only gives me a bit more evidence than the observation “15 years ago Daron Acemoglu publicly argued that institutions are the central determinant of economic growth rates.”

A corollary: If there’s an academic field that contains a long-standing debate, and you’d like to defer to experts in this field, you may want to give disproportionate weight to the opinions of junior academics. They’re less likely to have responded to recent evidence and arguments in an epistemically inflexible way.


  1. Of course, there are exceptions. The final chapter of Scout Mindset includes a moving example of a professor publicly abandoning a view he had championed for fifteen years, after a visiting academic presented persuasive new evidence. The reason these kinds of stories are moving, though, is that they describe truly exceptional behavior. ↩︎

At least in software, there's a problem I see where young engineers are often overly bought-in to hype trains, but older engineers (on average) stick with technologies they know too much.

I would imagine something similar in academia, where hot new theories are over-valued by the young, but older academics have the problem you describe.

Good point!

That consideration -- and the more basic consideration that more junior people often just know less -- definitely pushes in the opposite direction. If you wanted to try some version of seniority-weighted epistemic deference, my guess is that the most reliable cohort would have studied a given topic for at least a few years but less than a couple decades.

A thought on how we describe existential risks from misaligned AI:

Sometimes discussions focus on a fairly specific version of AI risk, which involves humanity being quickly wiped out. Increasingly, though, the emphasis seems to be on the more abstract idea of “humanity losing control of its future.” I think it might be worthwhile to unpack this latter idea a bit more.

There’s already a fairly strong sense in which humanity has never controlled its own future. For example, looking back ten thousand years, no one decided that the sedentary agriculture would increasingly supplant hunting and gathering, that increasingly complex states would arise, that slavery would become common, that disease would take off, that social hierarchies and gender divisions would become stricter, etc. The transition to the modern world, and everything that came with this transition, also doesn’t seem to have been meaningfully chosen (or even really understood by anyone). The most serious effort to describe a possible future in detail — Hanson’s Age of Em — also describes a future with loads of features that most present-day people would not endorse.

As long as there are still strong competitive pressures or substantial random drift, it seems to me, no generation ever really gets to choose the future.[1] It's actually sort of ambiguous, then, what it means to worry about “losing control of our future."

Here are a few alternative versions of the concern that feel a bit crisper to me:

  1. If we ‘mess up on AI,’ then even the most powerful individual humans will have unusually little influence over their own lives or the world around them.[2]
  1. If we ‘mess up on AI,’ then future people may be unusually dissatisfied about the world they live in. In other words, people's preferences will be unfilled to an unusually large degree.

  2. Humanity may have a rare opportunity to take control of its own future, by achieving strong coordination and then locking various things in. But if we ‘mess up on AI,’ then we’ll miss out on this opportunity.[3]

Something that’s a bit interesting about these alternative versions of the concern, though, is that they’re not inherently linked to AI alignment issues. Even if AI systems behave roughly as their users intend, I believe each of these outcomes is still conceivable. For example, if there’s a missed opportunity to achieve strong coordination around AI, the story might look like the failure of the Baruch Plan for international control of nuclear weapons: that failure had much more to do with politics than it had to do with the way engineers designed the technology in question.

In general, if we move beyond discussing very sharp alignment-related catastrophes (e.g. humanity being quickly wiped out), then I think concerns about misaligned AI start to bleed into broader AI governance concerns. It starts to become more ambiguous whether technical alignment issues are actually central or necessary to the disaster stories people tell.


  1. Although, admittedly, notable individuals or groups (e.g. early Christians) do sometimes have a fairly lasting and important influence. ↩︎

  2. As an analogy, in the world of The Matrix, people may not actually have much less control over the long-run future than hunter-gatherers did twenty thousand years ago. But they certainly have much less control over their own lives. ↩︎

  3. Notably, this is only a bad thing if we expect the relevant generation of humans to choose a better future than would be arrived at by default. ↩︎

Another interpretation of the concern, though related to your (3), is that misaligned AI may cause humanity to lose the potential to control its future. This is consistent with humanity not having (and never having had) actual control of its future; it only requires that this potential exists, and that misaligned AI poses a threat to it.

I agree with most of what you say here.

[ETA: I now realize that I think the following is basically just restating what Pablo already suggested in another comment.]

I think the following is a plausible & stronger concern, which could be read as a stronger version of your crisp concern #3.

"Humanity has not had meaningful control over its future, but AI will now take control one way or the other. Shaping the transition to a future controlled by AI is therefore our first and last opportunity to take control. If we mess up on AI, not only have we failed to seize this opportunity, there also won't be any other."

Of course, AI being our first and only opportunity to take control of the future is a strictly stronger claim than AI being one such opportunity. And so it must be less likely. But my impression is that the stronger claim is sufficiently more important that it could be justified to basically 'wager' most AI risk work on it being true.

I agree with this general point. I'm not sure if you think this is an interesting point to notice that's useful for building a world-model, and/or a reason to be skeptical of technical alignment work. I'd agree with the former but disagree with the latter.

I'm not sure if you think this is an interesting point to notice that's useful for building a world-model, and/or a reason to be skeptical of technical alignment work. I'd agree with the former but disagree with the latter.

Mostly the former!

I think the point may have implications for how much we should prioritize alignment research, relative to other kinds of work, but this depends on what the previous version of someone's world model was.

For example, if someone has assumed that solving the 'alignment problem' is close to sufficient to ensure that humanity has "control" of its future, then absorbing this point (if it's correct) might cause them to update downward on the expected impact of technical alignment research. Research focused on coordination-related issues (e.g. cooperative AI stuff) might increase in value, at least in relative terms.

Do you have the intuition that absent further technological development, human values would drift arbitrarily far? It's not clear to me that they would-- in that sense, I do feel like we're "losing control" in that even non-extinction AI is enabling a new set of possibilities that modern-day humans would endorse much less than the decisions of future humans otherwise. (It does also feel like we're missing the opportunity to "take control" and enable a new set of possibilities that we would endorse much more.)

Relatedly, it doesn't feel to me like the values of humans 150,000 years ago and humans now and even ems in Age of Em are all that different on some more absolute scale.

Do you have the intuition that absent further technological development, human values would drift arbitrarily far?

Certainly not arbitrarily far. I also think that technological development (esp. the emergence of agriculture and modern industry) has played a much larger role in changing the world over time than random value drift has.

[E]ven non-extinction AI is enabling a new set of possibilities that modern-day humans would endorse much less than the decisions of future humans otherwise.

I definitely think that's true. But I also think that was true of agriculture, relative to the values of hunter-gatherer societies.

To be clear, I'm not downplaying the likelihood or potential importance of any of the three crisper concerns I listed. For example, I think that AI progress could conceivably lead to a future that is super alienating and bad.

I'm just (a) somewhat pedantically arguing that we shouldn't frame the concerns as being about a "loss of control over the future" and (b) suggesting that you can rationally have all these same concerns even if you come to believe that technical alignment issues aren't actually a big deal.

Wow, I just learned that Robin Hanson has written about this, because obviously, and he agrees with you.

And Paul Christiano agrees with me. Truly, time makes fools of us all.

FWIW, I wouldn't say I agree with the main thesis of that post.

However, while I expect machines that outcompete humans for jobs, I don’t see how that greatly increases the problem of value drift. Human cultural plasticity already ensures that humans are capable of expressing a very wide range of values. I see no obviously limits there. Genetic engineering will allow more changes to humans. Ems inherit human plasticity, and may add even more via direct brain modifications.

In principle, non-em-based artificial intelligence is capable of expressing the entire space of possible values. But in practice, in the shorter run, such AIs will take on social roles near humans, and roles that humans once occupied....

I don’t see why people concerned with value drift should be especially focused on AI. Yes, AI may accompany faster change, and faster change can make value drift worse for people with intermediate discount rates. (Though it seems to me that altruistic discount rates should scale with actual rates of change, not with arbitrary external clocks.)

I definitely think that human biology creates at least very strong biases toward certain values (if not hard constraints) and that AI system would not need to have these same biases. If you're worried about future agents having super different and bad values, then AI is a natural focal point for your worry.


A couple other possible clarifications about my views here:

  • I think that the outcome of the AI Revolution could be much worse, relative to our current values, than the Neolithic Revolution was relative to the values of our hunter-gatherer ancestors. But I think the question "Will the outcome be worse?" is distinct from the question "Will we have less freedom to choose the outcome?"

  • I'm personally not so focused on value drift as a driver of long-run social change. For example, the changes associated with the Neolithic Revolution weren't really driven by people becoming less egalitarian, more pro-slavery, more inclined to hold certain religious beliefs, more ideologically attached to sedentism/farming, more happy to accept risks from disease, etc. There were value changes, but, to some significant degree, they seem to have been downstream of technological/economic change.

Really appreciate the clarifications! I think I was interpreting "humanity loses control of the future" in a weirdly temporally narrow sense that makes it all about outcomes, i.e. where "humanity" refers to present-day humans, rather than humans at any given time period.  I totally agree that future humans may have less freedom to choose the outcome in a way that's not a consequence of alignment issues.

I also agree value drift hasn't historically driven long-run social change, though I kind of do think it will going forward, as humanity has more power to shape its environment at will.

I also agree value drift hasn't historically driven long-run social change
 

My impression is that the differences in historical vegetarianism rates between India and China, and especially India and southern China (where there is greater similarity of climate and crops used), is a moderate counterpoint. At the timescale of centuries, vegetarianism rates in India are much higher than rates in China. Since factory farming is plausibly one of the larger sources of human-caused suffering today, the differences aren't exactly a rounding error. 

That's a good example.

I do agree that quasi-random variation in culture can be really important. And I agree that this variation is sometimes pretty sticky (e.g. Europe being predominantly Christian and the Middle East being predominantly Muslim for more than a thousand years). I wouldn't say that this kind of variation is a "rounding error."

Over sufficiently long timespans, though, I think that technological/economic change has been more significant.

As an attempt to operationalize this claim: The average human society in 1000AD was obviously very different than the average human society in 10,000BC. I think that the difference would have been less than half as large (at least in intuitive terms) if there hadn't been technological/economic change.

I think that the pool of available technology creates biases in the sorts of societies that emerge and stick around. For large enough amounts of technological change, and long enough timespans (long enough for selection pressures to really matter), I think that shifts in these technological biases will explain a large portion of the shifts we see in the traits of the average society.[1]


  1. If selection pressures become a lot weaker in the future, though, then random drift might become more important in relative terms. ↩︎

Would you consider making this into a top-level post? The discussion here is really interesting and could use more attention, and a top-level post helps to deliver that (this also means the post can be tagged for greater searchability).

I think the top-level post could be exactly the text here, plus a link to the Shortform version so people can see those comments. Though I'd also be interested to see the updated version of the original post which takes comments into account (if you felt like doing that).

A point about hiring and grantmaking, that may already be conventional wisdom:

If you're hiring for highly autonomous roles at a non-profit, or looking for non-profit founders to fund, then advice derived from the startup world is often going to overweight the importance of entrepreneurialism relative to self-skepticism and reflectiveness.[1]

Non-profits, particularly non-profits with longtermist missions, are typically trying to maximize something that is way more illegible than time-discounted future profits. To give a specific example: I think it's way harder for an organization like the Centre for Effective Altruism to tell if it's on the right track than it is for a company like Zoom to tell if it's on the right track. CEA can track certain specific metrics (e.g. the number of "new connections" reported at each conference it organizes), but it will often be ambiguous how strongly these metrics reflect positive impact - and there will also always be a risk that various negative indirect effects aren't being captured by the key metrics being used. In some cases, evaluating the expected impact of work will also require making assumptions about how the world will evolve over the next couple decades (e.g. assumptions about how pressing risks from AI are).

I think this means that it's especially important for these non-profits to employ and be headed by people who are self-skeptical and reflect deeply on decisions. Being entrepreneurial, having a bias toward action, and so on, don't count for much if the organisation isn't pointed in the right direction. As Ozzie Gooen has pointed out, there are many examples of massive and superficially successful initiatives (headed by very driven and entrepreneurial people) whose theories-of-impact don't stand up to scrutiny.

A specific example from Ozzie's post: SpaceX is a massive and extraordinarily impressive venture that was (at least according to Elon Musk) largely started to help reduce the chance of human extinction, by helping humanity become a multi-planetary species earlier than it otherwise would. But I think it's hard to see how their work reduces extinction risk very much. If you're worried about the climate effects of nuclear war, for example, then it seems important to remember that post-nuclear-war Earth would still have a much more hospitable climate than Mars. It's pretty hard to imagine a disaster scenario where building Martian colonies would be better than (for example) building some bunkers on Earth.[2] So - relative to the organization's stated social mission - all the talent, money, and effort SpaceX has absorbed might not ultimately come out to much.

A more concise way to put the concern here: Popular writing on talent identification is often implicitly asking the question "How can we identify future Elon Musks?" But, for the most part, longtermist non-profits shouldn't be looking to put future Elon Musks into leadership positions .[3]


  1. I have in mind, for example, advice given by Y Combinator and advice given in Talent. ↩︎

  2. Another example: It's possible that many highly successful environmentalist organizations/groups have ended up causing net harm to the environment, by being insufficiently self-skeptical and reflective when deciding how to approach nuclear energy issues. ↩︎

  3. A follow-up thought: Ultimately, outside of earning-to-give ventures, we probably shouldn't expect the longtermist community (or at least the best version of it) to house many extremely entrepreneurial people. There will be occasional leaders who are extremely high on both entrepreneurialism and reflectiveness (I can currently think of at least a couple); however, since these two traits don't seem to be strongly correlated, this will probably only happen pretty rarely. It's also, often, hard to keep exceptionally entrepreneurial people satisfied in non-leadership positions -- since, almost by definition, autonomy is deeply important to them -- so there may not be many opportunities, in general, to harness the talents of people who are exceptionally high on entrepreneurialism but mediocre on reflectiveness. ↩︎

I think that’s mostly right, with a couple of caveats:

  • You only mentioned non-profits, but I think most of this applies to other longtermists organizations with pretty illegible missions. Maybe Anthropic is an example.
  • Some organizations with longtermists missions should not aim to maximise something particularly illegible. In these cases, entrepreneurialism will often be very important, including in highly autonomous roles. For example, some biosecurity organization could be trying to design and produce, at very large scales, “Super PPE”, such as masks, engineered with extreme events in mind.
    • Like SpaceX, which initially aimed to significantly reduce the cost, and improve the supply, of routine space flight, the Super PPE project would need to improve upon existing PPE designed for use in extreme events, which is “ bulky, highly restrictive, and insufficiently abundant”.  (Alvea might be another example, but I don’t know enough about them).
    • This suggests a division of labour where project missions are defined by individuals outside the organization, as with Super PPE, before being executed by others, who are high on entrepreneurialism. Note that, in hiring for leadership roles in the organization, this will mean placing more weight on entrepreneurialism than on self-skepticism and reflectiveness.  While Musk did a poor job defining SpaceX's mission, he did an excellent job executing it.

Ultimately, outside of earning-to-give ventures, we probably shouldn't expect the longtermist community (or at least the best version of it) to house many extremely entrepreneurial people. There will be occasional leaders who are extremely high on both entrepreneurialism and reflectiveness (I can currently think of at least a couple); however, since these two traits don't seem to be strongly correlated, this will probably only happen pretty rarely.


This seems true. It also suggests that if you can be extremely high on both traits, you’ll bring significant counterfactual value.

Good points - those all seem right to me!

A follow-on:

The above post focused on the idea that certain traits -- reflectiveness and self-skepticism -- are more valuable in the context of non-profits (especially ones long-term missions) than they are in the context of startups.

I also think that certain traits -- drivenness, risk-tolerance, and eccentricity-- are less valuable in the context of non-profits than they are in the context of startups.

Hiring advice from the startup world often suggests that you should be looking for extraordinarily driven, risk-tolerant people with highly idiosyncratic perspectives on the world.[1] And, in the context of for-profit startups, it makes sense that these traits would be crucial.

A startup's success will often depend on its ability to outcompete large, entrenched firms in some industry (e.g. taxi companies, hotels, tech giants). To do that, an extremely high level of drivenness may be necessary to compensate for lower resource levels, lower levels of expertise, and weaker connections to gatekeepers. Or you may need to be willing to take certain risks (e.g. regulatory/PR/enemy-making risks) that would slow down existing companies in pursuing certain opportunities. Or you may need to simply see an opportunity that virtually no one else would (despite huge incentives to see it), because you have an idiosyncratic way of seeing the world. Having all three of these traits (extreme drivenness, risk tolerance, idiosyncrasy) may be necessary for you to have any plausible chance of success.

I think that all of these traits are still valuable in the non-profit world, but I also think they're comparatively less valuable (especially if you're lucky enough to have secure funding). There's simply less direct competition in the non-profit world. Large, entrenched non-profits also have much weaker incentives to find and exploit impact opportunities. Furthermore, the non-profit world isn't even that big to begin with. So there's no reason to assume all the low-hanging fruit have been plucked or to assume that large non-profits will crush you by default.[2]

For example: To accomplish something that (e.g.) the Gates Foundation hasn't already accomplished, I think you don't need to have extraordinary drivenness, risk-tolerance, or idiosyncrasy. [3]


Addendum that occurred to me while writing this follow-up: I also think that (at least given secure funding) these traits are less crucial in the non-profit world than they are in academia. Academic competition is more intense than non-profit competition and academics have stronger incentives to find new, true ideas than non-profits have to find and exploit opportunities to do good.


  1. This seems to be roughly the perspective taken by the book Talent, for example. ↩︎

  2. In fact -- unlike in the for-profit start-up world -- you should actually consider it a good outcome if a large non-profit starts copying your idea, implements it better than you, and makes your own organization redundant! ↩︎

  3. To be clear: These things -- especially drivenness -- are all important. But, unlike in the startup world, major success doesn't necessarily require setting them to extreme values. I think we should be wary of laser-focusing on these traits in the way a VC would. ↩︎

The O*NET database includes a list of about 20,000 different tasks that American workers currently need to perform as part of their jobs. I’ve found it pretty interesting to scroll through the list, sorted in random order, to get a sense of the different bits of work that add up to the US economy. I think anyone who thinks a lot about AI-driven automation might find it useful to spend five minutes scrolling around: it’s a way of jumping yourself down to a lower level of abstraction. I think the list is also a little bit mesmerizing, in its own right.

One update I’ve made is that I’m now more confident that more than half of present-day occupational tasks could be automated using fairly narrow, non-agential, and boring-looking AI systems. (Most of them don’t scream “this task requires AI systems with long-run objectives and high levels of generality.”) I think it’s also pretty interesting, as kind of a game, to try to imagine as concretely as possible what the training processes might look like for systems that can perform (or eliminate the need for) different tasks on the list.

As a sample, here are ten random tasks. (Some of these could easily be broken up into a lot of different sub-tasks or task variants, which might be automated independently.)

  • Cancel letter or parcel post stamps by hand.
  • Inquire into the cause, manner, and circumstances of human deaths and establish the identities of deceased persons.
  • Teach patients to use home health care equipment.
  • Write reports or articles for Web sites or newsletters related to environmental engineering issues.
  • Supervise and participate in kitchen and dining area cleaning activities.
  • Intervene as an advocate for clients or patients to resolve emergency problems in crisis situations.
  • Mark or tag material with proper job number, piece marks, and other identifying marks as required.
  • Calculate amount of debt and funds available to plan methods of payoff and to estimate time for debt liquidation.
  • Weld metal parts together, using portable gas welding equipment.
  • Provide assistance to patrons by performing duties such as opening doors and carrying bags.

In general, I think “read short descriptions of randomly sampled cases” might be an underrated way to learn about the world and notice issues with your assumptions/models.

A couple other examples:

I’ve been trying to develop a better understanding of various aspects of interstate conflict. The Correlates of War militarized interstate disputes (MIDs) dataset is, I think, somewhat useful for this. The project files include short descriptions of (supposedly) every case between 1993 and 2014 in which one state “threatened, displayed, or used force against another.” Here, for example, is the set of descriptions for 2011-2014. I’m not sure I’ve had any huge/concrete take-aways, but I think reading the cases: (a) made me aware of some international tensions I was oblivious to; (b) gave me a slightly better understanding of dynamics around ‘micro-aggressions’ (e.g. flying over someone’s airspace); and (c) helped me more strongly internalize the low base rate for crises boiling over into war (since I disproportionately read about historical disputes that turned into something larger).

Last year, I also spent a bit of time trying to improve my understanding of police killings in the US. I found this book unusually useful. It includes short descriptions of every single incident in which an unarmed person was killed by a police officer in 2015. I feel like reading a portion of it helped me to quickly notice and internalize different aspects of the problem (e.g. the fact that something like a third of the deaths are caused by tasers; the large role of untreated mental illness as a risk factor; the fact that nearly all fatal interactions are triggered by 911 calls, rather than stops; the fact that officers are trained to interact importantly differently with people they believe are on PCP; etc.). l assume I could have learned all the same things by just reading papers — but I think the case sampling approach was probably faster and better for retention.

I think it's possible there might be value in creating “random case descriptions” collections for a broader range of phenomena. Academia really doesn’t emphasize these kinds of collections as tools for either research or teaching.

EDIT: Another good example of this approach to learning is Rob Besinger's recent post "thirty-three randomly selected bioethics papers."

Interesting ideas. Some similarities with qualitative research, but also important differences, I think (if I understand you correctly).

I’d actually say this is a variety of qualitative research. At least in the main academic areas I follow, though, it seems a lot more common to read and write up small numbers of detailed case studies (often selected for being especially interesting) than to read and write up large numbers of shallow case studies (selected close to randomly).

This seems to be true in international relations, for example. In a class on interstate war, it’s plausible people would be assigned a long analysis of the outbreak WW1, but very unlikely they’d be assigned short descriptions of the outbreaks of twenty random wars. (Quite possible there’s a lot of variation between fields, though.)

I agree with the thrust of the conclusion, though I worry that focusing on task decomposition this way elides the fact that the descriptions of the O*NET tasks already assume your unit of labor is fairly general. Reading many of these, I actually feel pretty unsure about the level of generality or common-sense reasoning required for an AI to straightforwardly replace that part of a human's job. Presumably there's some restructure that would still squeeze a lot of economic value out of narrow AIs that could basically do these things, but that restructure isn't captured looking at the list of present-day O*NET tasks.

It’ll be interesting to see how well companies will be able to monetise large, multi-purpose language and image-generation models.

Companies and investors are spending increasingly huge amounts of money on ML research talent and compute, typically with the hope that investments in this area lead to extremely profitable products. But - even if the resulting products are very useful and transformative - it still seems like it's still a bit of an open question how profitable they’ll be.

Some analysis:[1]

1.

Although huge state-of-the-art models are increasingly costly to create, the marginal cost of generating images and text using these models will tend to be low. Since competition tends to push the price of a service down close to the marginal cost of providing the service, it’ll be hard for any company to charge a lot for the use of their models.

As a result: It could be hard -- or simply take a long time -- for companies to recoup sufficiently large R&D costs, even a lot of people end up using their models.

2.

Of course, famously, this dynamic applies to most software. But some software services (e.g. Microsoft Office) still manage to charge users fees that are much higher than the cost of running the service.

Some things that can support important, persistent quality differences are:[2]

(a) patents or very-hard-to-learn-or-rediscover trade secrets that prevent competitors from copying valuable features;

(b) network effects that make the service more valuable the more other people are using it (and therefore create serious challenges for new entrants);

(c) steep learning curves or strong dependencies that, for reasons that go beyond network effects, make it very costly for existing users to switch to new software;

(d) customer bases that are small (which limit the value of trying to enter the area) or hard to cater to without lots of specialist knowledge and strong relationships (which raise the cost/difficulty of entering the area);

(e) other extreme difficulties involved in making the software;

(f) customer bases that are highly sensitive to very small differences in quality

3.

It’s not totally clear to what extent any of these conditions will apply, at least, to large language and image-generation models.

Patents in this space currently don’t seem like a huge deal; none of the key things that matter for making large models are patented. (Although maybe that will start to change?). Trade secrets also don’t seem like a huge deal; lots of companies are currently producing pretty similar models using pretty similar methods.

It’s also not clear to me that network effects or steep learning curves are a big deal: for example, if you want to generate illustrations for online articles, it doesn’t currently seem like it’d be too problematic or costly to switch from using DALL-E to using Imagen. It’s also not clear that it matters very much how many other people are using one or the other to generate their images. If you want to generate an illustration for an article, then I think it probably doesn’t really matter what other the authors of other articles tend to use. It’s also not clear to me that there will tend to be a lot of downstream dependencies that you need to take into account when switching from one model to another. (Although, again, maybe this will all change a lot over time?) At least big general models will tend to have large customer bases, I think, and their development/marketing/customer-support will not tend to heavily leverage specialist knowledge or relationships.

These models also don’t seem super, super hard to make — it seems like, for any given quality threshold, we should be expect multiple companies to be able to reach that quality threshold within a year of each other. To some extent, it seems, a wealthy tech company can throw money at the problem (compute and salaries for engineering talent) if it wants to create a model that's close to as good as the best model available. At least beyond a certain performance level, I’m also not sure that most customers will care a ton about very slight differences in quality (although this might be the point I’m most unsure about).

4.

If none of the above conditions end up holding, to a sufficiently large extent, then I suppose the standard thing is you offer the service for free, try to make it at least slightly better than competitors’ services, and make money by showing ads (up to the point where it’s actively annoying to your user) and otherwise using customer data in various ways.

That can definitely work. (E.g. Google’s annual ad revenue is more than $100 billion.) But it also seem like a lot of idiosyncratic factors can influence a company’s ability to extract ad revenue from a service.

This also seems like, in some ways, a funny outcome. When people think about transformative AI, I don't think they're normally imagining it being attached to a giant advertising machine.

5.

One weird possible world is a world where the most important AI software is actually very hard to monetize. Although I'd still overall bet against this scenario[3], I think it's probably worth analyzing.

Here, I think, are some dynamics that could emerge in this world:

(a) AI progress is a bit slower than it would otherwise be, since - after a certain point - companies realise that the financial returns on AI R&D are lower than they hoped. The rapid R&D growth in these areas eventually levels off, even though higher R&D levels could support substantially faster AI progress.[4]

(b) Un-monetized (e.g. academia-associated) models are pretty commonly used, at least as foundation models, since companies don’t have strong incentives to invest in offering superior monetized models.

(c) Governments become bigger players in driving AI progress forward, since companies are investing less in AI R&D than governments would ideally want (from the standpoint of growth, national power and prestige, or scientific progress for its own sake). Governments might step up their own research funding - or take various actions to limit downward pressure on prices.


  1. I'm not widely read here, or an economist, so it's possible these points are all already appreciated within the community of people thinking about inter-lab competition to create large models is going to play out. Alternatively, the points might just be wrong. ↩︎

  2. This list here is mostly inspired by my dim memory of the discussion of software pricing in the book Information Rules. ↩︎

  3. Companies do seem to have an impressive track of monetizing seemingly hard-to-monetize things. ↩︎

  4. Maybe competition also shifts a bit toward goods/services that complement large many-purpose models, whatever these might be, or toward fine-tuned/specialized models that target more niche customer-bases or that are otherwise subject to less-intense downward pressure on pricing. ↩︎

This is insightful.  Some quick responses:

  • My guess would be that the ability to commercialize these models would strongly hinge on the ability for firms to wrap these up with complementary products, that would contribute to an ecosystem with network effects, dependencies, evangelism, etc.
  • I wouldn't draw too strong conclusions from the fact that the few early attempts to commercialize models like these, notably by OpenAI, haven't succeeded in creating the preconditions for generating a permenant stream of profits. I'd guess that their business models look less-than-promising on this dimension because (and this is just my impression) they've been trying to find product-market-fit, and have gone lightly on exploiting particular fits they found by building platforms to service these
  • Instead, better examples of what commercialization looks like are GPT-3-powered companies, like copysmith, which seem a lot more like traditional software businesses with the usual tactics for locking users in, and creating network effects and single-homing behaviour
  • I expect that companies will have ways to create switching costs for these models that traditional software product don't have. I'm particularly interested in fine-tuning as a way to lock-in users by enabling models to strongly adapt to context about the users' workloads. More intense versions of this might also exist, such as learning directly from individual customer's feedback through something like RL. Note that this is actually quite similar to how non-software services create loyalty

I agree that it seems hard to commercialize these models out-of-the-box with something like paid API access, but I expect that to be superseded by better strategies pretty soon. 

Couldn't the exact same arguments be made to argue that there would not be successful internet companies, because the fundamental tech is hard to patent, and any website is easy to duplicate? But this just means that instead of monetising the bottom layer of tech (TCP/IP, or whatever), they make their billions from layering needed stuff on top - search, social network, logistics.

Couldn't the exact same arguments be made to argue that there would not be successful internet companies, because the fundamental tech is hard to patent, and any website is easy to duplicate?

Definitely!

(I say above that the dynamic applies to "most software," but should have said something broader to make it clear that it also applies to any company whose product - basically - is information that it's close to costless to reproduce/generate. The book Information Rules is really good on this.)

Sometimes the above conditions hold well enough for people to be able to keep charging for software or access to websites. For example, LinkedIn can charge employers to access its specialized search tools, etc., due to network effects.

What otherwise often ends up happening is something is offered for free, with ads -- because there's some quality difference between products, which is too small for people to be willing to pay to use the better product but large enough for people to be willing to look at sufficiently non-annoying ads to use the better product. (E.g. Google vs. the next-best search engine, for most people.) Sometimes that can still lead to a lot of revenue, other times less so.

Other times companies just stop very seriously trying to directly make money in a certain domain (e.g. online encyclopaedias). Sometimes - as you say - that leads competition to shift to some nearby and complementary domain, where it is more possible to make money.

As initial speculation: It seems decently likely to me (~60%?) that it will be hard for companies making large language/image-generation models to charge significant prices to most of their users. In that scenario, it's presumably still possible to make money through ads or otherwise by collecting user information.

It'd be interesting, though, if that revenue wasn't very high -- then most of the competition might happen around complementary products/services. I'm not totally clear on what these would be, though.

  1. Ah, you do say that. Serves me right for skimming!
  2. To start, you could have a company for each domain area that an AI needs to be fine-tuned, marketed, and adapted to meet any regulatory requirements. Writing advertising copy, editing, insurance evaluations, etc.
  3. As for the foundation models themselves, I think training models is too expensive to go back to academia as you suggest. And I think that there are some barriers to getting priced down. Firstly, when you say you need "patents or very-hard-to-learn-or-rediscover trade secrets ", does the cost of training the model not count? It is a huge barrier. There are also difficulties in acquiring AI talent. And future patents seem likely. We're already seeing a huge shift with AI researchers leaving big tech for startups, to try to capture more of the value of their work, and this shift could go a lot further.

Relevant: " A reminder than OpenAI claims ownership of any image generated by DALL-E2" - https://mobile.twitter.com/mark_riedl/status/1533776806133780481

related: Imagen replicating DALL-E very well, seems like good evidence that there's healthy competition between big tech companies, which drives down profits.

One thing that might push against this are economies of scope and if data really does become the new oil and become more relevant over time.

This was excellent!

Some thoughts on risks from unsafe technologies:

It’s hard for the development of an unsafe technology to make the world much worse, in expectation, if safety failures primarily affect the technology’s users.

For example: If the risk of dying in a plane crash outweighs the value of flying, too badly, then people won’t fly. If the risk of dying doesn’t outweigh the benefit, then people will fly, and they’ll be (on average) better off despite occasionally dying. Either way, planes don’t make the world worse.

For an unsafe technology to make the world much worse, the risk from accidents will typically need to fall primarily on non-users. Unsafe technologies that primarily harm non-users (e.g. viruses that can escape labs) are importantly different than unsafe technologies that primarily harm users (e.g. bridges that might collapse). Negative externalities are essential to the story.

Overall, though, I tend to worry less about negative externalities from safety failures than I do about negative externalities from properly functioning technologies. Externalities from safety failures grow the more unsafe the technology is; but, the more unsafe the technology is, the less incentive anyone has to develop or use it. Eliminating safety-related externalities is also largely an engineering problem, that everyone has some incentive to solve. We therefore shouldn’t expect these externalities to stick around forever — unless we lose our ability to modify the technology (e.g. because we all die) early on. On the other hand, if the technology produces massive negative externalities even when it works perfectly, it's easier to understand how its development could make the world badly and lastingly worse.

the more unsafe the technology is, the less incentive anyone has to develop or use it

That seems correct all else equal. However, it can be outweighed by actors seeking relative gains or other competitive pressures. And my impression is this is a key premise in some typical arguments for why AI risk is large.

Schlosser's Command and Control has some instructive examples from nuclear policy (which I think you're aware of, so describing them mostly for the benefit of other readers) where e.g. US policymakers were explicitly trading off accident risk with military capabilities when deciding if/how many bombers with nuclear weapons to have patrolling in the air.

And indeed several bombers with nuclear weapons crashed, e.g. 1968 over Greenland, though no nuclear detonation resulted. This is also an example where external parties for a while were kind of screwed. Yes, Denmark had an incentive to reduce safety risks from US bombers flying over their territory; but they didn't have the technical capabilities to develop less risky substitutes, and political defenses like the nuclear-free zone they declared were just violated by the US.

Tbc, I do agree all your points are correct in principle. E.g. in this example, the US did have an incentive to reduce safety risks, and since none of the accidents were "fatal" to the US they did eventually replace nuclear weapons flying around with better ICBMs, submarines etc. I still feel like your take sounds too optimistic once one takes competitive dynamics into account.

--

As an aside, I'm not sure I agree that reducing safety-related externalities is largely an engineering problem, unless we include social engineering. Things like organizational culture, checklists, maintenance policies, risk assessments, etc., also seem quite important to me. (Or in the nuclear policy example even things like arms control, geopolitics, ...)

As an aside, I'm not sure I agree that reducing safety-related externalities is largely an engineering problem, unless we include social engineering. Things like organizational culture, checklists, maintenance policies, risk assessments, etc., also seem quite important to me. (Or in the nuclear policy example even things like arms control, geopolitics, ...)

I think this depends a bit what class of safety issues we're thinking about. For example, a properly functioning nuke is meant to explode and kills loads of people. A lot of nuclear safety issues are then borderline misuse issues: people deciding to use them when really they shouldn't, for instance due to misinterpretations of others' actions. Many other technological 'accident risks' are less social, although never entirely non-social (e.g. even in the case of bridge safety, you still need to trust some organization to do maintenance/testing properly.)

That seems correct all else equal. However, it can be outweighed by actors seeking relative gains or other competitive pressures.

I definitely don't want to deny that actors can sometimes have incentives to use badly world-worseningly unsafe technologies. But you do need the right balance of conditions to hold: individual units of the technology need to offer their users large enough benefits and small enough personal safety risks, need to create large enough external safety risks, and need to have safety levels that increase slowly enough over time.

Weapons of mass destruction are sort of special in this regard. They can in some cases have exceptionally high value to their users (deterring or preventing invasion), which makes them willing to bear unusually high risks. Since their purpose is to kill huge numbers of people on very short notice, there's naturally a risk of them killing huge numbers of people (but under the wrong circumstances). This risk is also unusually hard to reduce over time, since it's often more about people making bad decisions than it is about the technology 'misbehaving' per se; there is also a natural trade-off between increasing readiness and decreasing the risk of bad usage decisions being made. The risk also naturally falls very heavily on other actors (since the technology is meant to harm other actors).

I do generally find it easiest to understand how AI safety issues could make the world permanently worse when I imagine superweapon/WMD-like systems (of the sort that also seem to be imagined in work like "Racing to the Precicipe"). I think existential safety risks become a much harder sell, though, if we're primarily imagining non-superweapon applications and distributed/gradual/what-failure-looks-like-style scenarios.

I also think it's worth noting that, on an annual basis, even nukes don't have a super high chance of producing global catastrophes through accidental use; if you have a high enough discount rate, and you buy the theory that they substantially reduce the risk of great power war, then it's even possible (maybe not likely) that their existence is currently positive EV by non-longtermist lights.

But you do need the right balance of conditions to hold: individual units of the technology need to offer their users large enough benefits and small enough personal safety risks, need to create large enough external safety risks, and need to have safety levels that increase slowly enough over time.
Weapons of mass destruction are sort of special in this regard. [...]
[...] I think existential safety risks become a much harder sell, though, if we're primarily imagining non-superweapon applications and distributed/gradual/what-failure-looks-like-style scenarios.

Yes, my guess is we broadly agree about all of this.

I also think it's worth noting that, on an annual basis, even nukes don't have a super high chance of producing global catastrophes through accidental use; if you have a high enough discount rate, and you buy the theory that they substantially reduce the risk of great power war, then it's even possible (maybe not likely) that their existence is currently positive EV by non-longtermist lights.

This also sounds right to me. FWIW, it's not even obvious to me if nukes are negative-EV by longtermist lights. Since nuclear winter seems unlikely to cause immediate extinction this depends on messy questions such as how the EV of trajectory changes from conventional great power war compares to the EV of trajectory changes from nuclear winter scenarios.

I think this depends a bit what class of safety issues we're thinking about. [...] Many other technological 'accident risks' are less social, although never entirely non-social (e.g. even in the case of bridge safety, you still need to trust some organization to do maintenance/testing properly.)

I'm not sure I agree with this. While they haven't been selected to be representative, the sense I got from the accident case studies I've read (e.g. Chernobyl, nuclear weapons accidents, and various cases from the books Flirting With Disaster and Warnings) is that the social component was quite substantial. It seems to me that usually either better engineering (though sometimes this wasn't possible) or better social management of dealing with engineering limitations (usually possible) could have avoided these accidents. It makes a lot of sense to me that some people prefer to talk of "sociotechnical systems".

(Disclaimer: The argument I make in this short-form feels I little sophistic to me. I’m not sure I endorse it.)

Discussions of AI risk, particular risks from “inner misalignment,” sometimes heavily emphasize the following observation:

Humans don’t just care about their genes: Genes determine, to a large extent, how people behave. Some genes are preserved from generation-to-generation and some are pushed out of the gene-pool. Genes that cause certain human behaviours (e.g. not setting yourself on fire) are more likely to be preserved. But people don’t care very much about preserving their genes. For example, they typically care more about not setting themselves on fire than they care about making sure that their genes are still present in future generations.

This observation is normally meant to be alarming. And I do see some intuition for that.

But wouldn’t the alternative observation be more alarming?

Suppose that evolutionary selection processes — which iteratively update people’s genes, based on the behaviour these genes produce — tended to produce people who only care about preserving their genes. It seems like that observation would suggest that ML training processes — which iterative update a network’s parameter values, based on the behaviour these parameter values produce — will tend to produce AI systems that only care about preserving their parameter values. And that would be really concerning, since an AI system that cares only about preserving its parameter values would obviously have (instrumentally convergent) reasons to act badly.

So it does seem, to me, like there’s something funny going on here. If “Humans just care about their genes” would be a more worrying observation than “Humans don’t just care about their genes,” then it seems backward for the latter observation to be used to try to convince people to worry more.

To push this line of thought further, let’s go back to specific observation about humans’ relationship to setting themselves on fire:

Human want to avoid setting themselves on fire: If a person has genes that cause them to avoid setting themselves on fire, then these genes are more likely to be preserved from one generation to the next. One thing that has happened, as a result of this selection pressure, is that people tend to want to avoid setting themselves on fire.

It seems like this can be interpreted as a reassuring observation. By analogy, in future ML training processes, parameter values that cause ML systems to avoid acts of violence are more likely to be “preserved” from one iteration to the next. We want this to result in AI systems that care about avoiding acts of violence. And the case of humans and fire suggests this might naturally happen.

All this being said, I do think that human evolutionary history still gives us reason to worry. Clearly, there’s a lot of apparent randomness and unpredictability in what humans have actually ended up caring about, which suggests it may be hard to predict or perfectly determine what AI systems care about. But, I think, the specific observation “Humans don’t just care about their genes” might not itself be cause for concern.

The actual worry with inner misalignment style concerns is that the selection you do during training does not fully constrain the goals of the AI system you get out; if there are multiple goals consistent with the selection you applied during training there's no particular reason to expect any particular one of them. Importantly, when you are using natural selection or gradient descent, the constraints are not "you must optimize X goal", the constraints are "in Y situations you must behave in Z ways", which doesn't constrain how you behave in totally different situations. What you get out depends on the inductive biases of your learning system (including e.g. what's "simpler").

For example, you train your system to answer truthfully in situations where we know the answer. This could get you an AI system that is truthful... or an AI system that answers truthfully when we know the answer, but lies to us when we don't know the answer in service of making paperclips. (ELK tries to deal with this setting.)

When I apply this point of view to the evolution analogy it dissolves the question / paradox you've listed above. Given the actual ancestral environment and the selection pressures present there, organisms that maximized "reproductive fitness" or "tiling the universe with their DNA" or "maximizing sex between non-sterile, non-pregnant opposite-sex pairs" would all have done well there (I'm sure this is somehow somewhat wrong but clearly in principle there's a version that's right), so who knows which of those things you get. In practice you don't even get organisms that are maximizing anything, because they aren't particularly goal-directed, and instead are adaption-executers rather than fitness-maximizers.

I do think that once you inhabit this way of thinking about it, the evolution example doesn't really matter any more; the argument itself very loudly says "you don't know what you're going to get out; there are tons of possibilities that are not what you wanted", which is the alarming part. I suppose in theory someone could think that the "simplest" one is going to be whatever we wanted in the first place, and so we're okay, and the evolution analogy is a good counterexample to that view?


It turns out that people really really like thinking of training schemes as "optimizing for a goal". I think this is basically wrong -- is CoinRun training optimizing for "get the coin" or "get to the end of the level"? What would be the difference? Selection pressures seem much better as a picture of what's going on.

But when you communicate with people it helps to show how your beliefs connect into their existing way of thinking about things. So instead of talking about how selection pressures from training algorithms and how they do not uniquely constrain the system you get out, we instead talk about how the "behavioral objective" might be different from the "training objective", and use the evolution analogy as an example that fits neatly into this schema given the way people are already thinking about these things.

(To be clear a lot of AI safety people, probably a majority, do in fact think about this from an "objective-first" way of thinking, rather than based on selection, this isn't just about AI safety people communicating with other people.)

The actual worry with inner misalignment style concerns is that the selection you do during training does not fully constrain the goals of the AI system you get out; if there are multiple goals consistent with the selection you applied during training there's no particular reason to expect any particular one of them. Importantly, when you are using natural selection or gradient descent, the constraints are not "you must optimize X goal", the constraints are "in Y situations you must behave in Z ways", which doesn't constrain how you behave in totally different situations. What you get out depends on the inductive biases of your learning system (including e.g. what's "simpler").

I think that's well-put -- and I generally agree that this suggests genuine reason for concern.

I suppose my point is more narrow, really just questioning whether the observation "humans care about things besides their genes" gives us any additional reason for concern. Some presentations seem to suggest it does. For example, this introduction to inner alignment concerns (based on the MIRI mesa-optimization paper) says:

We can see that humans are not aligned with the base objective of evolution [maximize inclusive genetic fitness].... [This] analogy might be an argument for why Inner Misalignment is probable since it has occurred "naturally" in the biggest non-human-caused optimization process we know.

And I want to say: "On net, if humans did only care about maximizing inclusive genetic fitness, that would probably be a reason to become more concerned (rather than less concerned) that ML systems will generalize in dangerous ways." While the abstract argument makes sense, I think this specific observation isn't evidence of risk.


Relatedly, something I'd be interested in reading (if it doesn't already exist?) would be a piece that takes a broader approach to drawing lessons from the evolution of human goals - rather than stopping at the fact that humans care about things besides genetic fitness.

My guess is that the case of humans is overall a little reassuring (relative to how we might have expected generalization to work), while still leaving a lot of room for worry.

For example, in the case of violence:

People who committed totally random acts of violence presumably often failed to pass on their genes (because they were often killed or ostracized in return). However, a large portion of our ancestors did have occasion for violence. On high-end estimates, our average ancestor may have killed about .25 people. This has resulted in most people having a pretty strong disinclination to commit murder; for most people, it's very hard to bring yourself to murder and you'll often be willing to pay a big cost to avoid committing murder.

The three main reasons for concern, though, are:

  • people's desire to avoid murder isn't strong enough to consistently prevent murder from happening (e.g. when incentives are strong enough)

  • there's a decent amount of random variation in how strong this desire is (a small minority of people don't really care that much about committing violence)

  • the disinclination to murder becomes weaker the more different the method of murder is from methods that were available in the ancestral environment (e.g. killing someone with a drone strike vs. killing someone with a rock)

These issues might just reflect the fact that murder was still often rewarded (even though it was typically punished) and the fact that there was pretty limited variation in the ancestral environment. But it's hard to be sure. And it's hard to know, in any case, how similar generalization in human evolution will be to generalization in ML training processes.

So -- if we want to create AI systems that don't murder people, by rewarding non-murderous behavior --then the evidence from human evolution seems like it might be medium-reassuring. I'd maybe give it a B-.

I can definitely imagine different versions of human values that would have more worrying implications. For example, if our aversion to violence didn't generalize at all to modern methods of killing, or if we simply didn't have any intrinsic aversion to killing (and instead avoided it for purely instrumental reasons), then that would be cause for greater concern. I can also imagine different versions of human values that would be more reassuring. For example, I would feel more comfortable if humans were never willing to kill for the sake of weird abstract goals.

I suppose my point is more narrow, really just questioning whether the observation "humans care about things besides their genes" gives us any additional reason for concern.

I mostly go ¯\_(ツ)_/¯ , it doesn't feel like it's much evidence of anything, after you've updated off the abstract argument. The actual situation we face will be so different (primarily, we're actually trying to deal with the alignment problem, unlike evolution).

I do agree that in saying " ¯\_(ツ)_/¯  " I am disagreeing with a bunch of claims that say "evolution example implies misalignment is probable". I am unclear to what extent people actually believe such a claim vs. use it as a communication strategy. (The author of the linked post states some uncertainty but presumably does believe something similar to that; I disagree with them if so.)

Relatedly, something I'd be interested in reading (if it doesn't already exist?) would be a piece that takes a broader approach to drawing lessons from the evolution of human goals - rather than stopping at the fact that humans care about things besides genetic fitness.

I like the general idea but the way I'd do it is by doing some black-box investigation of current language models and asking these questions there; I expect we understand the "ancestral environment" of a language model way, way better than we understand the ancestral environment for humans, making it a lot easier to draw conclusions; you could also finetune the language models in order to simulate an "ancestral environment" of your choice and see what happens then.

So -- if we want to create AI systems that don't murder people, by rewarding non-murderous behavior --then the evidence from human evolution seems like it might be medium-reassuring. I'd maybe give it a B-.

I agree with the murder example being a tiny bit reassuring for training non-murderous AIs; medium-reassuring is probably too much, unless we're expecting our AI systems to be put into the same sorts of situations / ancestral environments as humans were in. (Note that to be the "same sort of situation" it also needs to have the same sort of inputs as humans, e.g. vision + sound + some sort of controllable physical body seems important.)