Jim Buhler

Research fellow @ Existential Risk Alliance
245 karma · Joined Sep 2020 · London, UK



I'm researching how to predict what values might control the future (if any), ways to estimate the expected value/sign of human expansion, and cooperation and conflict in non-causal contexts (particularly between AI systems).

Interests of mine include GPR, AGI governance, malevolent actors, as well as CLR's research agenda on cooperative AI and S-risks.

I used to be EA France's community director and am still doing some event management there.

I've also recently completed a Master's degree in ethics.

I've also written some stuff on LessWrong.

You can give me anonymous feedback here. :)


What values will control the Future?


Topic Contributions

Right, so assuming no early value lock-in and the values of the AGI being (at least somewhat) controlled/influenced by its creators, I imagine these creators to have values that are more or less grabby, and these values are competing against one another in the big tournament that is cultural evolution.

For simplicity, say there are only two types of creators: the pure grabbers (who value grabbing (quasi-)intrinsically) and the safe grabbers (who are in favor of grabbing only if it is done in a "safe" way, whatever that means).

Since we're assuming there hasn't been any early value lock-in, the AGI isn't committed to some form of compromise between the values of the pure and safe grabbers. Therefore, you can imagine that the AGI allows for competition and helps both groups accomplish what they want proportionally to their size, or something like that. From there, I see two plausible scenarios:
A) The pure and safe grabbers are two cleanly separated groups running a space expansion race against one another, and we should -- all else equal -- expect the pure grabbers to win, for the same reasons why we should -- all else equal -- expect the AGI race to be won by the labs optimizing for AI capabilities rather than for AI safety.
B) The safe grabbers "infiltrate" the pure grabbers in an attempt to make their space-expansion efforts "safer", but are progressively selected against since they drag the pure-grabby project down. The few safe grabbers who might manage not to value drift and not to get kicked out of the pure grabbers are those who are complacent and not pushing really hard for more safety.

The reason why intra-civ selection for grabby values is currently fairly weak on Earth, as you point out, is that humans haven't even started colonizing space, which makes something like A or B very unlikely to have happened yet. Arguably, the process that may eventually lead to something like A or B hasn't even begun for real. We're unlikely to notice a selection for grabby values before people actually start running something like a space expansion race. And most of those we might expect to want to somehow get involved in the potential[1] space expansion race are currently focused on the race to AGI, which makes sense. It seems like this latter race is more relevant/pressing right now.

  1. ^

    It seems like this race will happen (or actually be worth running) if, and only if, AGI has non-locked-in values and is corrigible(-ish) and aligned(-ish) with its creators, as we suggested.

Thanks a lot for this comment! I linked to it in a footnote. I really like this breakdown of different types of relevant evolutionary dynamics. :)

Thanks for the comment! :) You're assuming that the AGI's values will be pretty much locked-in forever once it is deployed such that the evolution of values will stop, right? Assuming this, I agree. But I can also imagine worlds where the AGI is made very corrigible (such that the overseers stay in control of the AGI's values) and where intra-civ value evolution continues/accelerates. I'd be curious if you see reasons to think these worlds are unlikely.

If you had to remake this 3D sim of the expansion of grabby aliens based on your beliefs, what would look different, exactly? (Sorry, I know you already answer this indirectly throughout the post, at least partially.)

Do you have any reading to suggest on that topic? I'd be curious to understand that position more :)

Insightful! Thanks for taking the time to write these.

failing to act in perfect accord with the moral truth does not mean you're not influenced by it at all. Humans fail your conditions 4-7 and yet are occasionally influenced by moral facts in ways that matter.

Agreed and I didn't mean to argue against that so thanks for clarifying! Note however that the more you expect the moral truth to be fragile/complex, the further from it you should expect agents' actions to be.

you expect intense selection within civilizations, such that their members behave so as to maximize their own reproductive success.

Hum... I don't think the "such that..." part logically follows. I don't think this is how selection effects work. All I'm saying is that those who are the most bullish on space colonization will colonize more space.

I'm not sure what to say regarding your last two points. I think I need to think/read more, here. Thanks :)

Very interesting, Wei! Thanks a lot for the comment and the links. 

TL;DR of my response: Your argument assumes that the first two conditions I list are met by default, which is, I think, a strong assumption (Part 1). Assuming that is the case, however, your point suggests there might be a selection effect favoring agents that act in accordance with the moral truth, which might be stronger than the selection effect I depict for values that are more expansion-conducive than the moral truth. This is something I haven't seriously considered and this made me update! Nonetheless, for your argument to be valid and strong, the orthogonality thesis has to be almost completely false, and I think we need more solid evidence to challenge that thesis (Part 2).

Part 1: Strong assumption

This came about because there are facts about what preferences one should have, just like there exist facts about what decision theory one should use or what prior one should have, and species that manage to build intergalactic civilizations (or the equivalent in other universes) tend to discover all of these facts.

My understanding is that this scenario says the seven conditions I listed are met because it is actually trivial for a super-capable intergalactic civilization to meet those (or even required for it to become intergalactic in the first place, as you suggest later).

I think this is plausible for the following conditions:

  • #3 They find something they recognize as a moral truth.
  • #4 They (unconditionally) accept it, even if it is highly counterintuitive.
  • #5 The thing they found is actually the moral truth. No normative mistake.
  • #6 They succeed at acting in accordance with it. No practical mistake.
  • #7 They stick to this forever. No value drift.

You might indeed expect that the most powerful civs figure out how to overcome these challenges, and that those who don't are left behind.[1] This is something I haven't seriously considered before, so thanks!

However, recall the first two conditions:

  1. There is a moral truth. 
  2. It is possible to “find it” and recognize it as such. 

How capable a civilization is doesn't matter when it comes to how likely these two are to be met. And while most metaethical debates focus only on 1, saying 1 is true is a much weaker claim than saying both 1 and 2 are true (see, e.g., the naturalism vs non-naturalism controversy, which is, I think, only one piece of the puzzle).

Part 2: Challenging the orthogonality thesis

Then, you say that in this scenario you depict

There are occasional paperclip maximizers that arise, but they are a relatively minor presence or tend to be taken over by more sophisticated minds.

Maybe, but what I argue is that there are (occasional) "sophisticated minds" with values that are more expansion-conducive than the (potential) moral truth (e.g., because they have simple unconstrained goals such as "let's just maximize for more life" or "for expansion itself"), and that they're the ones who tend to take over.

But then you make this claim, which, if true, seems to sort of debunk my argument:

you can't become a truly powerful civilization without being able to "do philosophy" and be generally motivated by the results.

(Given the context in your comment, I assume that by "being able to do philosophy", you mean "being able to do things like finding the moral truth".) 

But I don't think this claim is true.[1] However, you made me update and I might update more once I read the posts of yours that you linked! :)

  1. ^

    I remain skeptical because this would imply the orthogonality thesis is almost completely false. Assuming there is a moral truth and that it is possible to "find" it and recognize it as such, I tentatively still believe that extremely powerful agents/civs with motivations misaligned with the moral truth are very plausible and not rare. You can at least imagine scenarios where they started aligned but then value drifted (without that making them significantly less powerful).

Unfortunately we are unable to sponsor visas, so applicants must be eligible to work in the US.

Isn't it possible to simply contract (rather than employ) those who have or can get an ESTA, such that there's no need for a visa?

Answer by Jim Buhler · Mar 27, 2023

As far as I know, there are no estimates (at least not public ones). But as Stan pointed out, Tobias Baumann has raised some very relevant considerations in different posts/podcasts.

Fwiw, researchers at the Center on Long-Term Risk think AGI conflict is the most concerning s-risk (see Clifton 2019), although it may be hard to comprehend all the details of why they think that if you just read their posts and don't talk to them.
