Ryan Greenblatt

Member of Technical Staff @ Redwood Research
433 karma · Joined Sep 2022


This other Ryan Greenblatt is my old account[1]. Here is my LW account.

  1. ^

    Account lost to the mists of time and expired university email addresses.



Where the main counterargument is that now the groups in power can be immortal and digital minds will be possible.

See also: AGI and Lock-in

My views are reasonably messy, complicated, hard to articulate, and based on a relatively diffuse set of intuitions. I think I also reason about the situation in a pretty different way than you seem to. It wouldn't be impossible to write up a post on my views, but I would need to consolidate them and think about how exactly to express where I'm at (maybe 2-5 person-days of work). I haven't really consolidated my views or reached something close to reflective equilibrium.

I also just think that arguing about pure philosophy very rarely gets anywhere and is very hard to make convincing in general.

I'm somewhat uncertain at the "inside view/mechanistic" level. (But my all-things-considered view comes partially from deferring to some people, which makes me overall less worried that I should immediately reconsider my life choices.)

I think my views are compelling, but I'm not sure I'd say "very compelling".

I agree that this consideration makes it hard to do a direct comparison. But I think this consideration should mostly make us more uncertain, rather than making us think that humans are better than the alternative.

Actually, I was just trying to say "I can see what humans are like, and it seems pretty good relative to my current guesses about AIs, in ways that don't just update me up about AIs". Sorry about the confusion.

Currently, humans seem much closer to me at the values level than GPT-4 base does. I think this is also likely to be true of future AIs, though I understand why you might not find this convincing.

I think the architecture (learning algorithm, etc.) and training environment between me and other humans seems vastly more similar than between me and likely AIs.

I don't think I'm going to flesh this argument out to an extent to which you'd find it sufficiently rigorous or convincing, sorry.

You should compare against human nature, which was optimized for something quite different from utilitarianism. If I add up the pros and cons of the thing humans were optimized for and compare it against the thing AIs will be optimized for, I don't see why it comes out with humans on top, from a utilitarian perspective. Can you elaborate on your reasoning here?

I can't quickly elaborate in a clear way, but some messy combination of:

  • I can currently observe humans, which screens off a bunch of the comparison and lets me do direct analysis.
  • I can directly observe AIs and make predictions about future training methods; their values seem to result from a much more heavily optimized and precise process with less "slack" in some sense. (Perhaps this is related to the genetic bottleneck; I'm unsure.)
  • AIs will be primarily trained on things which look extremely different from "cooperatively achieving high genetic fitness".
  • Current AIs seem to use the vast, vast majority of their reasoning power for purposes which aren't directly related to their final applications. I predict this will also apply for internal high level reasoning of AIs. This doesn't seem true for humans.
  • Humans seem optimized for something which isn't that far off from utilitarianism, from some perspective? Make yourself survive, make your kin group survive, make your tribe survive, etc.? I think utilitarianism is often a natural generalization of "I care about the experience of XYZ; it seems arbitrary/dumb/bad to draw the boundary narrowly, so I should extend this further." (This is how I get to utilitarianism.) I think the AI optimization looks considerably worse than this by default.

(Again, note that I said in my comment above: "Some of these can be defeated relatively easily if we train AIs specifically to be good successors, but the default AIs which end up with power over the future will not have this property." I edited this in to my prior comment, so you might have missed it, sorry.)

What are these a priori reasons and why don't they similarly apply to AI?

I am a human. Other humans might end up in a similar spot on reflection.

(Also I care less about values of mine which are highly contingent wrt humans.)

The ones I would say are something like (approximately in priority order):

  • AI's values could result mostly from playing the training game or other relatively specific optimizations they performed in training which might result in extremely bizarre values from our perspective.
    • More generally, AI values might be highly alien in a way where caring about experience seems very strange to them.
  • AIs by default will be optimized for very specific commercial purposes with narrow specializations and a variety of hyper-specific heuristics, and the resulting values and generalizations of these will be problematic.
  • I care ultimately about what I would think is good upon (vast amounts of) reflection and there are good a priori reasons to think this is similar to what other humans (who care about using vast amounts of compute) will end up thinking is good.
    • As a sub argument, I might care specifically about things which are much more specific than "lots of good diverse experience". And, divergences from what I care about (even conditioning on something roughly utilitarian) might result in massive discounts from my perspective.
    • I care less about my values and preferences in worlds where they seem relatively contingent, e.g. they aren't broadly shared on reflection by reasonable fractions of humanity.
  • AIs don't have a genetic bottleneck and thus can learn much more specific drives that perform well, whereas evolution had to make values more discoverable and adaptable.
    • E.g. various things about empathy.
  • AIs might have extremely low levels of cognitive diversity in their training environments (as far as "co-workers" go), which might result in very different attitudes toward caring about diverse experience.

Some of these can be defeated relatively easily if we train AIs specifically to be good successors, but the default AIs which end up with power over the future will not have this property.

Also, I should note that this isn't a very strong list, though in aggregate it's sufficient to make me think that human control is perhaps 4x better than AIs. (For reference, I'd say that me personally being in control is maybe 3x better than human control.) I disagree with a MIRI style view about the disvalue of AI and the extent of fragility of value that seems implicit.

Another relevant consideration along these lines is that people who selfishly desire high wealth might mostly care about positional goods which are similar to current positional goods. Usage of these positional goods won't burn much compute (resources for potential minds) even if these positional goods become insanely valuable in terms of compute. E.g., land values of interesting places on Earth might be insanely high and people might trade vast amounts of computation for this land, but ultimately, the computation will be spent on something else.

why you care about the small fraction of resources spent on altruism

I'm also not sold it's that small.

Regardless, it doesn't seem like we're making progress here.
