Superintelligent AI is necessary for an amazing future, but far from sufficient

So8res

41 min read · Oct 31, 2022

Comments 4

RobBensinger

An interface for registering your probabilities (or you can just say stuff in comments):

Charlie_Guthmann

I started filling this out and then stopped because I'm confused about this CEV and cosmopolitan value stuff and just generally what OP means by value. It's possible I'm confused because I missed something (I skimmed the post but read most of it). Questions that would help me answer the predictions above:

What definition of value are we supposed to be using (my current intuition is the average CEV of humans)?
Was I meant to just answer the above question with my own values (or my CEV)?
Do other people feel like the above questions are invariant to the definition of value / specific value of CEVs?
What is the definition of cosmopolitan value, and how is it action-relevant in all of this?

The stuff below is a bit rambly so apologies in advance.

I don't really get the purpose of CEV for this stuff or why it solves any deep problems of defining value. I definitely think we should reflect on our moral values and update on new information as it feels right to us. This doesn't mean we solved ethics. It also raises the question of whose CEV we are using. CEV is agent-dependent, so we need to specify how we weight the CEVs of all the agents we are taking into consideration. In any case, my main complaint is that if the answers to the above questions are at least in part a function of what our CEV is (or what definition of value we are using), then I feel like we are stacking two questions on top of each other and not necessarily leaving room to talk through the cruxes of either.

Let's assume we are just taking the average CEV of humans alive today as our definition of value. Some values might be more difficult to pull off than others, as they may trend further from what aliens want or just be harder to pull off in the context of the amount of shards we have. Plus, I just assumed we are taking the average of humans' CEVs, but we don't know what political system we will have. Who's to say that just because we have the ASI and an average CEV, humans will agree to push towards this average CEV? I guess in short I feel like I'm guessing both the CEV and how achievable that CEV is.

I also don't really follow the cosmopolitan stuff. I have cosmopolitan intuitions but I'm unclear what the author is getting at with it. I have some vague sense that this is trying to address the fact that CEV gives special weight to agents that are alive now. Not really sure how to even express my confusion if I'm being honest.

That being said, I loved this post. Lots of information from disparate places put together. A summary could be nice; maybe I'll try to write one if no one else does.

RobBensinger

What definition of value are we supposed to be using (my current intuition is the average CEV of humans)?
Was I meant to just answer the above question with my own values (or my CEV)?

The OP defines Strong Utopia as "At least 95% of the future’s potential value is realized.", and then defines the other scenarios via various concrete scenarios that serve as benchmarks for how "good" the universe is.

CEV isn't mentioned at that part of the article, nor is any other account of what "good" and "value" mean, so IMO you should use your own conception of what it means for things to be good, valuable, etc. Which outcomes would actually be better or worse, by your own lights?

My own personal view is that CEV is a good way of hand-waving at "good" and "valuable", and I can say more about that if helpful. The main resource I'd recommend reading is https://arbital.com/p/cev/.

I don't know what you mean by the "average" CEV of humans. Eliezer's proposal on https://arbital.com/p/cev/ is to use all humans as the extrapolation base for CEV.

I predict that if you ran a CEV-ish process extrapolating from my brain, it would give the same ultimate answers as a CEV-ish process extrapolating from all humans' brains. (Among other things, because my brain would probably prefer to run a CEV that takes into account everyone else's brain-state too, and it can just go do that; and because the universe is way too abundant in resources and my selfish desires get saturated almost immediately, leaving the rest of the cosmic endowment for the welfare of other minds.)

What is the definition of cosmopolitan value, and how is it action-relevant in all of this?

The article for that is https://arbital.com/p/value_cosmopolitan/. "'Cosmopolitan', lit. "of the city of the cosmos", intuitively implies a very broad, embracing standpoint that is tolerant of other people (entities) and ways that may at first seem strange to us; trying to step out of our small, parochial, local standpoint and adopt a broader one."

E.g., someone who values all people on Earth is more cosmopolitan than someone who just cares about the people in their country. Someone who values digital minds is more cosmopolitan than someone who just cares about biological ones.

A lot of where this gets tricky (and where the post directs its focus) is that cosmopolitanism encourages you to tolerate (or even embrace) values diversity in many respects. But if you're maximally embracing of values diversity, then this seems to reduce to having no preferences or priorities at all -- anything goes. So there's then a question, "What would the ideal cosmopolitan value system say about various forms of values divergence?"

I don't really get the purpose of CEV for this stuff or why it solves any deep problems of defining value. I definitely think we should reflect on our moral values and update on new information as it feels right to us.

If you think CEV is an obvious and prosaic idea, then you probably understand CEV pretty well. :P It's not meant to be anything fancy or special; it's just meant to articulate in very broad terms the sort of thing we probably want to do in order to solve morality (in all the ways that are required for us to steer the future well).

CEV isn't a full specification of morality, but it's a simple, informal articulation of how humanity can attain such a specification (insofar as we need one in order to know what to do).

CEV is agent-dependent, so we need to specify how we weight the CEVs of all the agents we are taking into consideration.

Eliezer's proposal (which seems fine to me) is that we weight all humans equally. "CEV with all humans weighted equally" could then choose to defer to a different CEV that has some other, more sophisticated weighting; but weighting all humans equally at the outset seems fine to me, and has the advantage of being simple and less-likely-to-cause-controversy.

In any case, my main complaint is that if the answers to the above questions are at least in part a function of what our CEV is (or what definition of value we are using), then I feel like we are stacking two questions on top of each other and not necessarily leaving room to talk through the cruxes of either.

I think I'm pretty used to this because I treat most moral questions as questions about CEV. So, e.g., if we're debating what the tax rate should be on luxury goods, I'm doing my best to estimate what my CEV would think about fairness, compassion, etc. when I make my decision.

Let's assume we are just taking the average CEV of humans alive today as our definition of value. Some values might be more difficult to pull off than others, as they may trend further from what aliens want or just be harder to pull off in the context of the amount of shards we have.

average -> aggregate

shards -> shard

I think that we'll have way more cosmic resources than we know what to do with, in terms of maximizing the welfare of all currently-living humans. (Though there's still a question of what to do with the remaining resources -- e.g., creating new humans or other new minds to live cool lives.)

A summary could be nice; maybe I'll try to write one if no one else does.

People summarizing this sounds great to me! Among other things, it's a good way to check whether the post miscommunicated in some way. :)

RobBensinger

Nate evidently disagrees:

I’ll note in passing that the view I’m presenting here reflects a super low degree of cynicism relative to the surrounding memetic environment. I think the surrounding memetic environment says "humans left unstomped tend to create dystopias and/or kill themselves", whereas I'm like, "nah, you'd need somebody else to kill us; absent that, we'd probably do fine". (I am not a generic cynic!)

There basically aren't any natural threats that threaten all humans, once we've spread a bit through space. "Entropy" isn't really a threat, except as a stand-in for "we might not use our resources efficiently, resulting in waste". (Or I guess "we might not do due diligence in trying to discover novel physics that might grant us unlimited negentropy".)

^{^}

And my concept of “what makes life worth living” is very likely an impoverished one today, and a friendly superintelligence could guide us to discovering even cooler versions of things like “art” and “adventure”, transcending the visions of fun that humanity has considered to date. The limit of how good the universe could become, once humanity has matured and grown into its full potential, likely far surpasses what any human today can concretely imagine.

^{^}

I’ll flag that I do think that some people overestimate how “unimaginable” the future is likely to be, out of some sense of humility/modesty.

I think there's a decent chance that if you showed me the future I'd be like “ah, so that's what computronium looks like” or “so reversible computers wrapped around black holes did turn out to be best”, and that when you show me the experiences running on those computers, I'm like "neato, yeah, lots of minds having fun, I'm sure some of that stuff would look pretty fun to me if you decoded it". I wouldn’t expect to immediately understand everything going on, but I wouldn’t be surprised if I can piece together the broad strokes.

In that sense, I find it plausible that ~optimal futures will turn out to be familiar/recognizable/imaginable to a digital-era transhumanist in a way they wouldn't be to an ancient Roman. We really are better able to see the whole universe and its trajectory than they were.

To be clear, it's very plausible to me that it'll somehow be unrecognizable or shocking to me, as it would have been to an ancient Roman, at least on some axes. But it's not guaranteed, and we don't have to pretend that it's guaranteed in order to avoid insinuating that we're in a better epistemic position than people were in the past. We are in a better epistemic position than people were in the past!

There's a separate point about how much translation work you need to do before I recognize a particular arc of fun unfolding before me as something actually fun. On that point I’m like, "Yeah, I'm not going to recognize/understand my niece's generation's memes, never mind a posthuman’s varieties of happiness, without a lot more context (and plausibly a much bigger and deeply-changed mind)".

Separately, I don't want to make any claims about how hard and fast humanity becomes "strongly transhuman" / changes to using minds that would be unrecognizable (as humans) to the present. I'd be surprised if it were super-fast for everyone, and I'd be surprised if some humans' minds weren’t very different a thousand sidereal years post-singularity. But I have wide error bars.

^{^}

Provided that this turns out to be a good use of stellar resources. (I'm not confident one way or the other. E.g., I'm not confident that human-originated minds get relevantly more interesting/fun at Matrioshka-brain scales. Maybe we’ll learn that slapping on more matter at that scale lets you prove some more theorems or whatever, but isn’t the best way to convert negentropy into fun, compared to e.g. spending that compute on whole civilizations full of interacting and flourishing people who don't have star-sized brains.)

^{^}

A separate reason it’s a terrible idea to destroy ourselves is that, e.g., if the nearest aliens are 500 million years away then our death means that a ~500 million lightyear radius sphere of stellar fuel is going to be entirely wasted, instead of spent on rad stuff.

^{^}

As I’ll note later, this odds ratio is a result of giving 0.2x weight to “humans control the universe-shard”, 0.5x to “aliens control it”, and 0.3x to “unfriendly AI built by aliens controls it”. Rob rounded the resulting odds ratio in this table to 1 : 5 : 7 : 5 : 14 : 1 : ~0.

Also, as a general reminder: I’m giving my relatively off-the-cuff thoughts in this post, knowing that I’ll probably come to see some of my numbers as inconsistent, or otherwise mistaken, if I reflect more. But absent more reflection, I don’t know which direction the inconsistencies would shake out.
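The weighting described in this footnote is a plain mixture of per-scenario distributions. Here is a minimal Python sketch of that arithmetic. The 0.2 / 0.5 / 0.3 weights come from the footnote; the function names and the three per-scenario odds lists are hypothetical placeholders chosen only to make the example run, and they are not the post's actual numbers.

```python
from fractions import Fraction

def normalize(odds):
    """Turn an integer odds list such as [1, 2, 1] into probabilities summing to 1."""
    total = sum(odds)
    return [Fraction(x, total) for x in odds]

def mix(weighted_odds):
    """Combine (weight, odds) pairs into a single probability list."""
    size = len(weighted_odds[0][1])
    mixture = [Fraction(0)] * size
    for weight, odds in weighted_odds:
        for i, prob in enumerate(normalize(odds)):
            mixture[i] += weight * prob
    return mixture

# Weights quoted in the footnote.
W_HUMANS = Fraction(2, 10)    # "humans control the universe-shard"
W_ALIENS = Fraction(5, 10)    # "aliens control it"
W_ALIEN_AI = Fraction(3, 10)  # "unfriendly AI built by aliens controls it"

# PLACEHOLDER odds over the seven outcomes (Strong Utopia ... Strong Dystopia).
# These are made up for illustration and are NOT taken from the post.
placeholder_humans = [1, 2, 1, 0, 0, 0, 0]
placeholder_aliens = [0, 1, 1, 1, 1, 0, 0]
placeholder_alien_ai = [0, 0, 0, 0, 1, 0, 0]

mixture = mix([
    (W_HUMANS, placeholder_humans),
    (W_ALIENS, placeholder_aliens),
    (W_ALIEN_AI, placeholder_alien_ai),
])
print([float(p) for p in mixture])
```

Because the weights sum to 1 and each scenario's odds are normalized first, the mixture is itself a probability distribution, which can then be rescaled and rounded into an odds ratio.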

^{^}

I’d have some inclination to go lower, but for the one evolved species we've seen seeming dead-set on destroying itself.

^{^}

Though another input to the value of the future, in this scenario, is “What happens to the places that the pilgrims had to leave behind until some pilgrim group hit upon a non-terrible organizational system?” Hopefully it’s not too terrible, but it’s hard to say with humans!

One note of optimism is that there’s likely to be a strong negative correlation (in this ~impossible hypothetical) between “how terrible is the civilization?” and “how interested is it in spreading to the stars, or spreading far?” Many ways of shutting down moral progress, robust civic debate, open exploration of ideas, etc. also cripple scientific and technological progress in various ways, or involve commitment to a backwards-looking ideology. It’s possible for the universe-shard to be colonized by Space Amish, but it’s a weirder hypothetical.

^{^}

Note that I’ll use phrasings like “there’s something it’s like to be them”, “they’re sentient”, and “they’re conscious” interchangeably in this post. (This is not intended to be a bold philosophical stance, but rather a flailing attempt to wave at properties of personhood that seem plausibly morally relevant.)

^{^}

Eliezer uses the term “outcome pump” to introduce a similar idea:

The Outcome Pump is not sentient. It contains a tiny time machine, which resets time unless a specified outcome occurs. For example, if you hooked up the Outcome Pump's sensors to a coin, and specified that the time machine should keep resetting until it sees the coin come up heads, and then you actually flipped the coin, you would see the coin come up heads. (The physicists say that any future in which a "reset" occurs is inconsistent, and therefore never happens in the first place - so you aren't actually killing any versions of yourself.)
Whatever proposition you can manage to input into the Outcome Pump, somehow happens, though not in a way that violates the laws of physics. If you try to input a proposition that's too unlikely, the time machine will suffer a spontaneous mechanical failure before that outcome ever occurs.

I think his example is underspecified, though. Suppose that you ask the outcome pump for paperclips, and physics says “sorry, this outcome is too improbable” and exhibits a mechanical failure. This would then mean that it’s true that the outcome pump outputting paperclips is “improbable”, which makes the hypothetical consistent. We need some way to resolve which internally-consistent set of physical laws compatible with this description (“make paperclips” or “don’t make paperclips”) actually occurs; the so-called "outcome pump" is not necessarily pumping the desired outcome.

Giving the time machine the ability to output a random sequence of actions addresses this problem: we can say that the machine only undergoes a mechanical failure if some large number (e.g., Graham’s number) of random action sequences all fail to produce the target outcome. We can then be confident that the outcome pump will eventually brute-force a solution, provided that one is physically possible.
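The brute-force version described above can be sketched as a random search. This is only a toy illustration under stated assumptions: the function name `outcome_pump`, its parameters, and the tiny "world" (steps of +1 or -1 on a number line) are all hypothetical, and `max_attempts` is a small finite stand-in for the enormous retry budget the text has in mind.

```python
import random

def outcome_pump(actions, simulate, is_target, sequence_length, max_attempts, rng=None):
    """Try random action sequences until one produces the target outcome.

    Returns the first successful sequence, or None (the "mechanical failure"
    case) if every one of max_attempts random sequences misses the target.
    """
    rng = rng or random.Random()
    for _ in range(max_attempts):
        sequence = [rng.choice(actions) for _ in range(sequence_length)]
        if is_target(simulate(sequence)):
            return sequence
    return None

# Toy world (hypothetical): each action moves a marker one step left or right,
# and the "outcome" is the final position after five steps.
reachable = outcome_pump([-1, 1], sum, lambda pos: pos == 3, 5, 10_000, random.Random(0))
unreachable = outcome_pump([-1, 1], sum, lambda pos: pos == 4, 5, 1_000, random.Random(0))
print(reachable, unreachable)
```

The searcher has no model of the world and no preferences beyond the target test, yet it reliably finds a solution whenever one exists among the sequences it can sample; it reports failure only for the target that no five-step sequence can reach.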

Other examples of easily-understood non-conscious optimization processes that can achieve very impressive things include AIXI and natural selection. The AIXI example is made pedagogically complicated for present purposes, however, by the fact that AIXI’s hypothesis space contains many smaller conscious optimizers (that don't much matter to the point, but that might confuse those who can see that some hypotheses contain conscious reasoners and can't see their irrelevance to the point at hand); and the natural selection example is weakened by the fact that selection isn't a very powerful optimizer.

^{^}

A possible objection here is “Human emotional responses often cause us to get into violent conflicts in cases where this foreseeably isn’t worth it; why couldn’t aliens be the same?”. But “technology for widening the space of profitable trades” is in the end just another technology, and ambitious spacefaring species are likely to discover such tech for the same reason they’re likely to discover other tech that’s generally useful for getting more of what you want. Humans have certainly gotten better at this over time, and if we continue to advance our scientific understanding, we’re likely to get far better still.

^{^}

Like, we've seen that the seeds are there, and it would be pretty weird for us to go around uprooting seeds of value on a whim.

As a side-note: one of my hot takes about how morality shakes out is "we don't sacrifice anything (among the seeds of value)". Like, values like sadism and spite might be tricky to redeem, but if we do our job right I think we should end up finding a way to redeem them.

^{^}

Unless we’ve made some bargain across counterfactual worlds that justifies our offering this gift in our world. But there are friction costs to bargains, and my guess is that the way it pans out is that you keep what you can get in your branch and it evens out across branches.

As a side-note, another possible implication of my view on “alien brethren” is: in the much less likely event that we meet weak young non-spacefaring aliens, the future might go drastically better if we help guide their development as a species, teaching them about the Magic of Friendship and all that.

(Or perhaps not. I remain very uncertain about whether it’s positive-human-EV to guide alien development.)

^{^}

Though some aliens may shake out to be simple too! Humans are pretty far from "tile the universe with vats of genes", but it's not clear how contingent that fact is.

^{^}

Though it should be emphasized that we're totally allowed to find that evolved life tends to go some completely different way than how humans shook out. Generalizing from one example is hard!!

^{^}

And even if you succeeded, it’s not clear that you’d get any utility as a result; my guess that evolved aliens tend to be better than paperclippers can just be wrong, easily.

And even if you got some utility, it’s going to be a paltry amount compared to if you’d built aligned AGI.

^{^}

Possibly this is too extreme; I haven’t refined these probabilities much, and am still just giving my off-the-cuff numbers.

In any case, I want to emphasize that my view isn’t “most misaligned AGIs aren’t sentient, but if you randomly spin up a large number of them you’ll occasionally get a sentient one”. Rather, my view is “almost no random misaligned AGIs are sentient” (but with some uncertainty about whether that’s true). I’m much more uncertain about whether this background view is true than I am about whether, given this background view, a given misaligned AGI will happen to be sentient.

(Like how I think the chance that the lightspeed limit turns out to be violable is greater than 1 in a billion; but that doesn't mean that if you threw a billion baseballs, I would expect one of them to break the lightspeed limit on average.)
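The difference between the two kinds of uncertainty can be made concrete with a little arithmetic. The sketch below is an illustration under assumptions, not the author's model: the function names are hypothetical, and the "shared" case simplifies by treating the background hypothesis as a single all-or-nothing fact with credence `q`, after which each trial succeeds independently with chance `r`.

```python
import math

def p_any_independent(p, n):
    """P(at least one hit) when each of n trials independently hits with chance p."""
    if p >= 1:
        return 1.0 if n > 0 else 0.0
    # Numerically stable form of 1 - (1 - p) ** n for tiny p and huge n.
    return -math.expm1(n * math.log1p(-p))

def p_any_shared(q, r, n):
    """P(at least one hit) when one background hypothesis (credence q) must be
    true for any hit to occur, and, given it is true, each trial hits with chance r."""
    return q * p_any_independent(r, n)

one_in_a_billion = 1e-9
a_billion = 10**9

# Reading "1 in a billion" as a per-baseball frequency:
per_throw = p_any_independent(one_in_a_billion, a_billion)
# Reading it as credence in a single background fact (most generous case, r = 1):
per_hypothesis = p_any_shared(one_in_a_billion, 1.0, a_billion)
print(per_throw, per_hypothesis)
```

Under the per-throw reading, a billion throws make at least one hit more likely than not; under the per-hypothesis reading, the number of throws barely matters, because all of the uncertainty sits in the one background fact.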

Strong Utopia	Weak Utopia	Pretty Good	Conscious Meh	Unconscious Meh	Weak Dystopia	Strong Dystopia
10	50	2	5	5	1	~0

	conscious	unconscious
squiggle maximizer	A sentient alien that converts galaxies into something ~valueless.	A non-sentient alien that converts galaxies into something ~valueless.
alien brethren	A sentient alien that converts galaxies into something cool.	A non-sentient alien that converts galaxies into something cool.

1. Unboosted humans << Friendly superintelligent AI

2. Alien CEV << Human CEV

3. The superintelligent AI we’re likely to build by default << Aliens

Strong Utopia
Elicit Prediction (forecast.elicit.org/binary/questions/J_BQIG-KD)	Elicit Prediction (forecast.elicit.org/binary/questions/DY7buVchR)
Elicit Prediction (forecast.elicit.org/binary/questions/88hE3y6i8)	Elicit Prediction (forecast.elicit.org/binary/questions/gcMXnPnT1)
Weak Utopia
Elicit Prediction (forecast.elicit.org/binary/questions/isgYV7473)	Elicit Prediction (forecast.elicit.org/binary/questions/Vbkdbuawu)
Elicit Prediction (forecast.elicit.org/binary/questions/iV31CbqXK)	Elicit Prediction (forecast.elicit.org/binary/questions/T-oqGaIJ-)
"Pretty good" outcome
Elicit Prediction (forecast.elicit.org/binary/questions/zmobANN1H)	Elicit Prediction (forecast.elicit.org/binary/questions/yyw4PiRVi)
Elicit Prediction (forecast.elicit.org/binary/questions/UvSIskkNS)	Elicit Prediction (forecast.elicit.org/binary/questions/dLESZN9iS)
Conscious Meh outcome
Elicit Prediction (forecast.elicit.org/binary/questions/CyYmoBL2l)	Elicit Prediction (forecast.elicit.org/binary/questions/nVY4LlbPL)
Elicit Prediction (forecast.elicit.org/binary/questions/b5dl1i1Ml)	Elicit Prediction (forecast.elicit.org/binary/questions/4zRHC0KLB)
Unconscious Meh outcome
Elicit Prediction (forecast.elicit.org/binary/questions/dw6dxVYUg)	Elicit Prediction (forecast.elicit.org/binary/questions/ps9d7dN11)
Elicit Prediction (forecast.elicit.org/binary/questions/HhXosyta3)	Elicit Prediction (forecast.elicit.org/binary/questions/hcE_dDG6w)
Weak dystopia
Elicit Prediction (forecast.elicit.org/binary/questions/EjkA0F1Xj)	Elicit Prediction (forecast.elicit.org/binary/questions/llRoP6-Vf)
Elicit Prediction (forecast.elicit.org/binary/questions/qfE-7XKt2)	Elicit Prediction (forecast.elicit.org/binary/questions/yDTUVz4Eo)
Strong dystopia
Elicit Prediction (forecast.elicit.org/binary/questions/yExMEqL1g)	Elicit Prediction (forecast.elicit.org/binary/questions/8tDdLurRw)
Elicit Prediction (forecast.elicit.org/binary/questions/aZXA9Wr6S)	Elicit Prediction (forecast.elicit.org/binary/questions/qbuJBk_ka)