tylermjohn

755 karma

Comments (83)

Some nice and insightful comments from Anders Sandberg on X

This is very interesting; I liked reading it. I am not sure I entirely agree with the analysis, but I think MPL may well be true, and if it is true it leads to very different long-term strategies (e.g. a need for more moral hedging).

I am less convinced that we cannot find high-value states, or that they have to be human-parochial.

A key assumption seems to be that not getting maximal value is a disaster, but I think one can equally have a glass-half-full positive view that the search will find greater and greater values.

This essay also fits with my thinking that there might be new values out there, as yet unrealized. Once there was no life, and hence none of the values linked to living beings. Then consciousness, thinking and culture emerged, adding new kinds of value.

I suspect this might keep on going. Not obvious that new levels add fundamentally greater values, but potentially fundamentally different values. And it is not implausible that some are lexically better than others.

... if we sample new values at a constant rate and the values turn out to have a power law distribution, then the highest value found will in expectation grow (trying to work out the formula, but roughly linearly).
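[As an illustrative aside on that last claim: here is a minimal simulation sketch, not Sandberg's worked-out formula. It assumes values are i.i.d. draws from a classical Pareto distribution with tail exponent ALPHA; the typical best value found after n draws then grows like n^(1/ALPHA), which is close to linear when ALPHA is near 1. I track the median best-so-far rather than the mean, since the maximum of heavy-tailed draws is itself very heavy-tailed.]

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model of the quoted claim: draw "values" from a classical Pareto
# distribution (minimum 1, tail exponent ALPHA) and watch how the best value
# found grows as more values are sampled at a constant rate.
ALPHA = 1.1     # assumed tail exponent; closer to 1 means heavier tails
TRIALS = 1000   # independent searches, to estimate a *typical* best-so-far

for n in (10**2, 10**3, 10**4):
    # Best value after n draws, in each of TRIALS independent searches.
    best = (rng.pareto(ALPHA, size=(TRIALS, n)) + 1.0).max(axis=1)
    # Exact median of the max of n draws: solve (1 - m**-ALPHA)**n = 1/2,
    # giving m = (1 - 0.5**(1/n))**(-1/ALPHA) ~ (n / ln 2)**(1/ALPHA),
    # i.e. roughly linear in n when ALPHA is close to 1.
    theory = (1.0 - 0.5 ** (1.0 / n)) ** (-1.0 / ALPHA)
    print(f"n={n:>6}  simulated={np.median(best):10.1f}  theory={theory:10.1f}")
```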


 

That is an excellent question. I think ethical theory matters a lot — see Power Laws of Value. But I also just think our superintelligent descendants are going to be pretty derpy and act on enlightened self-interest as they turn the stars into computers, not pursue very good things. And that might be somewhere where, e.g., @William_MacAskill and I disagree.

I think there is a large minority chance that we will successfully align ASI this century, so I definitely think it is possible.

Thank you! IMO the best argument for subjectivists not having these views would be thinking that (1) humans generally value reasoning processes, (2) there are not that many different reasoning processes you could adopt, or, as a matter of biological or social fact, we all value roughly the same reasoning processes, and (3) these processes have clear and determinate implications. Or, in short, Kant was right: if we reason from the standpoint of "reason", which is some well-defined and unified thing that we all care about, we all end up in the same place. But I reject all of these premises.

The other argument is that our values are only determinate over Earthly things we are familiar with in our ancestral environment, and among Earthly things we empirically all kinda care about the same things. (I discuss this a bit here.)

These are excellent comments, and unfortunately they all have the virtue of being perspicuous and true, so I don't have that much to say about them.

I doubt that how rare near-best futures are among desired futures is a strong guide to the expected value of the future. At least, you need to know more about e.g. the feasibility of near-best futures; whether deliberative processes and scientific progress converge on an understanding of which futures are near-best; etc.

Is the core idea here that human desires and the values people reach on deliberation come apart? That makes sense, though it also leaves open how much deliberation our descendants will actually do / how much their values will be based on a deliberative process. I guess I'll just state my view without defending it: after a decade in philosophy I have become pretty pessimistic about convergence happening through deliberation, and I expect more divergence as more choice points are uncovered and reasoners either think they have a good loss function or simply choose not to do backpropagation.

I hope my position statement makes my view at least sort of clear. Though as I said to you, my moral values and my practices do come apart!

Yeah, do you have other proposed reconceptualisations of the debate?

One shower thought I've had is that maybe we should think of the debate as about whether to focus on ensuring that humans have final control over AI systems or ensuring that humans do good things with that control. But this is far from perfect.

Have you thought about whether there are any interventions that could transmit human values to this technologically capable intelligence? The complete works of Bentham and an LLM on a ruggedised solar-powered laptop that helps them translate English into their language...

Not very leveraged given the fraction within a fraction within a fraction of success, but maybe worth one marginal person.

Thank you, Will, excellent questions. And thanks for drawing out all of the implications here. Yeah I'm a super duper bullet biter. Age hasn't dulled my moral senses like it has yours! xP

2. But maybe you think it's just you who has your values and everyone else would converge on something subtly different - different enough to result in the loss of essentially all value. Then the 1-in-1-million would no longer seem so pessimistic. 

Yes, I take (2) on the 1 vs 2 horn. I think I'm the only person who has my exact values. Maybe there's someone else in the world, but not more than a handful at most. This is because I think our descendants will have to make razor-thin choices in computational space about what matters and how much, and these choices will amount to Power Laws of Value.

But if so, then suppose I'm Galactic Emperor and about to turn everything into X, best by my lights... do you really take a 99.9% chance of extinction, and a 0.1% chance of stuff optimised by you, instead?  

I generally like your values quite a bit, but you've just admitted that you're highly scope insensitive. So even if we valued the same matter equally as much, depending on the empirical facts it looks like I should value my own judgment potentially nonillions of times as much as yours, on scope sensitivity grounds alone!

3. And if so, do you think that Tyler-now has different values than Tyler-2026? Or are you worried that he might have slightly different values, such that you should be trying to bind yourself to the mast in various ways?

Yup, I am worried about this and I am not doing much about it. I'm worried that the best thing that I could do would simply be to go into cryopreservation right now and hope that my brain is uploaded as a logically omniscient emulation with its values fully locked in and extrapolated. But I'm not super excited about making that sacrifice. Any tips on ways to tie myself to the mast?

what's the probability you have that:
i. People in general just converge on what's right?

It would be something like: P(people converge on my exact tastes without me forcing them to) + [P(kind of moral or theistic realism I don't understand) * P(the initial conditions are such that this convergence happens) * P(it happens quickly enough before other values are locked in) * P(people are very motivated by these values)]. To hazard an off-the-cuff guess, maybe 10^-8 + 10^-4*0.2*0.3*0.4, or about 2.4*10^-6.
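[Spelling that back-of-the-envelope guess out, a quick check with no content beyond the numbers above; the variable names are just mnemonics for the terms in the sum:]

```python
# Off-the-cuff estimate from the comment above; labels are mnemonics only.
p_converge_on_my_tastes = 1e-8   # people converge on my exact tastes, unforced
p_realism = 1e-4                 # a kind of realism I don't understand holds
p_initial_conditions = 0.2       # initial conditions allow that convergence
p_fast_enough = 0.3              # it happens before other values are locked in
p_motivating = 0.4               # people are very motivated by these values

total = p_converge_on_my_tastes + (
    p_realism * p_initial_conditions * p_fast_enough * p_motivating
)
print(f"{total:.2e}")  # ~2.41e-06, i.e. about 2.4 * 10^-6
```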


ii. People don't converge, but a significant enough fraction converge with you that you and others end up with more than one millionth of the resources?

I should be more humble about this. Maybe it turns out there just aren't that many free parameters on moral value once you're a certain kind of hedonistic consequentialist who knows the empirical facts and those people kind of converge to the same things. Suppose that's 1/30 odds vs my "it could be anything" modal view. Then suppose 1/20 elites become that kind of hedonistic consequentialist upon deliberation. Then it looks like we control 1/600th of the resources. I'm just making these numbers up, but hopefully they illustrate that this is a useful push that makes me a bit less pessimistic.


iii. You are able to get most of what you want via trade with others?

Maybe 1/20 that we do get to a suitably ideal kind of trade. I believe what I want is a pretty rivalrous good, i.e. stars, so at the advent of ideal trade I still won't get very much of what I want. But it's worth thinking about whether I could get most of what I want in other ways, such as by trading with digital slave-owners to make their slaves extremely happy, in a relatively non-rivalrous way.

I don't have a clear view on this and think further reflection on this could change my views a lot.
 

Yes. Which, at least on optimistic assumptions, means sacrificing lots of lives.
