

The purpose of this post is to answer Holden Karnofsky's question of "How should we value various possible long-run outcomes relative to each other?"  


1 Musings on moral uncertainty (skip if you already have read lots about moral uncertainty) 

I'll begin with a series of thoughts on moral uncertainty based on Will MacAskill's book.  TLDR, we should take it seriously.  

1 Under moral uncertainty it makes sense to maximize expected choiceworthiness.  

2 Moral uncertainty is justified given the difficulty of ethics and the absence of agreement.  

3 Humans are naturally overconfident.  98% stated confidence results in being wrong 30% of the time.  People are wrong 10% of the time for things where they say they're 100% certain.  

4 Taking moral uncertainty seriously can also be justified by intuitions about the foie gras case.  Suppose one is pretty sure that foie gras is neutral but they think it might be very bad.  They shouldn't eat it if there's an equally good risotto.  

5 It's analogous to empirical uncertainty.  Suppose that in the foie gras case the diner is pretty sure that eating it has no effect on animal suffering, but is certain that animal suffering is bad.  In this case the choiceworthiness of each option is the same as in the first case.  

6 It avoids collective self-defeat among people with differing moral credences.  Suppose I have 60% credence in theory 1 and you have 60% credence in theory 2.  It makes sense to compromise to avoid being collectively money-pumpable.  

7 It's the only way to accommodate true principles.  If X has a 10% chance of being bad, but a 90% chance of being fine because of moral reasons, X is just as good as Y which has a 10% chance of being bad and a 90% chance of being fine because of practical considerations.  Anything else would require treating equally bad cases, or equally fine cases, as unequal.  

8 There are cases where it's not clear whether something is a matter of moral uncertainty or empirical uncertainty.  Suppose Sophie isn't sure whether or not to eat chicken, but she knows she shouldn't eat it if chickens have personhood.  Is that normative or empirical?

9 Things that are not what matters can correlate heavily with what matters.  Lots of views that we are morally uncertain about could be very bad if programmed into an AGI.  For example, suppose that we think that hedonic act utilitarianism is correct and desires are only instrumentally relevant.  Having an AGI optimize for desire satisfaction could miss out on lots of EV. 

10 A decent response to cases of non-infinite moral fanaticism is to do a Moorean shift.  E.g., with average utilitarianism, suppose I can torture someone.  Suppose also that I have 10^-3 credence in average utilitarianism and 10^-3 credence in solipsism.  This action would bring my utility from 5 to 10 at the cost of bringing the other person's utility from 10 to -1000.  Under moral uncertainty there's a 10^-6 chance of doubling the value of the world (the case where both average utilitarianism and solipsism are true), while on regular utilitarianism refraining from doing it has little effect.  This can just be run as a Moorean shift against having even that very low credence in average utilitarianism.  

This follows from plausible principles.  Consider two states of affairs, assuming I know with metaphysical certainty that solipsism is not true.  The first state of affairs is assaulting someone if total utilitarianism is true; the second is assaulting someone if average utilitarianism is true.  We have three options.  

  1. Say that average utilitarianism and total utilitarianism assign the same wrongness to the act.  However, if we accept this then we run into the solipsism reductio given above relating to low credence in solipsism and average utilitarianism.
  2. Say that average utilitarianism holds that the act is much more wrong than total utilitarianism does.  If we hold this, then we once again run into the solipsism reductio.
  3. Say that total utilitarianism holds the act is much more wrong than average utilitarianism does.  However, this runs into its own deeply implausible results.  Suppose we have 99.99% credence in average utilitarianism and .01% credence in total utilitarianism.  Given that to avoid the solipsism reductio we had to hold that average utility having a 1/10^6 chance of being doubled is less important than preventing a torture, this would mean that average utilitarianism is rendered almost totally irrelevant in regular considerations (including the one given above).  This would thus mean that we should take the second option, even if our credence in average utilitarianism is near certain.  Average utilitarianism has crazy extreme results, which means that it's either ignored in plausible cases or dominates in them.  We can run a Moorean shift of the type "If average utilitarianism were even .0001% as important in our deliberations as total utilitarianism we should be egoists, but we shouldn't be egoists, so it shouldn't be even .0001% as important as total utilitarianism."  We can also run a Moorean shift where we say that something else has a low chance of being vastly more important than expected, i.e. maybe time loops infinitely so that pleasure has infinite EV, or maybe some element of pleasure is infinitely good in currently unanticipated ways.
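
To make the expected choiceworthiness reasoning from points 1, 4 and 5 concrete, here is a minimal sketch in Python.  The specific credences (90%/10%) and the value of -100 for the "very bad" case are hypothetical numbers chosen for illustration; they are not from MacAskill's book.  The last lines reproduce the 10^-6 joint credence from point 10, under the assumption that the credences in average utilitarianism and solipsism are independent.

```python
from fractions import Fraction

def expected_choiceworthiness(option, credences):
    """Credence-weighted sum of an option's value under each hypothesis."""
    return sum(credences[h] * option[h] for h in credences)

# Hypothetical credences: 90% that eating foie gras is neutral, 10% that it is very bad.
credences = {"neutral": Fraction(9, 10), "very_bad": Fraction(1, 10)}

# Hypothetical values of each meal under each hypothesis (the meals are equally tasty).
foie_gras = {"neutral": Fraction(0), "very_bad": Fraction(-100)}
risotto = {"neutral": Fraction(0), "very_bad": Fraction(0)}

ec_foie_gras = expected_choiceworthiness(foie_gras, credences)
ec_risotto = expected_choiceworthiness(risotto, credences)
print(ec_foie_gras, ec_risotto)  # -10 0

# Joint credence from point 10: average utilitarianism AND solipsism,
# treating the two credences as independent (an assumption).
joint = Fraction(1, 10**3) * Fraction(1, 10**3)
print(joint == Fraction(1, 10**6))  # True
```

The same function works whether the hypotheses are moral ("foie gras is wrong") or empirical ("eating it increases suffering"), which is the analogy point 5 draws.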

2 Reasons why utilitarian considerations of some sort should mostly dominate our evaluations of the far future 

I've previously argued for utilitarianism (most of my blog posts are about it).  However, even if we are not utilitarians, utilitarian considerations should roughly dominate our consideration of the far future for a few reasons.  

1 As I hope to show later, the far future has the potential for vast amounts of utilitarian value.  Even if we think that elsewhere in life rights tend to trump well-being, the monumental scales of the future make well-being dominate.  

2 All plausible moral views hold that we should care at least to some degree about happiness and misery (or closely related concepts like desire satisfaction).  Pressing a button that would double the happiness of the world would be very good and pressing a button that would halve the value of the world would be very bad.  

3 There are lots of very good arguments for utilitarianism.  

4 It's a good way of getting future-proof ethics.  

3 Reasons to think the far future could be very good if there's a utopia

This section will argue that the scenario with the highest expected value could have truly immense expected value.  

1) Number of people.  The future could have lots of people.  Bostrom calculated that there could be 10^52 people under reasonable assumptions.  This is such a vast number that even a 1 in 10 billion chance of there being an excellent future produces in expectation 10^42 people in the future.  

Additionally, it seems like there's an even smaller but far from zero probability that it would be possible to bring about vastly greater numbers of sentient beings living very good lives.  There are several reasons to think this. 

1)  Metaculus says as of 2/25/22 that the odds that the universe will end are about 85%.  Even if we think that this is a major underestimate, and the true figure is as high as 99%, it still seems eminently possible for us to have a civilization that survives either forever or nearly forever.  

2) Given the large number of unsolved problems in physics, the correct model could be very different from what we believe.  
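
The arithmetic in the paragraph above can be checked in a few lines.  This is just a sketch using the two figures already quoted (10^52 potential people and a 1 in 10 billion chance); the variable names are my own.

```python
from fractions import Fraction

potential_people = 10**52                  # Bostrom's estimate, as cited above
p_excellent_future = Fraction(1, 10**10)   # "1 in 10 billion"

# Expected number of people = probability of the outcome times its size.
expected_people = potential_people * p_excellent_future
print(expected_people == 10**42)  # True
```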

3) Given our lack of understanding of consciousness, it's possible that there's a way to infinitely preserve consciousness.  

4) As Inslor says on Metaculus "I personally subscribe to Everett Many Worlds interpretation of QM and it seems to me possible that one branch can result in infinitely many downstream branches with infinity many possible computations. But my confidence about that is basically none."  

5) Predictions in general have a fairly poor track record.  Claims of alleged certainty are wrong about 1/8th of the time.  We thus can't be very confident about such matters relating to how we can affect the universe 10 billion years from now.  

Sandberg and Manheim argue against this, writing "This criticism cannot be refuted, but there are two reasons to be at least somewhat skeptical. First, scientific progress is not typically revisionist, but rather aggregative. Even the scientific revolutions of Newton, then Einstein, did not eliminate gravity, but rather explained it further. While we should regard the scientific input to our argument as tentative, the fallibility argument merely shows that science will likely change. It does not show that it will change in the direction of allowing infinite storage."  

It's not clear that this is quite right.  Modern scientific theories have persuasively argued against previous notions of time, causality, substance dualism, and many others.  Additionally, the distinction between aggregative and revisionist progress seems ill-defined.  Theories may have some aggregative components and other revisionist ones.  Finally, there might be interesting undiscovered laws of physics that allow us to do extra things that we currently can't.  

While it's unlikely that we'll be able to go faster than light or open up wormholes, it's certainly far from impossible.  And this is just one mechanism by which the survival of sentient beings could advance past the horizon imagined by Sandberg and Manheim.  The inability of cavemen to predict what would go on in modern society should leave us deeply skeptical of claims relating to the possibilities of civilizations hundreds of millions of years down the line.  

Sandberg and Manheim add "Second, past results in physics have increasingly found strict bounds on the range of physical phenomena rather than unbounding them. Classical mechanics allow for far more forms of dynamics than relativistic mechanics, and quantum mechanics strongly constrain what can be known and manipulated on small scales."  This is largely true, though not entirely.  The aforementioned examples explain how more modern physics allows us to figure out true things.  

Sandberg and Manheim finish, writing "While all of these arguments in defense of physics are strong evidence that it is correct, it is reasonable to assign a very small but non-zero value to the possibility that the laws of physics allow for infinities. In that case, any claimed infinities based on a claim of incorrect physics can only provide conditional infinities. And those conditional infinities may be irrelevant to our decisionmaking, for various reasons."  

I'd generally agree with the assessment.  I'd currently give about 6% credence to it being theoretically possible for a civilization to last forever.  However, the upside is literally infinite, so even low probabilities matter a great deal.  

One might be worried about the possibility of dealing with infinities.  This is a legitimate worry.  However, rather than thinking of it as infinity, for now we can just treat it as some unimaginably big number (say Graham's number).  This avoids paradoxes relating to it and is justified if we rightly think that an infinity of bliss is better than Graham's number years of bliss.  

One might additionally worry that the odds are sufficiently low that this potential scenario can be ignored.  This is, however, false, as can be shown with a very plausible principle called The Level Up Principle:

Let N be a number of years of good life 

M is a different number of years of good life where M<N 

P is a probability that's less than 100% 

The principle states the following: for any state of the world with M years of good life, there are some values of P and N for which a probability P of N years of good life is overall better than certainty of M.  

This is a very plausible principle. 

Suppose that M is 10 trillion.  For this principle to be true there would have to be some much greater amount of years of happy life for which a 99.999999999999999999% chance of it being realized is more choice worthy than certainty of 10 trillion years of happy life.  This is obviously true.  A 99.9999999999999999999999999999999999999999999999999999999999999999999999999999999% chance of 10^100^100^100 years of happy life is more choice worthy than certainty of 10 trillion years of happy life.  However, if we take this principle seriously then we find that chances of infinity or inconceivably large numbers of years of happy life dominate all else.  If we accept transitivity (as we should), then we would conclude that each state of the world has a slightly less probable state of the world that's more desirable because the number of years of good life is sufficiently greater.  This would mean that we can keep diminishing the probability of the event, but increasing the number of years of good life, until we get to a low probability of some vast number of years of good life (say Graham's number) being better than a higher probability of trillions of years of happy life.  This conclusion also follows straightforwardly if we shut up and multiply.  
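
The chain of trades described above can be sketched numerically.  This is only an illustration under strong assumptions: value is taken to be linear in years of good life (the "shut up and multiply" reading), and the per-step factors `shrink` and `growth` are hypothetical numbers I picked, not part of the principle itself.

```python
from fractions import Fraction

def level_up_chain(start_years, steps, shrink=Fraction(9, 10), growth=100):
    """Build a chain of gambles (probability, years of good life).

    Each step multiplies the probability by `shrink` and the number of
    years by `growth`, starting from certainty of `start_years`.
    """
    chain = [(Fraction(1), start_years)]
    for _ in range(steps):
        p, n = chain[-1]
        chain.append((p * shrink, n * growth))
    return chain

# Start from certainty of 10 trillion years and level up 50 times.
chain = level_up_chain(start_years=10**13, steps=50)
expected_years = [p * n for p, n in chain]

# Every gamble beats its predecessor in expected years (since shrink * growth > 1),
# so by transitivity the last gamble beats the certain 10 trillion years.
each_step_better = all(a < b for a, b in zip(expected_years, expected_years[1:]))

final_p, final_n = chain[-1]
print(each_step_better)   # True
print(float(final_p))     # the final probability, well under 1%
print(final_n == 10**113) # True
```

After 50 steps the probability has fallen below 1% while the payoff has grown by a factor of 10^100, which is the pattern the transitivity argument relies on.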

Other reasons to think that the far future could be very good.  

2) The possibility of truly excellent states of consciousness.  

We currently don't have a very well worked out theory of consciousness.  There are lots of different scientific and philosophical views about consciousness.  However, there are good reasons to be optimistic about the possibility of super desirable consciousness.  

  1. The immense malleability of consciousness.  Our experiences are so strange and varied that it seems like conscious experience can take a wide number of forms.  One would be a priori surprised to find that an experience as horrific as brutal torture, as good as certain pleasurable experiences, or as strange and captivating as the experiences people have when taking psychedelic drugs, is able to actually exist in the real world.  All of these are extremely strange contours of conscious experience, showing that consciousness is at least very malleable.  Additionally, all of these experiences were produced by the blind process of Darwinian evolution, meaning that the true possibilities of conscious experience opened up by AIs optimizing for good experiences are far beyond that which randomly emerged.
  2. The fact that these experiences have emerged despite our relatively limited computational capacities.  Consciousness probably has something to do with mental computation.  The human brain is a relatively inefficient computational device.  However, despite that, we can have very vivid experiences--including ones that are extremely horrific.  The experience of being fried to death in an iron bull, being beaten to death, and many others discussed here, show that even with our fairly limited computational abilities, we have the ability to experience intensely vivid experiences.  It seems like it should be possible to--with far more advanced computation--create positive experiences with hedonic value that far surpasses even the most horrific of current experiences.  We don't have good reason to believe that there's some computational asymmetry that makes it more difficult to produce immensely positive experiences than immensely negative experiences.  Darwinian evolution provides a perfectly adequate account of why the worst experiences are far more horrific than the best experiences are good, based on their impact on our survival.  Dying in a fire hampers passing on one's genes more than having sex one time enables passing them on.  This means that the current asymmetry between the best and worst experiences shouldn't lead us to conclude that there's some fundamental computational difference between the resources needed to produce very good experiences and the resources needed to produce very bad experiences.
  3. Based on the reasons given here, including people's descriptions of intense feelings of pleasure, it seems possible to create states of unfathomable bliss even with very limited human minds, resulting in a roughly logarithmic scale of pain.

Even if we did have reason to think there was a computational asymmetry, there's no reason to think that the computational asymmetry is immense.  No doubt the most intense pleasures for humans can be far better than the most horrific suffering is for insects.  

I'd thus have about 93% credence that, if digital consciousness is possible, it is possible to create pleasure that's more intense than the most horrific instances of suffering are bad.  Thus, a future utopia could be roughly as good as a dystopia would be bad.  This gives us good reason to think that the far future could have immense value if there is successful digital sentience.    

This all relies on the possibility of digital sentience.  I have about 92% confidence in the possibility of digital sentience, for the following reasons.

1 The reason described in this article, "Imagine that you develop a brain disease like Alzheimer’s, but that a cutting-edge treatment has been developed. Doctors replace the damaged neurons in your brain with computer chips that are functionally identical to healthy neurons. After your first treatment that replaces just a few thousand neurons, you feel no different. As your condition deteriorates, the treatments proceed and, eventually, the final biological neuron in your brain is replaced. Still, you feel, think, and act exactly as you did before. It seems that you are as sentient as you were before. Your friends and family would probably still care about you, even though your brain is now entirely artificial.[1]

This thought experiment suggests that artificial sentience (AS) is possible[2] and that artificial entities, at least those as sophisticated as humans, could warrant moral consideration. Many scholars seem to agree.[3]"  

2 Given that humans are conscious, unless one thinks that consciousness relates to arbitrary biological facts relating to the fleshy stuff in the brain, it should be possible at least in theory to make computers that are conscious.  It would be parochial to assume that the possibility of being sentient merely relates to the specific biological lineage that led to our emergence, rather than more fundamental computational features of consciousness. 

3 Consider the following argument, roughly given by Eliezer Yudkowsky in this debate. 

P1 Consciousness exerts a causally efficacious influence on information processing.  

P2 If consciousness exerts a causally efficacious influence on information processing, copying human information processing would generate digital consciousness. 

P3 It is possible to copy human information processing through digital neurons.  

Therefore, it is possible to generate digital consciousness.  All of the premises seem true.  

P1 is supported here.  

P2 is trivial.  

P3 just states that there are digital neurons, which there are.  

To the extent that we think that there are extreme tail ends to both experiences and numbers of people, this gives us good reason to expect the tail end scenarios for the long term to dominate other considerations.  

The inverse of these considerations obviously applies to a dystopia.  

4 Counterbalancing Considerations 

The previous section outlined my argument that the best scenarios could be very good and the worst scenarios could be very bad.  However, there are lots of counterbalancing considerations.  

  1. Given that agents seem to like being in states of intense euphoria, the considerations presented in the last section might give us reason to think that agents would be likely to experience extreme amounts of bliss, even in the absence of a utopian scenario.  Thus, the middle end of the spectrum of good futures might be similar to the best possible futures.
  2. If we expect values to be locked in, given moral uncertainty we might think that it's bad for an AI to follow the moral system that we think is correct.  After all, if hedonic states are what matters and an AI wireheads for preference satisfaction, that would be quite bad.  Thus, we might think that a pluralist scenario for the far future would be optimal, that attempts to optimize across a wide range of values.
  3. If we accept broad considerations about the future getting better over time, this would perhaps lead us to conclude that the future will be similarly good, regardless of whether or not it starts out maximizing for the things we think are important.  However, this consideration is undercut by many of the factors that improve the world being nullified by locked in AI.
  4. Even if we accept that the best scenario possible is much better than the next best scenario, we might think that there's no good way to lock in optimal values.  If we have skepticism about the long reflection--which I think we should--then it seems very difficult to lock in a nearly optimal future.

5 Implications of this 

The previous sections have a few important implications.  

  1. Scenarios in which AIs are optimizing for things that roughly track what we really care about creating or avoiding should dominate our considerations.  A world of paperclip maximizers would be bad--but most of the harm would come from preventing the best scenarios related to AI.  The difference between a scenario in which AI is never developed and one in which AI is developed and ends the world maximizing paperclips is fairly negligible.
  2. However, there are lots of scenarios in which, even in the absence of perfect AI optimizing for good things, lots of very good things would be produced nonetheless.  If one is a hedonistic utilitarian, they might think that an AI optimizing for some rough package of things that people mostly think are good like rights, happiness, and freedom would be a significant departure from the optimal scenario, yet would still be a very good scenario that's worth pursuing.
  3. Moral circle expansion is quite important given the potential for astronomical suffering from even slightly misaligned AI.
  4. Gaining a better understanding of consciousness will be relevant for figuring out how we should value the far future and different scenarios relating to it.  If, for example, we discovered that digital consciousness is not possible, or that digital consciousness of pleasure can't be anywhere near as vivid as digital consciousness of pain, this would give us good reason to be less optimistic about the future, and perhaps would make existential threat reduction have negative EV.  It will also be crucial given that consciousness is (plausibly) a prerequisite for value--so if we misunderstand what beings are conscious or what types of consciousness are desirable, this could lead to catastrophic failure.  This might have implications relating to the importance of funding groups like the Qualia Research Institute.
  5. This has a few important implications relating to evaluation of possible future states.  Future states that are optimizing for things including what fundamentally matters have very high EV--while future states that involve optimizing for things that currently correlate with what matters could be very bad.  Thus, value pluralism is immensely important to capture the things that we care about because including too little in the account of what matters could be far worse than including too much.  This means that we should take moral uncertainty very seriously in our calculations.

Here's a rough chart of how much I would value various states of the world.  These are just my intuitions and thus subject to significant revision.  

Pluralist world optimizing for lots of things people think are good: 8.3

World optimizing for something that is like the true value but slightly different (e.g. preferences when what really matters is hedonic states): 6.2

Median estimate of the future: 4.4

Coherent extrapolated volition: 9.2




