Why I prioritize moral circle expansion over artificial intelligence alignment

You say:

"I think a given amount of dolorium/dystopia (say, the amount that can be created with 100 joules of energy) is far larger in absolute moral expected value than hedonium/utopia made with the same resources"

Could you elaborate on why this is the case? I would tend to think the prior should be that they're equal. You then update on the fact that they seem to be asymmetrical, try to work out why that is, and ask whether those factors will apply in the future. They could be fundamentally asymmetrical, or evolutionary pressures may tend to create minds with these asymmetries. The arguments I've heard for why are:

  • The worst thing that can happen to an animal, in terms of genetic success, is much worse than the best thing.

This isn't entirely clear to me. I can imagine that a large genetic win, such as securing a large harem, could be comparable to the genetic loss of dying, and many animals will in fact risk death for this. This seems particularly true given that dying without offspring doesn't reduce your contribution to the gene pool to zero; it just means your contribution comes only via your relatives.
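
To make the relatives point concrete, here is a toy calculation. It counts the expected copies of an animal's genome carried into the next generation, weighting its own offspring by relatedness 0.5 and nieces/nephews by 0.25 (the standard coefficients for a diploid animal and its full siblings). The offspring counts, the name `genome_copies`, and the choice to compare outcomes as ratios to a baseline are all my own hypothetical assumptions, not data.

```python
# Toy model: expected copies of an animal's genome carried into the next
# generation. All offspring counts are hypothetical illustrations, not data.

R_OFFSPRING = 0.5  # relatedness to own offspring (diploid species)
R_NIECE = 0.25     # relatedness to a full sibling's offspring


def genome_copies(own_offspring: int, nieces_and_nephews: int) -> float:
    """Relatedness-weighted count of own offspring plus nieces/nephews."""
    return R_OFFSPRING * own_offspring + R_NIECE * nieces_and_nephews


# Hypothetical: siblings' reproduction is assumed unaffected by the
# focal animal's fate.
NIECES = 8

baseline = genome_copies(4, NIECES)    # ordinary breeding success
died_young = genome_copies(0, NIECES)  # dies before reproducing
harem = genome_copies(12, NIECES)      # large reproductive win

# Compare outcomes as ratios to the baseline (an assumed yardstick).
print(f"dying, kin counted: x{died_young / baseline}")
print(f"harem, kin counted: x{harem / baseline}")
print(f"dying, kin ignored: x{genome_copies(0, 0) / genome_copies(4, 0)}")
```

With kin counted, dying halves the total and the harem doubles it, which is symmetric if you compare outcomes on a log scale; if you ignore kin, dying looks like a total loss.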

  • There is selection against strong positive experiences in a way that there isn't against strong negative experiences.

The argument here is, I think, that strong positive experiences will likely result in the animal staying in the blissful state and neglecting to feed, sleep, etc., whereas strong negative experiences will just result in the animal avoiding a particular state, which is less maladaptive. This argument seems stronger to me, but still not entirely satisfying: it seems to be quite sensitive to how you define states.

Why I prioritize moral circle expansion over artificial intelligence alignment

Thanks very much for writing this, and thanks to Greg for funding it! I think this is a really important discussion. Some slightly rambling thoughts below.

We can think about 3 ways of improving the EV of the far future:

1: Changing incentive structures experienced by powerful agents in the future (e.g. avoiding arms races, power struggles, selection pressures)

2a: Changing the moral compass of powerful agents in the future in specific directions (e.g. MCE)

2b: Indirect ways to improve the moral compass of powerful agents in the future (e.g. philosophy research, education, intelligence/empathy enhancement)

All of these can be influenced both by strategies such as activism, improving institutions, and improving education, and by AIA. I am inclined to think of AIA as a particularly high-leverage point at which to influence them.

However, these are issues we encounter widely, not only in AI. Consider 2b: we have to decide how to educate the next generation of humans, and they may well end up with ethical beliefs that differ from ours. So we must judge how much to try to influence or constrain them, and how much to accept that the changes are genuine progress. This is similar to the problem of defining CEV (coherent extrapolated volition): we have some vague idea of the direction in which better values lie (more empathy, more wisdom, more knowledge), but we can't say exactly what the values should be. For this intervention, working on AIA may be more important than activism because it has more leverage: it is likely to be more tractable, and to have greater influence on the future, than the more diffuse ways in which we can push on education and intergenerational moral progress.

This framework also suggests that MCE is just one example of a collection of similar interventions. MCE involves pushing for a fairly specific change in belief and behaviour, based on a principle that's fairly uncontroversial. You could imagine other interventions of this kind, for instance helping people reduce unwanted aggressive or sadistic behaviour. We could call this something like 'uncontroversial moral progress': helping individuals and civilisation live more by their values. (On a side note, I sometimes think of this as the minimal core of EA: trying to live according to your best guess of what’s right.)

The choice between working on 2a and 2b depends, among other things, on your level of moral uncertainty.

I am inclined to think that AIA is the best way to work on 1 and 2b, as it is a particularly high-leverage intervention point to shape the power structures and moral beliefs that exist in the future. It gives us more of a clean slate to design a good system, rather than having to work within a faulty system.

I would really like to see more work on MCE and other examples of 'uncontroversial moral progress'. Historical case studies of value change seem like a good starting point, as does actually testing how tractable it is to change people's behaviour.

I also really appreciated your perspective on different transformative AI scenarios, as I’m worried that I’m thinking about them in an overly narrow way.

The person-affecting value of existential risk reduction

See also the models of the cost-effectiveness of mitigating biorisk and of asteroid risk, which estimate the risk level, the cost of reducing it, and the cost per QALY at different discount rates.

"If we ignore distant future generations by discounting, the benefits of reducing existential risk fall by between 3 and 5 orders of magnitude (with a 1% to 5% discount rate), which is still far more cost-effective than measures to reduce small-scale casualty events. Under our survey model (Model 1), the cost per life-year varies between $1,300 and $52,000 for a 5% discount rate and between $770 and $30,000 for a 1% discount rate. These costs are even competitive with first-world healthcare spending, where typically anything less than $100,000 per quality adjusted life-year is considered a reasonable purchase.[29]

This suggests that even if we are concerned about welfare only in the near term, reducing existential risks from biotechnology is still a cost-effective means of saving expected life if the future chance of an existential risk is anything above 0.0001 per year."
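
The size of the drop depends on the horizon you assume, but the mechanism is simple: with exponential discounting at rate r, a constant benefit received every year is worth at most (1 + r)/r years of undiscounted benefit, however long the future lasts. A minimal sketch, assuming a constant benefit of one unit per year and a hypothetical one-million-year horizon; this is my simplification, not the authors' model, and it isn't meant to reproduce their figures.

```python
# Sketch: exponential discounting caps the present value of a long future.
# Assumes a constant benefit of 1 unit per year (a simplification, not the
# authors' model) and a hypothetical horizon.

def discounted_total(rate: float, years: int) -> float:
    """Sum of 1 / (1 + rate)**t for t = 0 .. years - 1 (geometric series)."""
    d = 1.0 / (1.0 + rate)
    return (1.0 - d ** years) / (1.0 - d)


HORIZON = 1_000_000  # hypothetical horizon in years

for rate in (0.01, 0.05):
    pv = discounted_total(rate, HORIZON)
    print(f"{rate:.0%} discount: {pv:.0f} discounted years out of {HORIZON:,}; "
          f"reduction factor ~{HORIZON / pv:.0e}")
```

At 1% the whole future is worth about 101 years of benefit, and at 5% about 21, so with this particular horizon discounting shrinks the value by roughly four to five orders of magnitude; a shorter assumed horizon gives a smaller reduction.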

I think their model ought to include a category for catastrophic risk: it has nothing between a disaster (100,000 deaths) and extinction.
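
A sketch of what adding that category might look like. The tiers, probabilities and death tolls are hypothetical placeholders, not estimates from the paper, and the `Tier` class is my own illustration; the point is only that a tier between 100,000 deaths and extinction contributes expected deaths that a two-tier model leaves out.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    label: str
    annual_probability: float  # hypothetical placeholder
    deaths: float              # hypothetical placeholder


def expected_annual_deaths(tiers: list[Tier]) -> float:
    """Probability-weighted deaths per year, summed over severity tiers."""
    return sum(t.annual_probability * t.deaths for t in tiers)


POPULATION = 8e9  # rough current world population

two_tier = [
    Tier("disaster", 1e-3, 1e5),
    Tier("extinction", 1e-5, POPULATION),
]
# Same model plus a hypothetical intermediate "catastrophe" tier.
three_tier = two_tier + [Tier("catastrophe", 1e-4, 1e8)]

for name, model in (("two tiers", two_tier), ("three tiers", three_tier)):
    print(f"{name}: {expected_annual_deaths(model):,.0f} expected deaths/year")
```

A fuller version would also move some probability mass out of the existing tiers rather than simply adding a new one.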

"Even if we expected humanity to become extinct within a generation, traditional statistical life valuations would warrant a $32 billion annual investment in asteroid defense (Gerrard & Barber, 1997). Yet the United States spends only $4 million per year on asteroid detection and there is no direct spending on mitigation."
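
The arithmetic behind a figure like this is, presumably, an expected-cost calculation: annual probability of the event, times deaths, times the value of a statistical life. A minimal sketch; the probability, death toll and value-per-life inputs are hypothetical placeholders (I don't have Gerrard & Barber's inputs), and `warranted_annual_spend` is an illustrative name. The 32 billion and 4 million figures are the ones quoted above.

```python
def warranted_annual_spend(annual_probability: float,
                           deaths: float,
                           value_per_life: float) -> float:
    """Expected annual loss, valued using a value of statistical life."""
    return annual_probability * deaths * value_per_life


# Hypothetical inputs, for illustration only.
example = warranted_annual_spend(annual_probability=1e-6,
                                 deaths=1e9,
                                 value_per_life=5e6)
print(f"warranted spend with these inputs: ${example:,.0f} per year")

# Figures quoted in the passage above.
WARRANTED = 32e9  # $ per year warranted for asteroid defense
ACTUAL = 4e6      # $ per year actual US spending on detection
print(f"warranted / actual: {WARRANTED / ACTUAL:,.0f}x")
```

Whatever the exact inputs, the quoted figures imply a gap of about 8,000 times between the warranted and actual annual spend.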