On Values Spreading

Great post!

And if they don’t then I’m probably wrong about it being important.

I'm not sure what you mean by "wrong" here. :) Maybe you place a lot of value on the values that would be reached by a collective of smart, rational people thinking about the issues for a long time, and your current values are just best guesses of what this idealized group of people would arrive at? (assuming, unrealistically, that there would be a single unique output of that idealization process across a broad range of parameter settings)

For people who hold very general values of caring what other smart, rational people would care about, values-spreading seems far less promising. In contrast, if -- in light of the utter arbitrariness of values -- you care more about whatever random values you happen to feel now based on your genetic and environmental background, values-spreading seems more appealing.

People are negative utilitarians because the worst possible suffering outweighs the best possible happiness in humans (and probably in all sentient animals), but this is likely untrue over the space of all possible minds. If we could modify humans to experience happiness equal to their capacity for suffering, they should choose, for example, 2 seconds of extreme happiness plus 1 second of extreme suffering rather than none of either.

And if we could modify humans to recognize just how amazing paperclips are, they should choose, for example, 2 paperclips plus 1 second of extreme suffering rather than none of either.

However, it seems considerably more likely that we will go extinct than that we will get locked in to values that are bad but not bad in a way that kills all humans.

I'm curious to know your probabilities of these outcomes. If the chance of extinction (including by uncontrolled AI) in the next century is 20%, and if human-level AI arrives in the next century, then the chance of human-controlled AI would be 80%. Within that 80%, I personally would put most of the probability mass on AIs that favor particular values or give most of the decision-making power to particular groups of people. (Indeed, this has been the trend throughout human history and up to the present. Even in democracies, wealthy people have far more power than ordinary citizens.)
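To make the arithmetic in that estimate explicit, here is a minimal sketch. It assumes, as the comment does, that human-level AI arrives this century and that extinction and human-controlled AI are the only two outcomes. The 20% figure is from the comment; `narrow_share` is a purely hypothetical placeholder for "most of the probability mass", not a number anyone in this thread proposed.

```python
def human_controlled_probability(p_extinction: float) -> float:
    """Chance of human-controlled AI, assuming human-level AI arrives and
    the only alternative outcome is extinction (including uncontrolled AI)."""
    if not 0.0 <= p_extinction <= 1.0:
        raise ValueError("p_extinction must be between 0 and 1")
    return 1.0 - p_extinction


# Figure taken from the comment: 20% chance of extinction this century.
p_extinction = 0.20
p_controlled = human_controlled_probability(p_extinction)

# Hypothetical assumption: the share of human-controlled outcomes in which
# the AI favors particular values or particular groups of people.
narrow_share = 0.6
p_narrow = p_controlled * narrow_share

print(f"P(human-controlled AI) = {p_controlled:.2f}")
print(f"P(narrowly controlled AI) = {p_narrow:.2f}")
```

Changing `narrow_share` shows how strongly the conclusion depends on what "most" is taken to mean.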

Addressing each of your comments in turn:

I'm fairly confident that hedonistic utilitarianism is true (for some sense of "true"). Much of my confidence comes from the observation that people's objections to utilitarianism play into well-known cognitive biases, and if these biases were removed, I'd expect more people to agree with me. If they didn't agree with me even if they didn't have these biases, that would be grounds for questioning my confidence in utilitarianism.
I think there's a difference between modifying people to be able to experience more happiness and modifying them to believe paperclips are great. The former modifies an experience and lets people's preferences arise naturally; the latter modifies preferences directly, so we can't trust that their preferences reflect what's actually good for them. Of course, preferences that arise naturally don't always reflect what's good for people either, but they do tend in that direction.

Within that 80%, I personally would put most of the probability mass on AIs that favor particular values or give most of the decision-making power to particular groups of people.

I hadn't considered this as a particularly likely possibility. If you'll allow me to go up one meta level, this sort of argument is why I prefer to be more epistemically modest about far-future concerns, and why I wish more people would be more modest. This argument you've just made had not occurred to me during the many hours of thinking and discussion I've already conducted, and it seems plausible that a nontrivial portion of the probability mass of the far future falls on "a small group of people get control of everything and optimize the world for their own benefit." The existence of this argument, and the fact that I hadn't considered it before, makes me uncertain about my own ability to reason about the expected value of the far future.

Brian_Tomasik

Thanks!

One man's bias is another's intrinsic value, at least for "normative" biases like scope insensitivity, status-quo bias, and failure to aggregate. But at least I understand your meaning better. :) Most of LessWrong is not hedonistic utilitarian (most people there are more preference utilitarian or complexity-of-value consequentialist), so one might wonder why other people who think a lot about overcoming those normative biases aren't hedonistic utilitarians.
Of course, one could give people the experience of having grown up in a culture that valued paperclips, of meeting the Great Paperclip in the Sky and hearing him tell them that paperclips are the meaning of life, and so on. These might "naturally" incline people to intrinsically value paperclips. But I agree there seem to be some differences between this case and the pleasure case.
I'm glad that comment was useful. :) I think it's unfortunate that it's so often assumed that "human-controlled AI" means something like CEV, when in fact CEV seems to me a remote possibility. I don't know that you should downshift your ability to reason about the far future that much. :) Over time you'll hear more and more perspectives, which can help challenge previous assumptions.

one might wonder why other people who think a lot about overcoming those normative biases aren't hedonistic utilitarians.

Simple: just because LessWrongers know that these biases exist doesn't mean they're immune to them.

I don't know that you should downshift your ability to reason about the far future that much.

It was already pretty low; this is just an example of why I think it should be low.

Squark

The question is what is the mechanism of value spreading.

If the mechanism is having rational discussions then it is not necessarily urgent to have these discussions right now. Once we create a future in which there is no death and no economic pressures to self-modify in ways that are value destructive, we'll have plenty of time for rational discussions. Things like "experience machine" also fit into this framework, as long as the experiences are in some sense non-destructive (this rules out experiences that create addiction, for example).

If the mechanism is anything but rational discussion, then:

It's not clear in what sense the values you're spreading are "correct" if it's impossible to convince other people through rational discussion.
I would definitely consider this sort of intervention as evil and would fight rather than cooperate with it (at least assuming the effect cannot be reversed by rational discussion; I also consider hedonistic utilitarianism abhorrent except as an approximate model in very restricted contexts).

Regarding MIRI in particular, I don't think the result of their work depends on the personal opinions of its director in the way you suggest. I think that any reasonable solution to the FAI problem will be on the meta-level (defining what it means for values to be "correct") rather than the object level (hard-coding specific values like animal suffering).

I mostly agree with you. I am less confident than you are that a solution to the FAI problem will be on the meta-level. I think you're probably right, but I have enough uncertainty about it that I much prefer someone who's doing AI safety research to share my values so I can be more confident that they will do research that's useful for all sentient beings and not just humans.

Do you think that trying to supplant others' plans with your own is uncooperative? Coercing them for some greater good? We oughtn't define 'cooperative' as 'good', lest it lose all meaning.

Paul could argue that cooperating with someone means helping them achieve their values. Cooperative approaches would be to help people to live out their values, and if you don't agree with their values, then you can trade your plans with theirs to get some Pareto-optimal outcome. That's probably a simple definition of cooperation in some economic fields... A more interesting edge case is trying to help them weigh together their meta-ethical views to arrive at ethical principles, which feels cooperative to me intuitively.

The distinction here is a bit fuzzy. Some sorts of values spreading are clearly uncooperative, but other times it's unclear. Like what about trying to convince selfish people to be more cooperative? That's uncooperative in that it works against their goals, but if you're a "cooperation consequentialist" then you're still doing good because you're increasing the total amount of cooperation in the world.

If you're a war criminal, and I slap you, it's still violence, irrespective of whether I call myself a "pacifism consequentialist"!

Yeah that's true. So it depends on whether you're talking about increasing the total amount of cooperation in the world, or increasing your personal level of cooperation with other agents. It seems to me that the former matters more than the latter.

One point that I like to make is that for some philosophies, it's more important to just help people to think clearly in general, rather than to promote one morality, because it's hard to justify moralising if you don't have strong objective reasons to think your metamoral reasoning is superior. If objectively bad thinking procedures led people to have a 'wrong' moral view, then correcting these could be easier than promoting a more dubious moral conclusion, while also helping selfish people.

jasonk

I'm curious under what circumstances we can judge thinking to be better or worse but can't make such judgments of "metamoral reasoning".

I'm saying that on some views, you might want to make people do better things on their values, so long as those values are supported by good metamoral thinking. One way to do that is to promote good clear thinking, or philosophical thinking in general, rather than just promoting your personal moral system. And for some reason, perhaps signalling-related, it's much more common to see people profess and evangelise their personal moral beliefs than metaethical or general philosophical ones.

Brian_Tomasik