Jack Malde

3945 karma · Joined Sep 2018


Feel free to message me on here.


That is fair. I still think the idea that aligned superintelligent AI in the wrong hands can be very bad may be under-appreciated. The implication is that something like moral circle expansion seems very important right now to help mitigate these risks, as does work to ensure that countries with better values win the race to powerful AI.

Well, I'm assigning extinction a value of zero, and a neutral world is any world that has some individuals but also has a value of zero. For example, it could be a world where half of the people live bad (negative) lives and the other half live equivalently good (positive) lives, so that the sum total of wellbeing adds up to zero.
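To make the definition concrete, here is a minimal numerical sketch. The population size and welfare levels are entirely made up for illustration:

```python
# Hypothetical welfare levels: half the population at -5, half at +5.
# A "neutral world" is one where total wellbeing sums to zero, which on
# this accounting has the same value as extinction.
welfare = [-5] * 500 + [5] * 500

total = sum(welfare)
print(total)  # 0 -> neutral world, equivalent in value to extinction
```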

A dystopia is a world that is significantly negative overall, for example a world in which there are trillions of factory-farmed animals living very bad lives. A world with no individuals is a world without all this suffering.

Could it be more important to improve human values than to make sure AI is aligned?

Consider the following (which is almost definitely oversimplified):

| Humanity's values | Aligned AI | Misaligned AI |
| --- | --- | --- |
| Positive | Good world | Extinction |
| Neutral | Neutral world | Extinction |
| Negative | Dystopia | Extinction |
For clarity, let’s assume dystopia is worse than extinction. This could be a scenario where factory farming expands to an incredibly large scale with the aid of AI, or where a bad AI-powered regime takes over the world. Let's assume a neutral world is equivalent to extinction.

The above shows that aligning AI can be good, bad, or neutral. The value of alignment depends entirely on humanity’s values. Improving humanity’s values, however, is always good.
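The model can be sketched in a few lines of code. All payoff numbers here are hypothetical, chosen only to illustrate the structure of the argument (extinction normalised to zero, misaligned AI assumed to cause extinction):

```python
# Toy model of the argument above. All payoffs are hypothetical.
# Extinction is normalised to 0; a dystopia is negative (worse than extinction).

def outcome_value(values: float, aligned: bool) -> float:
    """values > 0 means good values, < 0 bad values, 0 neutral.
    Misaligned AI is assumed (per the simplification) to cause extinction.
    Aligned AI realises whatever values humanity happens to have."""
    if not aligned:
        return 0.0  # extinction
    return values

# Value of alignment = aligned outcome minus misaligned (extinction) outcome.
for v in (10.0, 0.0, -10.0):
    print(v, outcome_value(v, True) - outcome_value(v, False))
# Alignment is good given good values, neutral given neutral values,
# and bad (worse than extinction) given bad values.
# Improving values (raising v) always weakly increases the aligned outcome.
```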

The only clear case where aligning AI beats improving humanity’s values is if there is no scope to improve our values further. An ambiguous case arises whenever humanity has positive values: both improving values and aligning AI are then good options, and it isn’t immediately clear to me which wins.

The key takeaway here is that improving values is robustly good, whereas aligning AI isn’t: alignment is bad if we have negative values. I would guess that we currently have pretty bad values, given how we treat non-human animals, and alignment is therefore arguably undesirable. In this simple model, improving values would become the overwhelmingly important mission. Or perhaps ensuring that powerful AI doesn't end up in the hands of bad actors becomes overwhelmingly important (again, rather than alignment).

This analysis doesn’t consider the moral value of AI itself. It also assumes that misaligned AI necessarily leads to extinction, which may not be accurate (perhaps it could also lead to dystopian outcomes?).

I doubt this is a novel argument, but what do y’all think?

I was surprised by your "dearest" and "mirror" tests.

> Call the first the “dearest test.” When you have some big call to make, sit down with a person very dear to you—a parent, partner, child, or friend—and look them in the eyes. Say that you’re making a decision that will affect the lives of many people, to the point that some strangers might be hurt. Say that you believe that the lives of these strangers are just as valuable as anyone else’s. Then tell your dearest, “I believe in my decisions, enough that I’d still make them even if one of the people who could be hurt was you.”
>
> Or you can do the “mirror test.” Look into the mirror and describe what you’re doing that will affect the lives of other people. See whether you can tell yourself, with conviction, that you’re willing to be one of the people who is hurt or dies because of what you’re now deciding. Be accountable, at least, to yourself.

These tests seem to rule out many actions that seem desirable to undertake. For example:

  1. Creating and distributing a COVID vaccine: there is some small risk of serious side effects and it is likely that a small number of people will die even though many, many more will be saved. So this may not pass the "dearest" and "mirror" tests. Should we not create and distribute vaccines?
  2. Launching a military operation to stop genocide: A leader may need to order military action to halt an ongoing genocide, knowing that some innocent civilians and their own soldiers will likely die even though many more will be saved. This may not pass the "dearest" and "mirror" tests. Should we just allow the genocide?
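A back-of-the-envelope calculation makes the trade-off in example 1 concrete. All numbers below are invented purely for illustration; they are not real vaccine statistics:

```python
# Hypothetical expected-value sketch for example (1). Made-up rates:
# a 1-in-10,000,000 fatal side-effect risk versus a 1-in-10,000 chance
# that vaccination averts a COVID death. Integer arithmetic keeps it exact.
population = 100_000_000                    # people vaccinated (hypothetical)

deaths_caused = population // 10_000_000    # fatal side effects
deaths_averted = population // 10_000       # COVID deaths averted
net_lives_saved = deaths_averted - deaths_caused

print(deaths_caused, deaths_averted, net_lives_saved)  # 10 10000 9990
```

On these assumed numbers the intervention saves thousands of lives on net, yet it still predictably kills some identifiable-in-hindsight people, which is exactly what the tests seem to forbid.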

Do you bite the bullet here? Or accept that these tests may be flawed?

> - We can think of this stance as analogous to:
>   - The utilitarian parent: “I care primarily about doing what’s best for humanity at large, but I wouldn’t want to neglect my children to such a strong degree that all defensible notions of how to be a decent parent state that I fucked up.”

I wonder if we don't mind people privileging their own children because:

  1. People love their kids too damn much and it just doesn't seem realistic for people to neglect their children to help others.
  2. A world in which it is normalised to neglect your children to "focus on humanity" is probably a bad world by utilitarian lights. A world full of child neglect just doesn't seem like it would produce productive individuals who can make the world great. So even on an impartial view we wouldn't want to promote child neglect.

Neither of these points is relevant in the case of privileging existing-and-sure-to-exist people/beings over possible people/beings:

  1. We don't have some intense biologically-driven urge to help present people. For example, most people don't seem to care all that much that a lot of present people are dying from malaria. So focusing on helping possible people/beings seems at least feasible.
  2. We can't use the argument that it is better from an impartial view to focus on existing-and-sure-to-exist people/beings because of the classic 'future could be super-long' argument.

And when you say that a person with totalist/strong longtermist life goals also chooses between two separate values (what their totalist axiology says versus existing people), I'm not entirely sure that's true. Again, massive neglect of existing people just doesn't seem like it would work out well for the long term - existing people are the ones that can make the future great! So even pure strong longtermists will want some decent investment into present people.

Looking forward to reading this. A quick note: in section 3, “Tomi’s argument that creating happy people is good”, your introductory text doesn't match what is in the table.


Not sure I entirely agree with the second paragraph. The white paper outlines how philanthropy in this area is quite neglected, and there are organisations like LaMP which could certainly use more funding. Page 5 of the white paper also outlines bottlenecks in the process: even if firms have strong incentives to acquire talent, informational gaps can prevent them from finding the best individuals, and similar informational gaps prevent individuals from actively utilising the best pathways.

Having said that I'm not claiming this is the best use of EA dollars - just posting for people's information.

Yeah, I have a feeling that the best way to argue for this on EA grounds might, surprisingly, be on the basis of richer-world economic growth, which is kind of antithetical to EA's origins but has been argued to be of overwhelming importance, e.g.:

Thank you, good to flag these points. 

Regarding the AI safety point, I want to think through this more, but I note that OpenAI's alignment approach is very capabilities-driven, requiring talent and compute to align AI using AI. I think one's belief about the sign of immigration's effect on x-risk here might depend on how much you think top labs like OpenAI actually take the safety risks seriously. If they do, more immigration can help them make safe AI.

Regarding the meat-eater problem, I think the possibility of an animal Kuznets curve is relevant. If such a relationship exists (and there is some evidence it might), speeding up economic growth through immigration to higher-income countries might reduce animal suffering in the long run.

Or perhaps we're just in a state of cluelessness here...
