This is a special post for quick takes by Jack Malde. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

Could it be more important to improve human values than to make sure AI is aligned?

Consider the following (which is almost definitely oversimplified):

 

|  | ALIGNED AI | MISALIGNED AI |
| --- | --- | --- |
| HUMANITY GOOD VALUES | UTOPIA | EXTINCTION |
| HUMANITY NEUTRAL VALUES | NEUTRAL WORLD | EXTINCTION |
| HUMANITY BAD VALUES | DYSTOPIA | EXTINCTION |

For clarity, let's assume dystopia is worse than extinction. Dystopia could be a scenario where factory farming expands to an incredibly large scale with the aid of AI, or where a bad AI-powered regime takes over the world. Let's also assume a neutral world is equivalent in value to extinction.

The above shows that aligning AI can be good, bad, or neutral: the value of alignment depends entirely on humanity's values. Improving humanity's values, however, is never bad, and is good whenever AI ends up aligned.

The only clear case where aligning AI beats improving humanity's values is if there isn't scope to improve our values further. An ambiguous case is when humanity has positive values: then both improving values and aligning AI are good options, and it isn't immediately clear to me which wins.

The key takeaway here is that improving values is robustly good, whereas aligning AI isn't: alignment is bad if we have negative values. I would guess that we currently have pretty bad values given how we treat non-human animals, so alignment is arguably undesirable. In this simple model, improving values would become the overwhelmingly important mission. Or perhaps ensuring that powerful AI doesn't end up in the hands of bad actors becomes overwhelmingly important (again, rather than alignment).
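To make this concrete, here is a minimal sketch of the simple model above, using illustrative utility numbers that are entirely made up (utopia = +1, dystopia = -1, and both a neutral world and extinction = 0, as assumed above):

```python
# A toy version of the 3x2 table above, with made-up utilities:
# utopia = +1, neutral world = 0, dystopia = -1, extinction = 0.
outcomes = {
    ("good",    "aligned"):    1,   # utopia
    ("good",    "misaligned"): 0,   # extinction
    ("neutral", "aligned"):    0,   # neutral world
    ("neutral", "misaligned"): 0,   # extinction
    ("bad",     "aligned"):   -1,   # dystopia
    ("bad",     "misaligned"): 0,   # extinction
}

# Value of aligning AI, holding humanity's values fixed:
for values in ("good", "neutral", "bad"):
    gain = outcomes[(values, "aligned")] - outcomes[(values, "misaligned")]
    print(f"values={values}: aligning AI is worth {gain:+d}")
# -> +1, 0, -1: alignment can be good, neutral, or bad.

# Value of improving humanity's values by one step, holding alignment fixed:
ladder = ["bad", "neutral", "good"]
for ai in ("aligned", "misaligned"):
    for lower, higher in zip(ladder, ladder[1:]):
        gain = outcomes[(higher, ai)] - outcomes[(lower, ai)]
        print(f"ai={ai}: {lower} -> {higher} values is worth {gain:+d}")
# -> never negative: improving values never hurts, and helps if AI is aligned.
```

The particular numbers don't matter; the point is just that the "improve values" moves are never negative in this model, whereas the "align AI" move can be.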

This analysis doesn't consider the moral value of AI itself. It also assumes that misaligned AI necessarily leads to extinction, which may not be accurate (perhaps it can also lead to dystopian outcomes?).

I doubt this is a novel argument, but what do y’all think?

If you don't think misalignment automatically equals extinction, then the argument doesn't work. The neutral world is now competing with "neutral world where the software fucks up and kills people sometimes", which seems to be worse. 

That is fair. I still think the idea that aligned superintelligent AI in the wrong hands can be very bad may be under-appreciated. The implication is that something like moral circle expansion seems very important at the moment to help mitigate these risks, as does work to ensure that countries with better values win the race to powerful AI.

I think a neutral world is much better than extinction, and most dystopias are also preferable to human extinction. The latter is debatable but the former seems clear? What do you imagine by a neutral world?

Well, I'm assigning extinction a value of zero, and a neutral world is any world that has some individuals but also has a value of zero. For example, it could be a world where half of the people live bad (negative) lives and the other half live equivalently good (positive) lives, so the sum total of wellbeing adds up to zero.

A dystopia is a world that is significantly negative overall, for example one in which trillions of factory-farmed animals live very bad lives. A world with no individuals is a world without all this suffering.

I have significant concerns with some of the Global Health and Wellbeing (GH&W) work carried out in the EA community, for example by GiveWell or the Happier Lives Institute (HLI). And these aren't just the commonly cited ones, e.g. that this work isn't obviously good for the future in expectation or that it ignores animals.

One concern I have is that GH&W work tends to only evaluate existing interventions. The problem with this is that you completely ignore what might be possible with more research. For example, HLI recommends StrongMinds, a mental health charity that treats women with depression in Africa through free group talk therapy. I'm sure StrongMinds is an excellent charity, but the opportunity cost of giving to StrongMinds is not giving to, say, world-class researchers looking into psychedelics to treat depression. I'm not saying the money should go to the research - I'm not sure - which is exactly the problem. I'm not sure because organisations like HLI are not investigating the answer. An approach that relies on self-reported wellbeing can only recommend existing interventions, and this may be inhibiting substantial progress.

Further research is only one of multiple things that GH&W work cannot or does not evaluate. Neither GiveWell nor HLI really has the tools to evaluate, say, climate change interventions, which likely deliver most of their value to future generations. It seems to me that GiveWell could evaluate such interventions, perhaps with difficulty, but doesn't. Meanwhile I'm not sure how HLI could do so - it's pretty much impossible to evaluate the effect of climate change prevention on self-reported wellbeing. The general criticism here is that GH&W work often ignores (near) future people, which seems unreasonable. And don't get me started on the far future ones…

Any thoughts?

As I continue to consider the implications of a longtermist philosophy, it is with a heavy heart that the animal-focused side of me feels less enthused about welfare improvements.

This post by Tobias Baumann provides some insight into what might be important when thinking about animal advocacy from a longtermist point of view and is leading me to judge animal advocacy work against three longtermist criteria:

  1. Persistence: A focus on lasting change as opposed to just short-term suffering reduction
  2. Stability: Low risk of bringing controversy to the effective animal advocacy movement or causing divisions within it
  3. Moral Circle Expansion: The potential to expand humanity's moral circle to include anything sentient

I don’t feel that welfare improvements score particularly well against any of these criteria. 

Persistence - Corporate welfare improvements are promises that are made public, and reversing them should come with reputational damage; however, there are reasons to believe that some companies will fail to follow through with their commitments (see here). It isn't clear how well welfare improvements might persist beyond the short run into the medium/long run.

Stability - A fairly healthy contingent in the animal advocacy and even EAA movements feels uncomfortable about a focus on welfare improvements, as this can be seen to implicitly support animal agriculture and so may be counterproductive to the goal of abolishing all animal exploitation.

Moral Circle Expansion - Unclear. It is possible that welfare improvements may make people feel less concerned about consuming animal products, resulting in a persistent lack of concern for their moral status. It is also possible that there is an opposite effect (see here).

I look forward to further work on longtermism and animal advocacy. I suspect such work may redirect efforts away from welfare improvements and towards areas such as legal/legislative work, wild animal suffering, capacity building, cultured meat, and maybe even general advocacy. Whilst I feel slightly uncomfortable about a potential shift away from welfare improvements, I suspect it may be justified.

I think of welfare reforms as being excellent complements to work on cultured meat. By raising prices, and drawing attention to the issue of animal welfare, they may increase demand for cultured meat when it becomes available. 

This is plausible. Unfortunately the opposite possibility - that people become less concerned about eating animals if their welfare is better - is also quite plausible. I would be interested in seeing some evidence on this matter.

Might be outdated, and the selection of papers is probably skewed in favor of welfare reforms, but here's a bibliography on this question.

Thanks for that

Might (applied) economists make a resurgence in longtermist EA?

Over the past few years I've had the impression that, within the longtermist EA movement, there isn't really a place for economists unless they are doing highly academic global priorities research at places like the Global Priorities Institute.

Is this set to change given recent arguments about the importance of economic growth / technological progress for reducing total existential risk? There was Leopold Aschenbrenner's impressive paper arguing that we should accelerate growth to raise the amount we spend on safety and speed through the time of perils.

Phil Trammell has also written favourably about the argument, but it still seemed fairly fringe in the movement...until now?

It appears that Will MacAskill is taking a similar argument about the dangers of technological stagnation very seriously now, as he said at EA Global in his fireside chat (around the 7-minute mark).

Generally I get the impression that not much has been said about achieving existential security. It seems to me that boosting economic growth may be emerging as one of the most promising ways to do so. Could this mean that working as an economist on growth / innovation / technological progress, even outside of academia, becomes a very credible path, e.g. in think tanks or government? Are economists about to make a resurgence?

Any thoughts welcome!

I think in general we should consider the possibility that we could just fund most of the useful x-risk work ourselves (by expected impact), since we have so much money ($50 billion and growing faster than the market) that we're having a hard time spending. Accelerating growth broadly seems to accelerate risks without actually counterfactually giving us much more safety work. If anything, decelerating growth seems better, so we can have more time to work on safety.

If it matters who gets a technology first, then targeted acceleration or deceleration might make sense.

(I'm saying this from a classical utilitarian perspective, which is not my view. I don't think these conclusions follow for asymmetric views.)

My main objection to the idea that we can fund all useful x-risk ourselves is that what we really want to achieve is existential security, which may require global coordination. Global coordination isn't exactly something you can easily fund.

Truth be told though I'm actually not entirely clear on the best pathways to existential security, and it's something I'd like to see more discussion on.

Economic growth seems likely to accelerate AI development, without really increasing AI safety work. This might apply to other risks, although I think AI safety work is basically all from our community, so it applies especially here.

When introducing the ‘repugnant conclusion’, Parfit considers that a life barely worth living might be of the “painless but drab” variety, consisting only of “muzak and potatoes”. The mundanity of such a life then gives the repugnant conclusion its repugnance. This is probably the first and only time I will ever say this, but I’m amazed at Parfit’s sloppiness here. A life of just muzak and potatoes isn’t even close to being worth living.

Parfit's general idea that a life that is barely worth living might be one with no pains and only very minor pleasures seems reasonable enough, but he should have realised that boredom and loneliness are severe pains in themselves. Can anyone honestly tell me that they would choose right now a life of listening to mundane music and eating potatoes with no other pleasures at all, over just being put in a coma? Bear in mind that we currently torture people by putting them in solitary confinement (whilst ensuring they remain fed). I would think the only people who could actually survive muzak and potatoes without going crazy would be Buddhist monks who have trained themselves to be free of craving.

Maybe we can remove the boredom and loneliness objections by imagining a life that lasts just a minute and consists only of muzak and potatoes. However, that is a bizarre life that differs from any sort of life we can realistically imagine living, so it's hard to properly judge its quality. If we are going to opine on the repugnance of the repugnant conclusion, we need to think up a realistic concept of a life barely worth living. I'm sure such a life is far better than one with just muzak and potatoes.

I agree with this: a lot of the argument (and related things in population ethics) depends on the zero-level of well-being. I would be very interested to see more work on figuring out what/where this zero-level is.

I take this to be a shortcut Parfit took to conjure an image of a drab existence, rather than what he actually conceived of as the minimum viable positive life. 

If you pressed him on this point, I'd guess he would argue that there are actual humans who have lives that are barely worth living. And even if those humans don't subsist only on "muzak and potatoes," the idea of bland food + a lot of boredom + repetitive days probably hits on some real features of the kind of life Parfit and many others would classify as "just barely worthwhile."

Caveat: I haven't read Parfit in a while, and I could easily be forgetting the context of this remark. Maybe he uses the example in such a way that it's clear he meant it literally?

I am referring to this paper where Parfit says:

Best of all would be Z. This is an enormous population all of whom have lives that are not much above the level where they would cease to be worth living. A life could be like this either because its ecstasies make its agonies seem just worth enduring, or because it is painless but drab. Let us imagine lives in Z to be of this second kind. There is nothing bad in each of these lives; but there is little happiness, and little else that is good. The people in Z never suffer; but all they have is muzak and potatoes. Though there is little happiness in each life in Z, because there are so many of these lives Z is the outcome in which there would be the greatest total sum of happiness.

He refers to muzak and potatoes a few more times in the paper in the same vein.

I realise I have not been charitable enough to Parfit, as he does make the assumption that the life of muzak and potatoes would not be characterised by intense boredom and loneliness when he says they "never suffer". In that case he is simply presenting a life with no pains and only very minor pleasures, and saying that that is one example of a life that may be barely worth living.

The problem is that it was counterproductive to make that assumption in the first place because, in reality, very few people could actually live a life of muzak and potatoes without severe pain. This presents an issue when we actually have to imagine vast numbers of people living with just muzak and potatoes, and then make a judgement on how good/bad this is.

To put it another way, people may imagine muzak and potatoes to be boring as hell and think "OK, the repugnant conclusion is repugnant then". But the point is they shouldn't be imagining it as boring as hell, because in this case it is supposed to be a completely painless existence. Therefore I think we need to give people a more realistic conception of a life that is barely worth living to wrap their heads around.

On a charitable reading of Parfit, the 'muzak and potatoes' expression is meant to pick out the kind of phenomenal experience associated with the "drab existence" he wants to communicate to the reader. So he is not asking you to imagine a life where you do nothing but listen to muzak and eat potatoes. Instead, he is asking you to consider how it typically feels like to listen to muzak and eat potatoes, and to then imagine a life that feels like that, all the time.

he is asking you to consider how it typically feels like to listen to muzak and eat potatoes

I always found this very confusing. Potatoes are one of my favourite foods!

I was thinking the same! I had to google Muzak, but that also seems like pretty nice music to me.

Very good point!

Ah well, fair enough, that makes a lot of sense. I think he could have worded it a bit better, although judging by your upvotes I probably just missed the point!

Having said that, I still think it's quite natural to consider a life where it feels like you're listening to muzak and eating potatoes all the time to be very boring, which of course would be a mistake given that such a life is supposed to be entirely painless.

Indeed I don't think it helps that Parfit calls it a "drab existence". "Drab" is a negative word, but Parfit's "drab existence" is actually supposed to be completely lacking in anything negative.

Therefore I think we need to give people a more realistic conception of a life that is barely worth living to wrap their heads around.

My personal mental image of the Repugnant Conclusion always involved people living more realistic/full lives, with reasonable amounts of boredom being compensated for with just enough good feelings to make the whole thing worthwhile. When I read "muzak and potatoes", my mind conjured a society of people living together as they consumed those things, rather than people in isolation chambers. But I could be unusual, and I think someone could write up a better example than Parfit's if they tried.

I think life without pleasure can still be one of wellbeing - if you practise life like Buddhist monks, that is. I think the zero point is reached by having to do work that stops you from being mindful.
