Kaj_Sotala's Comments

Finding it hard to retain my belief in altruism

You seem to be working under the assumption that we have either emotional or logical motivations for doing something. I think that this is mistaken: logic is a tool for achieving our motivations, and all of our motivations ultimately ground in emotional reasons. In fact, it has been my experience that focusing too much on trying to find "logical" motivations for our actions may lead to paralysis, since absent an emotional motive, logic doesn't provide any persuasive reason to do one thing over another.

You said that people act altruistically because "ultimately they're doing it to not feel bad, to feel good, or to help a loved one". I interpret this to mean that these are all reasons which you think are coming from the heart. But can you think of any reason for doing anything which does *not* ultimately ground in something like these reasons?

I don't know you, so I don't want to suggest that I think that I know how your mind works... but reading what you've written, I can't help getting the feeling that the thought of doing something which is motivated in emotion rather than logic makes you feel bad, and that the reason why you don't want to do things which are motivated by emotion is that you have an emotional aversion to it. In my experience, it's very common for people to have an emotional aversion to what they think emotional reasoning is, causing them to convince themselves that they are making their decisions based on logic rather than emotion. If someone has a strong (emotional) conviction that logic is good and emotion is bad, then they will be strongly motivated to try to ground all of their actions in logical reasoning. All the while being unmotivated to notice the reason why they are so invested in logical reasoning. I used to do something like this, which is how I became convinced of the inadequacy of logical reasoning for resolving conflicts such as these. I tried and failed for a rather long time before switching tactics.

The upside of this is that you don't really need to find a logical reason for acting altruistically. Yes, many people who are driven by emotion end up acting selfishly rather than altruistically. But since everyone is ultimately driven by emotions, then as long as you believe that there are people who act altruistically, then that implies that it's possible to act altruistically while being motivated emotionally.

What I would suggest, would be to embrace everything being driven by emotion, and then trying to find a solution which satisfies all of your emotional needs. You say that studying to get a PhD in machine learning would make you feel bad, and also that not doing it is also bad. I don't think that either of these feelings is going to just go away: if you just chose to do a machine learning PhD, or just chose to not do it, then the conflict would keep bothering you regardless, and you'd feel unhappy either way you chose. I'd recommend figuring out the reasons why you would hate the machine learning path, and also the conditions under which you feel bad about not doing enough altruistic work, and then figuring out a solution which would satisfy all of your emotional needs. (CFAR's workshops teach exactly this kind of thing .)

I should also remark that I was recently in a somewhat similar situation as you: I felt that the right thing to do would be to work on AI stuff, but also that I didn't want to. Eventually I came to the conclusion that the reason why I didn't want it was that a part of my mind was convinced that the kind of AI work that I could do, wouldn't actually be as impactful as other things that I could be doing - and this judgment has mostly held up under logical analysis. This is not to say that doing the ML PhD would genuinely be a bad idea for you as well, but I do think that it would be worth examining the reasons for why exactly you wouldn't want to do studies. Maybe your emotions are actually trying to tell you something important? (In my experience, they usually are, though of course it's also possible for them to be mistaken.)

One particular question that I would ask is: you say you would enjoy working in AI, but you wouldn't enjoy learning the stuff that you need to do in order to work in AI. This might make sense in a field where you are required to study something that's entirely unrelated to what's useful for your job. But particularly once you get around doing doing your graduate studies, much of that stuff will be directly relevant for your work. If you think that you would hate to be in an environment where you get to spend most of your time learning about AI, why do you think that you would enjoy a research job, which also requires you to spend a lot of time learning about AI?

The case for taking AI seriously as a threat to humanity
My perspective here is that many forms of fairness are inconsistent, and fall apart on significant moral introspection as you try to make your moral preferences consistent. I think the skin-color thing is one of them, which is really hard to maintain as something that you shouldn't pay attention to, as you realize that it can't be causally disentangled from other factors that you feel like you definitely should pay attention to (such as the person's physical strength, or their height, or the speed at which they can run).

I think that a sensible interpretation of "is the justice system (or society in general) fair" is "does the justice system (or society) reward behaviors that are good overall, and punish behaviors that are bad overall"; in other words, can you count on society to cooperate with you rather than defect on you if you cooperate with it. If you get jailed based (in part) on your skin color, then if you have the wrong skin color (which you can't affect), there's an increased probability of society defecting on you regardless of whether you cooperate or defect. This means that you have an extra incentive to defect since you might get defected on anyway. This feels like a sensible thing to try to avoid.

The harm of preventing extinction

On the other hand, there are also arguments for why one should work to prevent extinction even if one did have the kind of suffering-focused view that you're arguing for; see e.g. this article. To briefly summarize some of its points:

If humanity doesn't go extinct, then it will eventually colonize space; if we don't colonize space, it may eventually be colonized by an alien species with even more cruelty than us.

Whether alternative civilizations would be more or less compassionate or cooperative than humans, we can only guess. We may however assume that our reflected preferences depend on some aspects of being human, such as human culture or the biological structure of the human brain[48]. Thus, our reflected preferences likely overlap more with a (post-)human civilization than alternative civilizations. As future agents will have powerful tools to shape the world according to their preferences, we should prefer (post-)human space colonization over space colonization by an alternative civilization.

A specific extinction risk is the creation of unaligned AI, which might first destroy humanity and then go on to colonize space; if it lacked empathy, it might create a civilization where none of the agents cared about the suffering of others, causing vastly more suffering to exist.

Space colonization by an AI might include (among other things of value/disvalue to us) the creation of many digital minds for instrumental purposes. If the AI is only driven by values orthogonal to ours, it would likely not care about the welfare of those digital minds. Whether we should expect space colonization by a human-made, misaligned AI to be morally worse than space colonization by future agents with (post-)human values has been discussed extensively elsewhere. Briefly, nearly all moral views would most likely rather have human value-inspired space colonization than space colonization by AI with arbitrary values, giving extra reason to work on AI alignment especially for future pessimists.

Trying to prevent extinction also helps avoid global catastrophic risks (GCRs); GCRs could set social progress back, causing much more violence and other kinds of suffering than we have today.

Global catastrophe here refers to a scenario of hundreds of millions of human deaths and resulting societal collapse. Many potential causes of human extinction, like a large scale epidemic, nuclear war, or runaway climate change, are far more likely to lead to a global catastrophe than to complete extinction. Thus, many efforts to reduce the risk of human extinction also reduce global catastrophic risk. In the following, we argue that this effect adds substantially to the EV of efforts to reduce extinction risk, even from the very-long term perspective of this article. This doesn’t hold for efforts to reduce risks that, like risks from misaligned AGI, are more likely to lead to complete extinction than to a global catastrophe. [...]
Can we expect the “new” value system emerging after a global catastrophe to be robustly worse than our current value system? While this issue is debated[60], Nick Beckstead gives a strand of arguments suggesting the “new” values would in expectation be worse. Compared to the rest of human history, we currently seem to be on a unusually promising trajectory of social progress. What exactly would happen if this period was interrupted by a global catastrophe is a difficult question, and any answer will involve many judgements calls about the contingency and convergence of human values. However, as we hardly understand the driving factors behind the current period of social progress, we cannot be confident it would recommence if interrupted by a global catastrophe. Thus, if one sees the current trajectory as broadly positive, one should expect this value to be partially lost if a global catastrophe occurs.

Efforts to reduce extinction risk often promote coordination, peace and stability, which can be useful for reducing the kinds of atrocities that you're talking about.

Taken together, efforts to reduce extinction risk also promote a more coordinated, peaceful and stable global society. Future agents in such a society will probably make wiser and more careful decisions, reducing the risk of unexpected negative trajectory changes in general. Safe development of AI will specifically depend on these factors. Therefore, efforts to reduce extinction risk may also steer the world away from some of the worst non-extinction outcomes, which likely involve war, violence and arms races.
The harm of preventing extinction
Do you have a short summary of why he thinks that someone answering the question of "would you have preferred to die right after child birth?" with "No?" is not strong evidence that they should have been born?

I don't know what Benatar's response to this is, but - consider this comment by Eliezer in a discussion of the Repugnant Conclusion:

“Barely worth living” can mean that, if you’re already alive and don’t want to die, your life is almost but not quite horrible enough that you would rather commit suicide than endure. But if you’re told that somebody like this exists, it is sad news that you want to hear as little as possible. You may not want to kill them, but you also wouldn’t have that child if you were told that was what your child’s life would be like.

As a more extreme version, suppose that we could create arbitrary minds, and chose to create one which, for its entire existence, experienced immense suffering which it wanted to stop. Say that it experienced the equivalent of being burned with a hot iron, for every second of its existence, and never got used to it. Yet, when asked whether it wanted to die, or would have preferred to die right after it was born, we'd design it in such a way that it would consider death even worse and respond "no". Yet it seems obvious to me that it outputting this response is not a compelling reason to create such a mind.

If people already exist, then there are lots of strong reasons about respecting people's autonomy etc. for why we should respect their desire to continue existing. But if we're making the decision about what kinds of minds should come to existence, those reasons don't seem to be particularly compelling. Especially not since we can construct situations in which we could create a mind that preferred to exist, but where it nonetheless seems immoral to create it.

You can of course reasonably argue that whether a mind should exist, depends on whether they would want to exist and some additional criteria about e.g. how happy they would be. Then if we really could create arbitrary minds, then we might as well (and should) create ones that were happy and preferred to exist, as opposed to ones which were unhappy and preferred to exist. But in that case we've already abandoned the simplicity of just basing our judgment on asking whether they're happy with having survived to their current age.

I surely prefer to exist and would be pretty sad about a world in which I wasn't born (in that I would be willing to endure significant additional suffering in order to cause a world in which I was born).

This doesn't seem coherent to me; once you exist, you can certainly prefer to continue existing, but I don't think it makes sense to say "if I didn't exist, I would prefer to exist". If we've assumed that you don't exist, then how can you have preferences about existing?

If I ask myself the question, "do I prefer a world where I hadn't been born versus a world where I had been born", and imagine that my existence would actually hinge on my answer, then that means that I will in effect die if I answer "I prefer not having been born". So then the question that I'm actually answering is "would I prefer to instantly commit a painless suicide which also reverses the effects of me having come into existence". So that's smuggling in a fair amount of "do I prefer to continue existing, given that I already exist". And that seems to me unavoidable - the only way we can get a mind to tell us whether or not it prefers to exist, is by instantiating it, and then it will answer from a point of view where it actually exists.

I feel like this makes the answer to the question "if a person doesn't exist, would they prefer to exist" either "undefined" or "no" ("no" as in "they lack an active desire to exist", though of course they also lack an active desire to not-exist). Which is probably for the better, given that there exist all kinds of possible minds that would probably be immoral to instantiate, even though once instantiated they'd prefer to exist.

2018 AI Alignment Literature Review and Charity Comparison
In the past [EAF/FRI] have been rather negative utilitarian, which I have always viewed as an absurd and potentially dangerous doctrine. If you are interested in the subject I recommend Toby Ord’s piece on the subject. However, they have produced research on why it is good to cooperate with other value systems, making me somewhat less worried.

(I work for FRI.) EA/FRI is generally "suffering-focused", which is an umbrella term covering a range of views; NU would be the most extreme form of that, and some of us do lean that way, but many disagree with it and hold some view which would be considered much more plausible by most people (see the link for discussion). Personally I used to lean more NU in the past, but have since then shifted considerably in the direction of other (though still suffering-focused) views.

Besides the research about the value of cooperation that you noted, this article discusses reasons why the expected value of x-risk reduction could be positive even from a suffering-focused view; the paper of mine referenced in your post also discusses why suffering-focused views should care about AI alignment and cooperate with others in order to ensure that we get aligned AI.

And in general it's just straightforwardly better and (IMO) more moral to try to create a collaborative environment where people who care about the world can work together in support of their shared points of agreement, rather than trying to undercut each other. We are also aware of the unilateralist's curse, and do our best to discourage any other suffering-focused people from doing anything stupid.

Is Effective Altruism fundamentally flawed?

The following is roughly how I think about it:

If I am in a situation where I need help, then for purely selfish reasons, I would prefer people-who-are-capable-of-helping-me to act in such a way that has the highest probability of helping me. Because I obviously want my probability of getting help, to be as high as possible.

Let's suppose that, as in your original example, I am one of three people who need help, and someone is thinking about whether to act in a way that helps one person, or to act in a way that helps two people. Well, if they act in a way that helps one person, then I have a 1/3 chance of being that person; and if they act in a way that helps two people, then I have a 2/3 chance of being one of those two people. So I would rather prefer them to act in a way that helps as many people as possible.

I would guess that most people, if they need help and are willing to accept help, would also want potential helpers to act in such a way that maximizes their probability of getting help.

Thus, to me, reason and empathy would say that the best way to respect the desires of people who want help, is to maximize the amount of people you are helping.

The Technological Landscape Affecting Artificial General Intelligence and the Importance of Nanoscale Neural Probes

Hi Daniel,

you argue in section 3.3 of your paper that nanoprobes are likely to be the only viable route to WBE, because of the difficulty in capturing all of the relevant information in a brain if an approach such as destructive scanning is used.

You don't however seem to discuss the alternative path of neuroprosthesis-driven uploading:

we propose to connect to the human brain an exocortex, a prosthetic extension of the biological brain which would integrate with the mind as seamlessly as parts of the biological brain integrate with each other. [...] we make three assumptions which will be further fleshed out in the following sections:

There seems to be a relatively unified cortical algorithm which is capable of processing different types of information. Most, if not all, of the information processing in the brain of any given individual is carried out using variations of this basic algorithm. Therefore we do not need to study hundreds of different types of cortical algorithms before we can create the first version of an exocortex.
We already have a fairly good understanding on how the cerebral cortex processes information and gives rise to the attentional processes underlying consciousness. We have a good reason to believe that an exocortex would be compatible with the existing cortex and would integrate with the mind.
The cortical algorithm has an inbuilt ability to transfer information between cortical areas. Connecting the brain with an exocortex would therefore allow the exocortex to gradually take over or at least become an interface for other exocortices.

In addition to allowing for mind coalescence, the exocortex could also provide a route for uploading human minds. It has been suggested that an upload can be created by copying the brain layer-by-layer [Moravec, 1988] or by cutting a brain into small slices and scanning them [Sandberg & Bostrom, 2008]. However, given our current technological status and understanding of the brain, we suggest that the exocortex might be a likely intermediate step. As an exocortex-equipped brain aged, degenerated and eventually died, an exocortex could take over its functions, until finally the original person existed purely in the exocortex and could be copied or moved to a different substrate.

This seems to avoid the objection of it being too hard to scan the brain in all detail. If we can replicate the high-level functioning of the cortical algorithm, then we can do so in a way which doesn't need to be biologically realistic, but which will still allow us to implement the brain's essential functions in a neural prosthesis (here's some prior work that also replicates some aspect of brain's functioning and re-implements it in a neuroprosthesis, without needing to capture all of the biological details). And if the cortical algorithm can be replicated in a way that allows the person's brain to gradually transfer over functions and memories as the biological brain accumulates damage, the same way that function in the biological brain gets reorganized and can remain intact even as it slowly accumulates massive damage61127-1), then that should allow the entirety of the person's cortical function to transfer over to the neuroprosthesis. (of course, there are still the non-cortical parts of the brain that need to be uploaded as well)

A large challenge here is in getting the required amount of neural connections between the exocortex and the biological brain; but we are already getting relatively close, taking into account that the corpus callosum that connects the two hemispheres "only" has on the order of 100 million connections:

Earlier this year, the US Defense Advanced Research Projects Agency (DARPA) launched a project called Neural Engineering System Design. It aims to win approval from the US Food and Drug Administration within 4 years for a wireless human brain device that can monitor brain activity using 1 million electrodes simultaneously and selectively stimulate up to 100,000 neurons. (source)

2017 AI Safety Literature Review and Charity Comparison

Also, one forthcoming paper of mine released as a preprint; and another paper that was originally published informally last year but published in somewhat revised and peer-reviewed form this year:

Both were done as part of my research for the Foundational Research Institute; maybe include us in your organizational comparison next year? :)

Anti-tribalism and positive mental health as high-value cause areas

There seem to be a lot of leads that could help us figure out the high-value interventions, though: i) knowledge about what causes it and what has contributed to changes of it over time ii) research directions that could help further improve our understanding of what causes it / what doesn't cause it iii) various interventions which already seem like they work in a small-scale setting, though it's still unclear how they might be scaled up (e.g. something like Crucial Conversations is basically about increasing trust and safety in one-to-one and small-group conversations) iv) and of course psychology in general is full of interesting ideas for improving mental health and well-being that haven't been rigorously tested, which also suggests that v) any meta-work that would improve psychology's research practices would also be even more valuable than we previously thought.

As for the "pointing out a problem people have been aware of for millenia", well, people have been aware of global poverty for millenia too. Then we got science and randomized controlled trials and all the stuff that EAs like, and got better at fixing the problem. Time to start looking at how we could apply our improved understanding of this old problem, to fixing it.

Load More