Caspar Oesterheld

Nice post! I generally agree and I believe this is important.

I have one question about this. I'll distinguish between two different empirical claims. My sense is that you argue for one of them and I'd be curious whether you'd also agree with the other. Intuitively, it seems like there are lots of different but related alignment problems: "how can we make AI that does what Alice wants it to do?", "how can we make AI that does what the US wants it to do?", "how can we make AI follow some set of moral norms?", "how can we make AI build stuff in factories for us, without it wanting to escape and take over the world?", "how can we make AI that helps us morally reflect (without manipulating us in ways we don't want)?", "how can we make a consequentialist AI that doesn't do any of the crazy things that consequentialism implies in theory?". You (and I and everyone else in this corner of the Internet) would like the future to solve the more EA-relevant alignment questions and implement the solutions, e.g., help society morally reflect, reduce suffering, etc. Now here are two claims about how the future might fail to do this:

1. Even if all alignment-style problems were solved, humans would not implement the solutions to the EA-y alignment questions. E.g., if there were a big alignment library that just contains the answers to all these alignment problems, then individuals would grab "from pauper to quadrillionaire and beyond with ChatGPT-n", not "how to do the most good you can with ChatGPT-n", and so on. (And additionally one has to hold that people's preferences for the not-so-ethical books/AIs will not just go away in the distant future. And I suppose for any of this to be relevant, you'd also need to believe that you have some sort of long-term influence on which books people get from the library.)
2. Modern-day research under the "alignment" (or "safety") umbrella is mostly aimed at solving the not-so-EA-y alignment questions, and does not put much effort toward the more specifically-EA-relevant questions. In terms of the alignment library analogy, there'll be lots of books in the aisle on how to get your AI to build widgets without taking over the world, and not so many books in the aisle on how to use AI to do moral reflection and the like. (And again one has to hold that this has some kind of long-term effect, despite the fact that all of these problems can probably be solved _eventually_. E.g., you might think that for the future to go in a good direction, we need AI to help with moral reflection immediately once we get to human-level AI, because of some kinds of lock-in.)

My sense is that you argue mostly for 1. Do you also worry about 2? (I worry about both, but I mostly think about 2, because 1 seems much less tractable, especially for me as a technical person.)

Very nice post! As a late-stage CS PhD student, I agree with pretty much everything. I wish more people would read this before deciding whether to get a PhD or not.

One extremely minor thing:
>From what I have heard, these are some problems you might have:
>[...] your supervisor wants to be co-author even if they did nothing to help.
In computer science (at least in AI at top institutions in the US), it is the norm for PhD supervisors to be co-authors on most or all papers that their students write, even if they contribute very little. One can debate whether this is reasonable. (I think there are various reasons why it is more reasonable than it may appear at first sight. For example, it's good for the supervisor's incentives to be aligned with the students publishing papers. Supervisors should get credit for causing their students to do well, regardless of whether they do so by contributing object-level results or not. Since the main way to get credit in academia is to be a co-author on papers, the simplest way to do this is for the supervisor to be a co-author on everything.) In any case, because this is the norm, these co-authorship listings are, I believe, inconsequential for the student. People will typically expect that if the authors listed are a PhD student and their PhD advisor, the PhD student will have done the vast majority of the work. This is definitely different in other disciplines. For example, in economics papers that require a lot of grunt work, the PhD student author often does the grunt work and the PhD advisor does the more high-level thinking.

I agree with the argument. If you buy into the idea of evidential cooperation in large worlds (formerly multiverse-wide superrationality), then this argument might go through even if you don't think alien values are very aligned with humans. Roughly, ECL is the idea that you should be nice to other value systems because that will (acausally via evidential/timeless/functional decision theory) make it more likely that agents with different values will also be nice to our values. Applied to the present argument: If we focus more on existential risks that take resources from other (potentially unaligned) value systems, then this makes it more likely that elsewhere in the universe other agents will focus on existential risks that take away resources from civilizations that happen to be aligned with us.

I use Pocket Casts and I couldn't find it there. Apparently one can submit to their database here. Also: Is there an RSS feed for the podcast?

Probably you're already aware of this, but the APA's Goldwater rule seems relevant. It states:

> On occasion psychiatrists are asked for an opinion about an individual who is in the light of public attention or who has disclosed information about himself/herself through public media. In such circumstances, a psychiatrist may share with the public his or her expertise about psychiatric issues in general. However, it is unethical for a psychiatrist to offer a professional opinion unless he or she has conducted an examination and has been granted proper authorization for such a statement.

From the perspective of this article, this rule is problematic when applied to politicians who may have harmful traits. (This is similar to how the right to confidentiality has the Duty to Warn exception.) A quick Google Scholar search turns up a couple of articles since 2016 that basically make this point. For example, see Lilienfeld et al. (2018): The Goldwater Rule: Perspectives From, and Implications for, Psychological Science.

Of course, the other important (more empirical than ethical) question regarding the Goldwater rule is whether "conducting an examination" is a necessary prerequisite for gaining insight into a person's alleged pathology. Lilienfeld et al. also address this issue at length.

I would guess there are many other related movements. For instance, I recently found this article about Comte. Much of it also sounds somewhat EA-ish:

> [T]he socialist philosopher Henri de Saint-Simon attempted to analyze the causes of social change, and how social order can be achieved. He suggested that there is a pattern to social progress, and that society goes through a number of different stages. But it was his protégé Auguste Comte who developed this idea into a comprehensive approach to the study of society on scientific principles, which he initially called “social physics” but later described as “sociology.”
>
> Comte was a child of the Enlightenment, and his thinking was rooted in the ideals of the Age of Reason, with its rational, objective focus. [...] He had seen the power of science to transform: scientific discoveries had provided the technological advances that brought about the Industrial Revolution and created the modern world he lived in. The time had come, he said, for a social science that would not only give us an understanding of the mechanisms of social order and social change, but also provide us with the means of transforming society, in the same way that the physical sciences had helped to modify our physical environment.

The article also says that Comte was supported financially by the famous utilitarian John Stuart Mill, and that Comte changed his mind later in life and started a religious movement.

I guess the Scientific Charity Movement is special in that it (like EA) doesn't focus on systemic change.

I agree that altruistic sentiments are a confounder in the prisoner's dilemma. Yudkowsky (who would cooperate against a copy) makes a similar point in The True Prisoner's Dilemma, and there are lots of psychology studies showing that humans cooperate with each other in the PD in cases where I think they (that is, each individually) shouldn't. (Cf. section 6.4 of the MSR paper.)
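To make the "shouldn't" concrete, here is a minimal sketch of the dominance argument and of how playing against an exact copy changes the calculus. The payoff numbers (T=5, R=3, P=1, S=0) are the standard textbook values, assumed purely for illustration:

```python
# Illustrative sketch only; payoff values are assumed textbook numbers.
T, R, P, S = 5, 3, 1, 0  # temptation, mutual reward, mutual punishment, sucker

def payoff(me, other):
    """Row player's payoff; actions are 'C' (cooperate) or 'D' (defect)."""
    return {('C', 'C'): R, ('C', 'D'): S, ('D', 'C'): T, ('D', 'D'): P}[(me, other)]

# Against a causally independent opponent, defection dominates for a purely
# selfish player, whatever the opponent does:
assert payoff('D', 'C') > payoff('C', 'C')
assert payoff('D', 'D') > payoff('C', 'D')

# Against an exact copy, both players necessarily act alike, so only (C, C)
# and (D, D) are reachable -- and mutual cooperation pays more:
assert payoff('C', 'C') > payoff('D', 'D')
```

This is just the dominance argument plus the copy constraint written out; it doesn't by itself settle which decision theory is right.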

But I don't think that altruistic sentiments are the primary reason for why some philosophers and other sophisticated people tend to favor cooperation in the prisoner's dilemma against a copy. As you may know, Newcomb's problem is decision-theoretically similar to the PD against a copy. In contrast to the PD, however, it doesn't seem to evoke any altruistic sentiments. And yet, many people prefer EDT's recommendations in Newcomb's problem. Thus, the "altruism error theory" of cooperation in the PD is not particularly convincing.
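For readers less familiar with the analogy, here is a toy expected-value calculation (predictor accuracy and dollar amounts are assumed for illustration) showing why EDT recommends one-boxing in Newcomb's problem while CDT recommends two-boxing, mirroring cooperate/defect in the PD against a copy:

```python
# Illustrative sketch only; accuracy and payoffs are assumed numbers.
ACCURACY = 0.99          # probability the predictor guessed your choice correctly
SMALL, BIG = 1_000, 1_000_000

# EDT treats your own choice as evidence about what the predictor did:
edt_one_box = ACCURACY * BIG + (1 - ACCURACY) * 0
edt_two_box = ACCURACY * SMALL + (1 - ACCURACY) * (BIG + SMALL)
assert edt_one_box > edt_two_box  # EDT recommends one-boxing

# CDT holds the (already-made) prediction causally fixed: for any prior
# probability q that the opaque box is filled, two-boxing adds SMALL.
def cdt_gain_from_two_boxing(q):
    return (q * (BIG + SMALL) + (1 - q) * SMALL) - (q * BIG + (1 - q) * 0)

assert abs(cdt_gain_from_two_boxing(0.5) - SMALL) < 1e-6  # CDT recommends two-boxing
```

The structural parallel to the PD against a copy: EDT's conditioning plays the role of the "my copy does what I do" constraint, while CDT's fixed prediction plays the role of treating the opponent's move as independent.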

I don't see much evidence in favor of the "wishful thinking" hypothesis. It, too, seems to fail in the non-multiverse problems like Newcomb's paradox. Also, it's easy to come up with lots of incorrect theories about how any particular view results from biased epistemics, so I have quite low credence in any such hypothesis that isn't backed up by any evidence.

> before I’m willing to throw out causality

Of course, causal eliminativism (or skepticism) is one motivation for one-boxing in Newcomb's problem, but subscribing to eliminativism is not necessary to one-box.

For example, in Evidence, Decision and Causality, Arif Ahmed argues that causality is irrelevant for decision making. (The book starts with: "Causality is a pointless superstition. These days it would take more than one book to persuade anyone of that. This book focuses on the ‘pointless’ bit, not the ‘superstition’ bit. I take for granted that there are causal relations and ask what doing so is good for. More narrowly still, I ask whether causal belief plays a special role in decision.") Alternatively, one could even accept the use of causal relationships for informing one's decisions and still endorse one-boxing. See, e.g., Yudkowsky, 2010; Fisher, n.d.; Spohn, 2012; or this talk by Ilya Shpitser.

A few of the points made in this piece are similar to the points I make here:

For example, the linked piece also argues that returns may diminish in a variety of different ways. In particular, it also argues that the returns diminish more slowly if the problem is big and that clustered value problems only produce benefits once the whole problem is solved.