richard_ngo

Former AI safety research engineer, now PhD student in philosophy of ML at Cambridge. I'm originally from New Zealand but have lived in the UK for 6 years, where I did my undergrad and masters degrees (in Computer Science, Philosophy, and Machine Learning). Blog: thinkingcomplete.blogspot.com

Sequences

EA Archives Reading List

Comments

richard_ngo's Shortform

Hmm, I agree that we're talking past each other. I don't intend to focus on ex post evaluations over ex ante evaluations. What I intend to focus on is the question: "when an EA makes the claim that GiveWell charities are the charities with the strongest case for impact in near-term human-centric terms, how justified are they?" Or, relatedly, "How likely is it that somebody who is motivated to find the best near-term human-centric charities possible, but takes a very different approach than EA does (in particular by focusing much more on hard-to-measure political effects), will do better than EA?"

In my previous comment, I used a lot of phrases which you took to indicate the high uncertainty of political interventions. My main point was that it's plausible that a bunch of them exist which will wildly outperform GiveWell charities. I agree I don't know which one, and you don't know which one, and GiveWell doesn't know which one. But for the purposes of my questions above, that's not the relevant factor; the relevant factor is: does someone know, and have they made those arguments publicly, in a way that we could learn from if we were more open to less quantitative analysis? (Alternatively, could someone know if they tried? But let's go with the former for now.)

In other words, consider two possible worlds. In one world GiveWell charities are in fact the most cost-effective, and all the people doing political advocacy are less cost-effective than GiveWell ex ante (given publicly available information). In the other world there's a bunch of people doing political advocacy work which EA hasn't supported even though they have strong, well-justified arguments that their work is very impactful (more impactful than GiveWell's top charities), because that impact is hard to quantitatively estimate. What evidence do we have that we're not in the second world? In both worlds GiveWell would be saying roughly the same thing (because they have a high bar for rigour). Would OpenPhil be saying different things in different worlds? Insofar as their arguments in favour of GiveWell are based on back-of-the-envelope calculations like the ones I just saw, then they'd be saying the same thing in both worlds, because those calculations seem insufficient to capture most of the value of the most cost-effective political advocacy. Insofar as their belief that it's hard to beat GiveWell is based on other evidence which might distinguish between these two worlds, they don't explain this in their blog post - which means I don't think the post is strong evidence in favour of GiveWell top charities for people who don't already trust OpenPhil a lot.

richard_ngo's Shortform

it leaves me very baseline skeptical that most 'systemic change' charities people suggest would also outperform, given the amount of time Open Phil has put into this question relative to the average donor. 

I have now read OpenPhil's sample of the back-of-the-envelope calculations on which their conclusion that it's hard to beat GiveWell was based. They were much rougher than I expected. Most of them are literally just an estimate of the direct benefits and costs, with no accounting for second-order benefits or harms, movement-building effects, political effects, etc. For example, the harm of a year of jail time is calculated as 0.5 QALYs plus the financial cost to the government - nothing about long-term effects of spending time in jail, or effects on subsequent crime rates, or community effects. I'm not saying that OpenPhil should have included these effects; they are clear that these are only intended as very rough estimates. But it means that I now don't think it's justified to treat this blog post as strong evidence in favour of GiveWell.
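To make concrete what a direct-effects-only estimate looks like (a minimal sketch with made-up placeholder numbers, not OpenPhil's actual model), the jail-time example above has roughly this structure:

```python
# Sketch of a direct-effects-only back-of-the-envelope calculation, in the
# style of the jail-time example above. All numbers are illustrative
# placeholders, not OpenPhil's actual figures.

QALY_VALUE_USD = 50_000          # assumed dollar value of one QALY
JAIL_QALY_LOSS_PER_YEAR = 0.5    # direct wellbeing loss per inmate-year
GOVT_COST_PER_YEAR_USD = 30_000  # assumed cost to the government per inmate-year

def direct_harm_of_jail_year() -> float:
    """Harm of one inmate-year, counting only direct effects (in USD)."""
    return JAIL_QALY_LOSS_PER_YEAR * QALY_VALUE_USD + GOVT_COST_PER_YEAR_USD

def botec_benefit_per_dollar(cost_usd: float, jail_years_averted: float) -> float:
    """Benefit per dollar for an intervention that averts jail time.

    Note what is NOT here: effects on recidivism, long-term outcomes for the
    person jailed, community effects, movement-building, or political
    spillovers - exactly the terms that seem likely to dominate for the most
    cost-effective political advocacy.
    """
    return jail_years_averted * direct_harm_of_jail_year() / cost_usd

# Hypothetical example: a $1M reform campaign averting 500 inmate-years.
print(botec_benefit_per_dollar(1_000_000, 500))  # 27.5 dollars of direct benefit per dollar spent
```

The point is that any second-order term simply has no slot in this kind of estimate, however large it might be.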

Here's just a basic (low-confidence) case for the cost-effectiveness of political advocacy: governmental policies can have enormous effects, even when they attract little mainstream attention (e.g. PEPFAR). But actually campaigning for a specific policy is often only the last step in a long chain of getting the cause into the Overton Window, building a movement, nurturing relationships with politicians, identifying tractable targets, and so on, all of which are very hard to measure, and which wouldn't show up at all in these calculations by OpenPhil. Given this, what evidence is there that funding these steps wouldn't outperform GiveWell for many policies?

(See also Scott Alexander's rough calculations on the effects of FDA regulations, which I'm not very confident in, but which have always stuck in my head as an argument for how dull-sounding policies might have wildly large impacts.)

Your other points make sense, although I'm now worried that abstaining from judgement on near-term human-centric charities will count as implicit endorsement. I don't know very much about quantitatively analysing interventions though, so it's plausible that my claims in this comment are wrong.

Lessons from my time in Effective Altruism

I get the impression that you (Richard) think it would've been better if you'd skipped trying out engineering-style roles and gone straight into philosophy-style roles. Do you indeed think that?

I don't think this; learning about technical ideas in AI, and other aspects of working at DeepMind, have been valuable for me, so it's hard to point to things which I should have changed. But as I say in the post, in worlds where I wasn't so lucky, I expect it would have been useful to weight personal fit more. For example, if I'd had the option of committing to an ML PhD instead of a research engineering role, then I might have done so despite uncertainty about the personal fit; this would probably have gone badly.

Clarifying the core of Effective Altruism

For future reference, Linch's comment was in response to a comment of mine which I deleted before Linch replied, in which I used the example of saying "Federer is the best tennis player". Sorry about that! I replaced it with a comment that tried to point at the heart of the objection; but since I disagree with the things in your reply, I'll respond here too.

I think I just disagree with your intuitions here. When someone says Obama is the best person to be president, they are presumably taking into account factors like existing political support and desire to lead, which make it plausible that Obama actually is the best person.

And when people say "X is the best fiction author ever", I think they do mean to make a claim about the literal probability that this person is, out of all the authors who ever wrote fiction, the best one. In that context, the threshold at which I'd call something a "belief" is much lower than in most contexts, but nevertheless I think that when (for example) a Shakespeare fan says it, they are talking about the proposition that nobody was greater than Shakespeare. And this is not an implausible claim, given how much more we study Shakespeare than anyone else.

(By contrast, if they said that nobody had as much latent talent as Shakespeare, that would be clearly false.)

Anyway, it seems to me that judging the best charitable intervention is much harder than judging the best author, because for the latter you only need to look at books that have already been written, whereas in the former you need to evaluate the space of all interventions, including ones that nobody has proposed yet.

Clarifying the core of Effective Altruism

I think the main problem with your definition is that it doesn't allow you to be wrong. If you say "X is the best bet", then how can I disagree if you're accurately reporting information about your subjective credences? Of course, I could respond by saying "Y is the best bet", but that's just me reporting my credences back to you. And maybe we'll change our credences, but at no point in time was either of us wrong, because we weren't actually talking about the world itself.

Which seems odd, and out of line with how we use this type of language in other contexts. If I say "Mathematics is the best field to study, ex ante" then it seems like I'm making a claim not just about my own beliefs, but also about what can be reasonably inferred from other knowledge that's available; a claim which might be wrong. In order to use this interpretation, we do need some sort of implicit notion of what knowledge is available, and what can be reasonably inferred from it, but that saves us from making claims that are only about our own beliefs. (In other words, not the local map, nor the territory, but some sort of intermediate "things that we should be able to infer from human knowledge" map.)

Clarifying the core of Effective Altruism

No, I'm using the common language meaning. Put it this way: there are seven billion people in the world, and only one of them is the best person to fund (ex ante). If you pick one person, and say "I believe that this is the BEST person to fund, given the information available in 2021", then there's a very high chance that you're wrong, and so this claim isn't justified. Whereas you can justifiably claim that this person is a very good person to fund.
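As a toy illustration of why (a simulation under assumed distributions, purely for intuition, not a model of real grantmaking): even when your noisy estimates of each candidate's value are fairly informative, the candidate who looks best is rarely the true best once the pool is large.

```python
# Toy simulation: with N candidates and noisy estimates of their true value,
# how often is the candidate with the highest estimate actually the best one?
# Both distributions are assumptions made purely for illustration.
import random

def prob_top_pick_is_truly_best(n_candidates=7000, noise_sd=1.0, trials=500):
    hits = 0
    for _ in range(trials):
        true_values = [random.gauss(0, 1) for _ in range(n_candidates)]
        estimates = [v + random.gauss(0, noise_sd) for v in true_values]
        best_estimated = max(range(n_candidates), key=lambda i: estimates[i])
        best_true = max(range(n_candidates), key=lambda i: true_values[i])
        hits += best_estimated == best_true
    return hits / trials

print(prob_top_pick_is_truly_best())  # typically well below 0.5
```

So "this is the best person to fund" is very likely false even for a well-informed picker, whereas "this is a very good person to fund" can easily be justified.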

Scope-sensitive ethics: capturing the core intuition motivating utilitarianism

I'm pretty suspicious about approaches which rely on personal identity across counterfactual worlds; it seems pretty clear that either there's no fact of the matter here, or else almost everything you can do leads to different people being born (e.g. by changing which sperm leads to their conception).

And secondly, this leads us to the conclusion that unless we quickly reach a utopia where everyone has positive lives forever, then the best thing to do is end the world as soon as possible. Which I don't see a good reason to accept.

Scope-sensitive ethics: capturing the core intuition motivating utilitarianism

The problem is that one man's modus ponens is another man's modus tollens. Lots of people take the fact that utilitarianism says that you shouldn't care about your family more than a stranger as a rebuttal to utilitarianism.

Now, we could try to persuade them otherwise, but what's the point? Even amongst utilitarians, almost nobody gets anywhere near placing as much moral value on a stranger as on a spouse. If there's a part of a theory that is of very little practical use, but is still seen as a strong point against the theory, we should try to find a version without it. That's what I intend scope-sensitive ethics to be.

In other words, we go from "my moral theory says you should do X and Y, but everyone agrees that it's okay to ignore X, and Y is much more important" to "my moral theory says you should do Y", which seems better. Here X is "don't give your family special treatment" and Y is "spend your career helping the world".

Lessons from my time in Effective Altruism

Yepp, I agree with this. On the other hand, since AI safety is mentorship-constrained, if you have good opportunities to upskill in mainstream ML, then that frees up some resources for other people. And it also involves building up wider networks. So maybe "similar expected value" is a bit too strong, but not that much.
