At 80,000 Hours, we want to be transparent about our research process. So, I had a go at listing the key principles that guide our research.
I thought it might be interesting to the forum as a take on an epistemology for effective altruism, i.e. which principles should EAs use to make judgements about which causes to support, which careers to take, which charities to donate to, and so on?
I'm interested to hear your ideas on (i) which principles you disagree with, and (ii) which principles we've missed.
See the original page here.
What evidence do we consider?
Use of scientific literature
We place relatively high weight on what scientific literature says about a question, when applicable. If there is relevant scientific literature, we start our inquiry by doing a literature search.
Expert common sense
When we first encounter a question, our initial aim is normally to work out: (i) who are the relevant experts? (ii) what would they say about this question? We call what they would say ‘expert common sense’, and we think it often forms a good starting position (more). We try not to deviate from expert common sense unless we have an account of why it’s wrong.
Quantification
Which careers make the most difference can be unintuitive, since it's difficult to grasp the scale and scope of different problems, which often differ by orders of magnitude. This makes it important to attempt to quantify and model key factors when possible. The process of quantification is also often valuable for learning more about an issue and making your reasoning transparent to others. However, we recognise that for most questions we care about, quantified models contain huge (often unknown) uncertainties and therefore should not be followed blindly. We always weigh the results of quantified models against qualitative analysis and common sense, taking into account how robust each model is.
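To make this concrete, here is a minimal sketch of the kind of quantified model we mean. The problems, numbers, and variable names below are purely illustrative assumptions, not our actual estimates:

```python
# Toy quantified model with illustrative numbers only (not real estimates).
# Each hypothetical problem gets a low/best/high guess for its scale
# (people affected) and its tractability (fraction an extra career might solve).

problems = {
    "problem_a": {"scale": (1e6, 1e7, 1e8), "tractability": (1e-8, 1e-7, 1e-6)},
    "problem_b": {"scale": (1e4, 1e5, 1e6), "tractability": (1e-6, 1e-5, 1e-4)},
}

for name, p in problems.items():
    low, best, high = (p["scale"][i] * p["tractability"][i] for i in range(3))
    # The low-to-high range spans several orders of magnitude, which is why
    # a point estimate from a model like this shouldn't be followed blindly.
    print(f"{name}: best guess {best:g}, range {low:g} to {high:g}")
```

Even in this toy case the uncertainty range spans four orders of magnitude, which is why we weigh such outputs against qualitative analysis and common sense rather than following them on their own.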
The experience of the people we coach
We’ve coached hundreds of people on career decisions and have a wider network of people we gather information from who are aligned with our mission. We place weight on their thoughts about the pros and cons of different areas.
How do we combine evidence?
We strive to be Bayesian
We attempt to explicitly state our prior guess on an issue, and then update towards or away from it based on the strength of the evidence. See an example here. This is called 'Bayesian reasoning', and, although not always adopted, it seems to be regarded as best practice for decision making under high uncertainty among those who write about good decision-making processes.1
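As an illustrative sketch of the kind of update we mean (the numbers are made up, and this is just one simple way of running the calculation, not a formal model we maintain):

```python
# Minimal Bayesian update on a yes/no question, with made-up numbers.
# 'prior' is our initial credence that a claim is true; 'likelihood_ratio'
# is how much more likely the evidence is if the claim is true than if false.

def update(prior: float, likelihood_ratio: float) -> float:
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Moderately strong supporting evidence (4:1) moves a 30% prior to ~63%...
print(round(update(0.30, 4), 2))  # 0.63
# ...but moves a sceptical 5% prior only to ~17%.
print(round(update(0.05, 4), 2))  # 0.17
```

The point is simply that the same piece of evidence should move us by different amounts depending on where we started, and stating the prior makes that explicit.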
We use ‘cluster thinking’
As opposed to relying on one or two strong considerations, we seek to evaluate the question from many angles, weighting each perspective according to its robustness and the importance of the consequences. We think this process provides more robust answers in the context of decision making under high uncertainty than alternatives (such as making a simple quantified model and going with the answer). This style of thinking has been supported by various groups and has several names, including ‘cluster thinking’, ‘model combination and adjustment’, ‘many weak arguments’, and ‘fox style’ thinking.
We seek to make this process transparent by listing the main perspectives we’ve considered on a question. We also make regular use of structured qualitative evaluations, such as our framework.
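As a rough sketch of what this combination can look like in practice (the perspectives, scores, and weights below are invented for illustration, not taken from a real evaluation):

```python
# Toy 'cluster thinking' combination: score the same option from several
# angles and weight each angle by how robust we judge it to be.
# All names, scores, and weights here are illustrative assumptions.

perspectives = [
    # (perspective, score for the option, robustness weight in [0, 1])
    ("quantified cost-effectiveness model", 9.0, 0.3),
    ("expert common sense",                 5.0, 0.8),
    ("track record of similar projects",    4.0, 0.6),
    ("experience of people we've coached",  6.0, 0.5),
]

total_weight = sum(weight for _, _, weight in perspectives)
combined = sum(score * weight for _, score, weight in perspectives) / total_weight
print(f"Combined score: {combined:.1f}")  # 5.5, pulled towards the robust views
```

Note how the optimistic but fragile quantified model gets pulled back towards the more robust perspectives, rather than determining the answer on its own.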
We seek robustly good paths
Our aim is to make good decisions. Since the future is unpredictable and full of unknown unknowns, and we’re uncertain about many things, we seek actions that will turn out to be good under many future scenarios.
Avoiding bias
We’re very aware of the potential for bias in our work, which often relies on difficult judgement calls, and have surveyed the literature on biases in career decisions. To avoid bias, we aim to make our research highly transparent, so that bias is easier to spot. We also aim to state our initial position, so that readers can see the direction in which we’re most likely to be biased, and write about why we might be wrong.
Seeking feedback
We see all of our work as in progress, and seek to improve it by continually seeking feedback.
We seek feedback through several channels:
- All research is vetted within the team.
- We send major pieces of research to external researchers and people with experience in the area for comments.
- We aim to publish all of our substantial research publicly on our blog.
- Blog posts are rated by a group of external raters.
In the future, we intend to carry out internal and external research evaluations.
We aim to make our substantial pieces of research easy to critique by:
- Clearly explaining our reasoning and evidence. If you see a claim that isn’t backed up by a link or citation, you can assume there’s no further justification.
- Flagging judgement calls.
- Giving an overview of our research process.
- Stating our key uncertainties.
I think this is a valuable heuristic, but it gets stronger if you also try to gauge the degree of expertise, and let that determine how much weight to put on it. The more the question is of the same type experts routinely answer, and the better the feedback mechanisms that inform their judgement, the stronger we should expect their expertise to be.
For some questions we have very good experts. If I've been hurt by someone else's action, I would trust the judgement of a lawyer about whether I have a good case for winning damages. If I want to buy a light for my bike to see by at night, I'll listen to the opinions of people who cycle at night rather than attempt a first-principles calculation of how much light I need it to produce to see a certain distance.
Some new questions, though, don't fall clearly into any existing expertise, and the best you can do is find someone who knows about something similar. I'd still prefer this over the opinion of someone chosen randomly, but it should get much less weight, and may not be worth seeking out. In particular, it becomes much easier for you to become more of an expert in the question than the sort-of-expert you found.
I think this is especially true for AI safety. Sometimes people will cite prominent computer scientists' lack of concern for AI safety as evidence that it is an unfounded concern. However, computer scientists typically answer questions about AI progress more so than AI safety, and these questions seem categorically different, so I'm hesitant to give serious weight to their opinions on this topic. That's not to mention the biases we can expect from AI researchers here, e.g. from their incentives to be optimistic about their own field.
Thanks, I'll adapt the page to point this out.
Excellent post. One minor question: what if one or two considerations actually do outweigh all others?
I take it that hedgehogs (as opposed to foxes) are biased in the sense that they are prone to focus on a single argument or a single piece of evidence even when other arguments or pieces of evidence should be considered. That seems to me to be a very common mistake. But in cases where a single argument or piece of evidence is so overwhelming that other arguments or pieces of evidence become unimportant, it seems one actually should rely only on that single argument or piece of evidence.
I think the right way is to weight each argument by its robustness and the significance of its conclusions. For instance: if you have a strong argument that an action, A, would be very bad, then you shouldn't do A. If you have a speculative argument that A would be very bad, then you probably shouldn't do A. And if you have a speculative argument that A would be a little bit bad, that doesn't mean much.
I'd add a consideration for relative evidence here. If you have an action, A, that you know little about aside from one speculative argument that A would be very bad, but no strong arguments, how does that work?
Here's a relevant example: say we have two forms of animal rights advocacy. One (A) has a speculative argument that it's very, very good at inspiring Democrats, but a strong argument that it's bad at inspiring Republicans. (These are the two main US political parties.) The other (B) has only a strong argument that it's decent at inspiring Republicans. There is no other evidence. Normally, we'd discount the speculative argument for (A) with a strong, low prior, like the one we have for developing-world health charities (since base rates for success are pretty low, and after in-depth investigation many charities with promising speculative arguments turn out to be duds). But in this case there is no strong prior. To maximize expected value, we should then lean towards (A) in this example, even though this is, in some sense, putting speculative evidence above strong evidence.
Overall, to maximize expected value, I would argue that strength/robustness only matters when comparing different sources of evidence on the same effect of the same action. But when speculative arguments are the only sources of evidence available for a particular effect of a particular action, they should be accounted for no differently than strong arguments of the same effect size.
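To make that concrete, here's a toy expected-value calculation for the example above (all the numbers are invented, and the conclusion obviously depends on what you plug in):

```python
# Toy expected-value comparison for options A and B above.
# Credences stand in for how much weight each argument earns (speculative =
# low credence, strong = high credence); impacts are on an arbitrary scale.
# All numbers are invented for illustration.

ev_a = 0.3 * 100 + 0.9 * (-10)   # speculative big upside + strong modest downside = 21.0
ev_b = 0.9 * 10                  # only a strong argument for a decent effect = 9.0

print(ev_a, ev_b)  # with these numbers, A comes out ahead despite weaker evidence
```

The speculative argument only wins here because, absent a strong prior pulling it down, its large claimed effect size still dominates the calculation.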
It seems like quite a few people have downvoted this post. I'd be curious to know why to avoid posting something similar next time.
I don't perceive a need to be frugal with upvotes. I was also surprised this article didn't get more upvotes, because I believe it covers a very important issue. Having read the article, though, I didn't come across much new information. Maybe others feel the same: as users of this forum we're already familiar with 80,000 Hours' methodology, and this may feel like a rehash.
I've upvoted the article so it will get more visibility, because more important than what's written in it is attracting the critical feedback that 80,000 Hours is seeking.
Yeah, I'd say the main factor in lack of upvotes was the lack of new insight or substantive points to (dis)agree with.
When I hover over the 3 upvotes in the corner by the title, it says "100% positive" - which suggests people haven't downvoted it, it's just that not many people have upvoted it? But maybe I'm reading that wrong.
I thought it was a good and useful post, I don't see any reason why people would downvote it - but would also be interested to hear why if there were people who did.
Same. Doesn't show any downvotes for me either. Maybe it's a bug?
Yes, it seems like the Forum's established a slightly more positive culture than LessWrong, where people are supportive (a la Jess_Whittlestone's post) and don't downvote all that much, which seems to me to be a good thing. I would think that people might have refrained from upvoting this post because it might have seemed narrowly focused on promoting 80,000 Hours, but wouldn't have downvoted it either.
Ah I didn't know you could see the % by hovering over that icon. I must have misremembered how many upvotes it had before.
I believe selecting between cause areas is something this epistemology may be insufficient for, and it may need tweaking to work better there. That's not because the methodology is flawed in principle: these methods work by relying on the work of others who know what they're doing, which makes sense.
However, there seem to be few experts to ask for advice on selecting cause areas. I mean, that's a peculiar problem I hadn't encountered in any form before effective altruism posed it. I imagine there isn't as much expert common sense, scientific literature, or experience to be learned from here. I imagine the United Nations and the governments of wealthy nations have departments dedicated to answering these questions. Additionally, I thought of the Copenhagen Consensus. The CEA is in touch with the Copenhagen Consensus, correct?
This is a question I don't have an answer to, but I thought of it while reading the post, and it doesn't seem to be addressed. Here goes: since 80,000 Hours does atypical research, how much will they worry about biases that affect their research but don't usually affect other social science research?
I'm interested in how that differs from the way GiveWell does its assessments. What justifies these differences?
My impression is that the methodology doesn't significantly differ from GiveWell, but we might apply it with a different emphasis. It seems like GiveWell puts a bit less weight on speculative arguments and a bit more weight on common sense within their clusters. However, these differences are pretty small on the scale of things, and are hard to disentangle from having come across different evidence rather than having different methodology.
The obvious gap here is the process that forms the question this methodology is then given to digest, which is obviously the most consequential step. How are such questions arrived at, and by whom? It seems difficult for such questions to completely transcend the prejudices of the group giving rise to them; ergo, value should be attributed to the particular steps taken in their formation.
I have a related concern about boundary problems between questions. If you artificially individuate questions, do you arrive at an appropriate view of the whole? I.e. what about the effect of one question on another, and the value of goods which have a small but significant positive influence across questions? I'm thinking particularly of second-order goods whose realisation will almost certainly benefit any possible future, like collective wisdom, moral virtue, world peace, and so forth. These issues clearly aren't reducible to a single question about a particular type of career, or assimilable to 'expert common sense'. Or do you reject such wide-spectrum goods from the outset because of analytic intractability?
I think you could use this methodology to focus your questions too. Start from something very broad like "what's a good life", then use the methodology to work out what the key sub-questions are within that question; and so on. My aim wasn't, however, to give a full account of rational inquiry, starting from zero.
I also don't see second-order goods being especially neglected. Experts and common sense generally think these things are good, so they'll come up in your assessment, even if you can't further analyse or quantify them.