One of the main criteria used for cause area selection is the scale, or importance, of the issue. It has been used by 80,000 Hours and OpenPhil, among others, and is often defined as the size and intensity of the problem: an issue that deeply affects 100,000 people would be considered higher scale than one that mildly affects 1,000 people. Although this version of scale is common in EA, I think it has some major problems, the biggest of which is bottlenecking.

Measuring scale this way bakes in the assumption that the total scale of the problem is the factor that matters most. For almost all large problems, however, this seems very unlikely to be true: efforts to address them will almost always be capped or bottlenecked by something else long before they are capped by the total size of the problem. Take bednets as an example. If AMF only gets enough funding to give out 10 million bednets a year, it doesn't really matter whether the total malaria burden would require 20 million or 500 million; AMF is effectively capped by money before it hits other scaling considerations. If you were a billionaire, perhaps you could give enough that money was no longer the binding constraint, but even then another factor would likely cap progress before every person in need of a net was reached. In AMF's case it would probably be the number of partners that can effectively be worked with, or the political stability of the remaining countries.
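
To make the bottleneck point concrete, here is a minimal sketch with invented numbers (they are not AMF's actual figures): once funding binds, the total size of the malaria burden drops out of the calculation entirely.

```python
# A minimal sketch with made-up numbers (not AMF's actual figures): the nets that
# actually reach people are capped by whichever constraint binds first, so raising
# the "total need" figure changes nothing once funding is the bottleneck.

def nets_delivered(funding_usd, cost_per_net, partner_capacity, total_need):
    """Nets delivered is the minimum across every binding constraint."""
    nets_funded = funding_usd / cost_per_net
    return min(nets_funded, partner_capacity, total_need)

smaller_burden = nets_delivered(funding_usd=50_000_000, cost_per_net=5,
                                partner_capacity=15_000_000, total_need=20_000_000)
larger_burden = nets_delivered(funding_usd=50_000_000, cost_per_net=5,
                               partner_capacity=15_000_000, total_need=500_000_000)
print(smaller_burden, larger_burden)  # 10000000.0 10000000.0 -- total need never enters
```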

This bottlenecking concern is even more dramatic in fields with tighter caps or very large scale. To take an example from animal rights: if your organization can only raise $100,000, it doesn't really matter how big the affected population is, as long as it is much larger than the number you could effectively help with $100,000. When comparing a cause like animal rights with bednets, the animal rights problem clearly affects far more individuals, but in many cases its "true scale" will be more strictly capped than that of a more popular, better funded, and better understood poverty intervention that affects fewer individuals.

Money is one of the most common features that caps scale, but it's not the only one. Sometimes the cap is logistical, such as the number of partners or the total market production of a certain good. Sometimes it is people (a charity focused on surgeries would likely run into a shortage of skilled surgeons long before it ran out of people who need surgery). A capping feature could also be tied to the crowdedness of a space, or to our understanding of the problem: wild animals may suffer enormously, but we do not yet know how to help most of them. In general, when looking at a cause area or charity, one should consider which factor is most likely to cap scale first, rather than just looking at the total size of the problem and assuming no other capping features apply.
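
As a toy illustration of "ask which factor caps scale first", here is a sketch for a hypothetical surgery charity with entirely invented numbers:

```python
# Toy sketch: list rough caps for a hypothetical surgery charity (all numbers invented)
# and report which constraint limits scale first.

def binding_constraint(caps):
    """Return the (name, value) pair of the constraint that caps scale first."""
    return min(caps.items(), key=lambda item: item[1])

caps = {
    "patients needing surgery per year": 2_000_000,        # the "total scale" of the problem
    "surgeries fundable per year": 40_000,                 # money cap
    "surgeries performable by trained surgeons": 15_000,   # skilled-people cap
}
print(binding_constraint(caps))  # ('surgeries performable by trained surgeons', 15000)
```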

One counterargument is that scale is just used as a proxy to narrow down cause selection. I have far fewer concerns with that use of scale, but many people, including major organizations, explicitly use scale in the way I described to make final calls about which causes to support.

Another counterargument applies if you think your intervention has a small chance of helping the entire affected population. For example, if you think your action produces a 0.000001% increase in the chance of ending all factory farming, then the more standard understanding of scale makes sense. However, given the huge scale of most problems EAs work on, few of our solutions are aimed at solving the whole problem (e.g. we cannot even fill AMF's room for funding, and AMF is only one of many charities working on malaria). We should be careful not to be wildly overconfident about our ability to effect change, and not to let that overconfidence drive our cause selection.
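
For completeness, here is the arithmetic behind that counterargument, using the 0.000001% figure above and a purely hypothetical total for the size of factory farming: under this framing, expected impact really does grow with total scale, which is exactly why it tempts people toward the standard definition.

```python
# Illustrative only: under the "small chance of solving the whole problem" framing,
# expected impact is probability-of-success times total scale, so total scale matters.
prob_increase = 0.000001 / 100      # the 0.000001% figure from the example above
animals_in_system = 10**10          # hypothetical total scale, not a real estimate
expected_animals_spared = prob_increase * animals_in_system
print(expected_animals_spared)      # 100.0 animals spared in expectation
```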

Comments



"Scaling Studies" are a thing, now part of "Implementation Science". 

The focus tends to be on what makes pilot projects scalable and what interferes with scaling. Politicians and funders get (understandably) irritated when pilots keep failing to scale - this was happening a lot in the 1990s, which gave rise to the first studies on scaling.

Implementation science looks more generally at what really works in the field - a lot is going on in this area in Chicago and in Global Health.

[Writing personally] This post seems to argue (at least implicitly) that scale is bad if it is the only metric that is used to assess a cause area or intervention. I think this is clearly correct.

But I don't think anyone has used scale as the only metric: 80,000 Hours very explicitly modify it with other factors (neglectedness, solvability), to account for things like the "bottlenecking" you discuss.

There's a separate thing you could mean, which is "Scale, neglectedness, solvability is only one model for prioritisation. It's useful to have multiple different models for prioritising. One alternate model is to assess what the biggest bottleneck is for solving a problem." (Note that this does not really support the claim that scale is misused: it's just that other lenses are also useful.)

I respect the inclination to use multiple models, and I think that thinking in terms of bottlenecks is useful for e.g. organizational prioritization. I think it's harder to apply to cause prioritization because we face so many problems and potential solutions that it's hard to see what the bottlenecks are. It may be useful for prioritizing how to use resources to pursue an intervention, which seems to be how you are mostly using it in this case.

Overall, I worry that your title doesn't really reflect what you show in the text.

I didn't read the post as meaning either "scale is bad if it is the only metric that is used" _or_ "Scale, neglectedness, solvability is only one model for prioritisation. It's useful to have multiple different models...."

When looking at scale within a scale, neglectedness, tractability framework, it's true that the other factors can offset the influence of scale: if something is large in scale but intractable, the intractability counts against the cause and at least somewhat offsets the consideration that it is large in scale. But this doesn't touch on the point this post makes, which is that when looking at scale itself as a consideration, the 'total scale' may be of little or no relevance to the evaluation of the cause; rather, 'scale' only matters up to a given bottleneck and is of no value beyond that. I almost never see people talk about scale in this way in the context of a scale, neglectedness, tractability framework - dividing the total scale into tractable bits, less tractable bits, and totally intractable bits. Rather, I more typically see people assigning some points for scale, evaluating tractability independently and assigning some points for that, and evaluating neglectedness independently and assigning some points for that.

Thanks, David. Your interpretation is indeed what I was trying to get across.

I read this the same way as Max. The issue of the cost to solve (e.g.) all cases of malaria is really tractability, not scale. Scale is how many people would be helped (and to what degree) by doing so. Divide the latter by the former and you have a sensible-looking cost-benefit analysis, one that is sensitive to the 'size and intensity of the problem' (i.e. the latter).

I do think there are scale-related issues with drawing lines between 'problems', though - if a marginal contribution to malaria nets now achieves twice as much good as the same marginal contribution would in 5 years, are combatting malaria now and combatting malaria in five years 'different problems', or do you just try to average out the cost-benefit ratio between somewhat arbitrary points (e.g. now and the point when the last case of malaria is prevented or cured)? But I also think the models Max and Owen have written about on the CEA blog do a decent job of dealing with this kind of question.

[anonymous]

Your argument does not suggest that there is a problem with the commonly used conception of scale, but rather with how it is combined with tractability and neglectedness. Thus, it does not support the claims made in the main piece.

I disagree on both counts. I think my comment is recapitulating the core claims of the main piece (and am pretty confident the author would agree).

In my comment I mention the full S/T/N framework only because MaxDalton suggested that, when properly viewed within that framework, the concerns with 'scale' Joey raised don't apply. I argued that Joey's concerns apply even if you are applying the full S/T/N framework, but I don't think they apply only if you are applying the full framework.

[anonymous]

OK, but then the issue is problem individuation, not the conception of scale used.

Agree. We might ask: why do we care about scale in the first place? Presumably because, in many cases, it means our efforts can help more. But in cases where larger scale does not mean that our efforts can help more (because we cannot help beyond a certain scale), we should not care about the larger scale of the problem.

Do people really think of scale as a bottleneck? I take this article to mean "maybe scale isn't really important to think about if you're unlikely to ever reach that scale".

Perhaps scale could be thought of as the inverse of the diminishing returns rate (e.g., more scale = less diminishing returns = more ability to take funding). This seems useful to think about to me.

Maybe the argument should be that when thinking about scale, neglectedness, and tractability, we should put more emphasis on tractability and also think about the tractability of attracting funding / resources needed to meet the scale?
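
One way to picture the "scale as the inverse of the diminishing returns rate" idea above is a sketch like the following, where the logarithmic functional form is purely illustrative:

```python
# Illustrative sketch: model total good as scale * log(1 + spend / scale), so a larger
# "scale" parameter stretches the curve and marginal returns diminish more slowly.

def marginal_good(spend, scale):
    # derivative of scale * log(1 + spend / scale) with respect to spend
    return 1.0 / (1.0 + spend / scale)

for scale in (10**6, 10**8):
    # find the spending level at which marginal returns have halved
    room = next(x for x in range(0, 10**9, 10**6) if marginal_good(x, scale) <= 0.5)
    print(f"scale={scale:.0e}: marginal returns halve after ~${room:,} of funding")
```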

> Perhaps scale could be thought of as the inverse of the diminishing returns rate (e.g., more scale = less diminishing returns = more ability to take funding). This seems useful to think about to me.

Yes, this is why you need to consider the ratio of scale and neglectedness (for a fixed definition of the problem).

Quick comment: note that you can apply INT to any fraction of the problem (1% / 10% / 100%). The key is just that you use the same fraction for N and T as well. That's why we define the framework using "% of problem solved" rather than "solve the whole problem". https://80000hours.org/articles/problem-framework/

If you run into heavily diminishing returns at the 10% mark, then applying INT to 10% of the problem should yield better results.
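
A toy numerical check of that point (all numbers invented): if the first 10% of a problem is cheap and the rest is expensive, scoring the framework on the whole problem averages over spending you would never do, while scoring it on the first 10% reflects the margin you would actually fund.

```python
# Toy check: cost-effectiveness scored over the whole problem vs over the tractable 10%.
total_good = 1_000_000             # good from solving 100% of the problem
cost_first_10pct = 1_000_000       # the cheap, tractable slice
cost_remaining_90pct = 99_000_000  # heavily diminishing returns beyond the 10% mark

whole_problem = total_good / (cost_first_10pct + cost_remaining_90pct)
first_slice = (0.10 * total_good) / cost_first_10pct
print(whole_problem, first_slice)  # 0.01 vs 0.1 units of good per dollar
```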

This can mean that very narrowly defined problems will often be more effective than broad ones, so it's important to compare problems of roughly the same scale. Also note that narrowly defined problem areas are less useful - the whole point of having relatively broad areas is to build career capital that's relevant to more than just one project.

Finally, our overall process is (i) problems (ii) methods (iii) personal fit. Within methods you should think about the key bottlenecks within the problem area, so it partly gets captured there. Expected impact is roughly the product of the three. So I agree people shouldn't use problem selection as an absolute filter, since it could be better to work on a medium-ranked problem with a great method and personal fit.

You've scooped me! I've got a post on the SNT framework in the works. On the scale bit:

The relevant consideration here seems to be systemic vs atomic changes. The former affects all of the cause, or has a chance of doing so; the latter affects just a small part of it with no further impacts, hence 'atomic'. An example of the former would be curing cancer; an example of the latter would be treating one case of it.

Assessing the total scale of a cause is only relevant if you're calculating the expected value of systemic interventions. I generally agree it's a mistake to force people to size up the entire cause - as 80k do, for instance - because it's not necessary if you're just looking at atomic interventions.

> I generally agree it's a mistake to force people to size up the entire cause - as 80k do

We don't - see my comment above.

For an atomic intervention, the relevant scale is the amount of good that can be done by a given amount of money, the relevant tractability is whether there is good evidence that the intervention works, and the relevant neglectedness is room for more funding. (This is the GiveWell framework.)

For a systemic intervention, the relevant scale is the amount of good that can be done by solving the problem, the relevant tractability is how much of the problem would be solved (in expectation) by increasing the resources going towards it by X%, and the relevant neglectedness is the amount of resources it would take to increase by X% the resources devoted to the problem. (If there are increasing or diminishing marginal returns, then X should be chosen based on the amount by which the prospective donor or prospective employee would actually increase resources. If there are constant marginal returns, X can be set at whatever makes it easiest to predict how much of the problem would be solved - e.g. choosing a 50% increase even though your donation would only amount to a 0.1% increase, because it's easier to get a sense of how much of the problem would be solved by a 50% increase.) (This is the 80,000 Hours framework.)
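
As a back-of-the-envelope illustration of how the three systemic-framework terms described above combine (every number here is invented purely to show the units cancelling, with neglectedness expressed as its reciprocal so the terms multiply):

```python
# Sketch of the systemic (80,000 Hours-style) product: scale * tractability * (1/neglectedness)
good_if_solved = 1_000_000             # scale: units of good from solving the whole problem
fraction_solved_per_pct_more = 0.001   # tractability: share of problem solved per +1% resources
pct_more_per_dollar = 1 / 50_000       # reciprocal of neglectedness: $50k buys a +1% resource increase
good_per_dollar = good_if_solved * fraction_solved_per_pct_more * pct_more_per_dollar
print(good_per_dollar)                 # 0.02 units of good per marginal dollar
```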

When EAs discuss scale, they generally mean scale in the sense that the term is used for systemic interventions (i.e. the scale of the problem, not the scale of the good the intervention would do). When EAs discuss tractability, they generally mean tractability in the sense that the term is used for atomic interventions (i.e. whether the intervention would be successful, not how much of the problem it would solve in expectation). EAs should avoid mixing and matching scale and tractability in this way.

See my previous comment here for a lengthier discussion of this issue.
