2

259

So taking a step back for a second, I think the primary point of collaborative written or spoken communication is to take the picture or conceptual map in my head and put it in your head, as accurately as possible. Use of *any* terms should, in my view, be assessed against whether those terms are likely to create the right picture in a reader's or listener's head. I appreciate this is a somewhat extreme position.

If everytime you use the term heavy-tailed (and it's used a lot - a quick CTRL + F tells me it's in the OP 25 times) I have to guess from context whether you mean the mathematical or commonsense definitions, it's more difficult to parse what you actually mean in any given sentence. If someone is reading and doesn't even know that those definitions substantially differ, they'll probably come away with bad conclusions.

This isn't a hypothetical corner case - I keep seeing people come to bad (or at least unsupported) conclusions in exactly this way, while thinking that their reasoning is mathematically sound and thus nigh-incontrovertible. To quote myself above:

The above, in my opinion, highlights the folly of ever thinking 'well, log-normal distributions are heavy-tailed, and this should be log-normal because things got multiplied together, so the top 1% must be at least a few percent of the overall value'.

If I noticed that use of terms like 'linear growth' or 'exponential growth' were similarly leading to bad conclusions, e.g. by being extrapolated too far beyond the range of data in the sample, I would be similarly opposed to their use. But I don't, so I'm not.

If I noticed that engineers at firms I have worked for were obsessed with replacing exponential algorithms with polynomial algorithms because they are better in some limit case, but worse in the actual use cases, I would point this out and suggest they stop thinking in those terms. But this hasn't happened, so I haven't ever done so.

I do notice that use of the term heavy-tailed (as a binary) in EA, especially with reference to the log-normal distribution, is causing people to make claims about how we should expect this to be 'a heavy-tailed distribution' and how important it therefore is to attract the top 1%, and so...you get the idea.

Still, a full taboo is unrealistic and was intended as an aside; closer to 'in my ideal world' or 'this is what I aim for my own writing', rather than a practical suggestion to others. As I said, I think the actual suggestions made in this summary are good - replacing the question 'is this heavy-tailed or not' with '*how *heavy-tailed is this' should do the trick- and hope to see them become more widely adopted.

Briefly on this, I think my issue becomes clearer if you look at the full section.

If we agree that log-normal is more likely than normal, and log-normal distributions are heavy-tailed, then saying 'By contrast, [performance in these jobs] is thin-tailed' is just incorrect? Assuming you meant the mathematical senses of heavy-tailed and thin-tailed here, which I guess I'm not sure if you did.

This uncertainty and resulting inability to assess whether this section is true or false obviously loops back to why I would prefer not to use the term 'heavy-tailed' at all, which I will address in more detail in my reply to your other comment.

Ex-postperformance appears ‘heavy-tailed’ in many relevant domains, but with very large differences inhowheavy-tailed: the top 1% account for between 4% to over 80% of the total. For instance, we find ‘heavy-tailed’ distributions (e.g. log-normal, power law) of scientific citations, startup valuations, income, and media sales. By contrast, a large meta-analysis reports ‘thin-tailed’ (Gaussian) distributions for ex-post performance in less complex jobs such as cook or mail carrier

Hi Max and Ben, a few related thoughts below. Many of these are mentioned in various places in the doc, so seem to have been understood, but nonetheless have implications for your summary and qualitative commentary, which I sometimes think misses the mark.

- Many distributions are heavy-tailed mathematically, but not in the common use of that term, which I think is closer to 'how concentrated is the thing into the top 0.1%/1%/etc.', and thus 'how important is it I find top performers' or 'how important is it to attract the top performers'. For example, you write the following:

What share of total output should we expect to come from the small fraction of people we’re most optimistic about (say, the top 1% or top 0.1%) – that is, how

heavy-tailedis the distribution of ex-ante performance?

- Often, you can't derive this directly from the distribution's mathematical type. In particular, you cannot derive it from whether a distribution is heavy-tailed in the mathematical sense.
- Log-normal distributions are particuarly common and are a particular offender here, because they tend to occur whenever lots of independent factors are multiplied together. But here is the approximate* fraction of value that comes from the top 1% in a few different log-normal distributions:

EXP(N(0,0.0001)) -> 1.02%

EXP(N(0,0001)) -> 1.08%

EXP(N(0,0.01)) -> 1.28%

EXP(N(0,0.1)) -> 2.22%

EXP(N(0,1)) -> 9.5% - For a real-world example, geometric brownian motion is the most common model of stock prices, and produces a log-normal distribution of prices, but models based on GBM actually produce pretty thin tails in the commonsense use, which are in turn much thinner than the tails in real stock markets, as (in?)famously chronicled in Taleb's Black Swan among others. Since I'm a finance person who came of age right as that book was written, I'm particularly used to thinking of the log-normal distribution as 'the stupidly-thin-tailed one', and have a brief moment of confusion every time it is referred to as 'heavy-tailed'.
- The above, in my opinion, highlights the folly of ever thinking 'well, log-normal distributions are heavy-tailed, and this should be log-normal because things got multiplied together, so the top 1% must be at least a few percent of the overall value'.
. In fact, as I understand it many oft-used examples of normal distributions, such as height and other biological properties, are actually believed to follow a log-normal distribution.**Log-normal distributions with low variance are practically indistinguishable from normal distributions**

***

I'd guess we agree on the above, though if not I'd welcome a correction. But I'll go ahead and flag bits of your summary that look weird to me assuming we agree on the mathematical facts:

By contrast, a large meta-analysis reports ‘thin-tailed’ (Gaussian) distributions for ex-post performance in less complex jobs such as cook or mail carrier [1]: the top 1% account for 3-3.7% of the total.

I haven't read the meta-analysis, but I'd tentatively bet that much like biological properties these jobs actually follow log-normal distributions and they just couldn't tell (and weren't trying to tell) the difference.

These figures illustrate that the difference between ‘thin-tailed’ and ‘heavy-tailed’ distributions can be modest in the range that matters in practice

I agree with the direction of this statement, but it's actually worse than that: depending on the tail of interest "heavy-tailed distributions" can have thinner tails than "thin-tailed distributions"! For example, compare my numbers for the top 1% of various log-normal distributions to the right-hand-side of a standard N(0,1) normal distribution where we cut off negative values (~3.5% in top 1%).

It's also somewhat common to see comments like this from 80k staff (This from Ben Todd elsewhere in this thread):

You can get heavy tailed outcomes if performance is the product of two normally distributed factors (e.g. intelligence x effort).

You indeed can, but like the log-normal distribution this will *tend to* have pretty thin tails in the common use of the term. For example, multipling two N(100,225) distributions together, chosen because this is roughly the distribution of IQ, gets you a distribution where the top 1% account for 1.6% of the total. Looping back to my above thought, I'd also guess that performance on jobs like cook and mail-carrier look very close to this, and empirically were observed to have similarly thin tails (aptitude x intelligence x effort might in fact be the right framing for these jobs).

***

Ultimately, the recommendation I would give is much the same as the bottom line presented, which I was very happy to see. Indeed, I'm mostly grumbling because I want to discourage anything which treats heavy-tailed as a binary property**, as parts of the summary/commentary tend to, see above.

Some advice for how to work with these concepts in practice:

- In practice,
don’t treat ‘heavy-tailed’ as a binary property. Instead, askhowheavy the tails of some quantity of interest are, for instance by identifying the frequency of outliers you’re interested in (e.g. top 1%, top 0.1%, …) and comparing them to the median or looking at their share of the total. [2]Carefully choose the underlying population and the metric for performance, in a way that’s tailored to the purpose of your analysis. In particular, be mindful of whether you’re looking at the full distribution or some tail (e.g. wealth of all citizens vs. wealth of billionaires).

*Approximate because I was lazy and just simulated 10000 values to get these and other quoted numbers. AFAIK the true values are not sufficiently different to affect the point I'm making.

**If it were up to me, I'd taboo the term 'heavy-tailed' entirely, because having an oft-used term whose mathematical and commonsense notions differ is an obvious recipe for miscommunication in a STEM-heavy community like this one.

I want to push back against a possible interpretation of this moderately strongly.

If the charity you are considering starting has a 40% chance of being 2x better than what is currently being done on the margin, and a 60% chance of doing nothing, I very likely want you to start it, naive 0.8x EV be damned. I could imagine wanting you to start it at much lower numbers than 0.8x, depending on the upside case. The key is to be able to monitor whether you are in the latter case, and stop if you are. Then you absorb a lot more money in the 40% case, and the actual EV becomes positive even if all the money comes from EAs.

If monitoring is basically impossible and your EV estimate is never going to get more refined, I think the case for not starting becomes clearer. I just think that's actually pretty rare?

From the donor side in areas and at times where I've been active, I've generally been very happy to give 'risky' money to things where I trust the founders to monitor and stop or switch as appropriate, and much more conservative (usually just not giving) if I don't. I hope and somewhat expect other donors are willing to do the same, but if they aren't that seems like a serious failure of the funding landscape.

I have a few thoughts here, but my most important one is that your (2), as phrased, is an argument in favour of outreach, not against it. If *you* update towards a much better way of doing good, and any significant fraction of the people you 'recruit' update with you, you presumably did much more good via recruitment than via direct work.

Put another way, recruitment defers to question of how to do good into the future, and is therefore particularly valuable if we think our ideas are going to change/improve particularly fast. By contrast, recruitment (or deferring to the future in general) is less valuable when you 'have it all figured out'; you might just want to 'get on with it' at that point.

***

It might be easier to see with an illustrated example:

Let's say in the year 2015 you are choosing whether to work on cause P, or to recruit for the broader EA movement. Without thinking about the question of shifting cause preferences, you decide to recruit, because you think that one year of recruiting generates (e.g.) two years of counterfactual EA effort at your level of ability.

In the year 2020, looking back on this choice, you observe that you now work on cause Q, which you think is 10x more impactful than cause P. With frustration and disappointment, you also observe that a 'mere' 25% of the people you recruited moved with you to cause Q, and so your original estimate of two years actually became six months (actually more because P still counts for something in this example, but ignoring that for now).

This looks bad because six months < one year, but if you focus on *impact rather than time spent *then you realise that you are comparing one year of work on cause P, to six months of work on cause Q. Since cause Q is 10x better, your outreach 5x outperformed direct work on P, versus the 2x you thought it would originally.

***

You can certainly plug in numbers where the above equation will come out the other way - suppose you had 99% attrition - but I guess I think they are pretty implausible? If you still think your (2) holds, I'm curious what (ballpark) numbers you would use.

+1. A short version of my thoughts here is that I’d be interested in changing the EA name if we can find a better alternative, because it does have some downsides, but this particular alternative seems worse from a strict persuasion perspective.

Most of the pushback I feel when talking to otherwise-promising people about EA is not really as much about content as it is about framing: it’s people feeling EA is too cold, too uncaring, too Spock-like, too thoughtless about the impact it might have on those causes deemed ineffective, too naive to realise the impact living this way will have on the people who dive into it. I think you can see this in many critiques.

(Obviously, this isn’t universal; some people embrace the Spock-like-mindset and the quantification. I do, to some extent, or I wouldn’t be here. But I’ve been steadily more convinced over the years that it’s a small minority.)

You can fight this by framing your ideas in warmer terms, but it does seem like starting at ‘Global Priorities community’ makes the battle more uphill. And I find losing this group sad, because I think the actual EA community is relatively warm, but first impressions are tough to overcome.

Low confidence on all of the above, would be happy to see data.

Thanks for this, pretty interesting analysis.

The other thing going on here is that the karma system got an overhaul when forum 2.0 launched in late 2018, giving some users 2x voting power and also introducing strong upvotes. Before that, one vote was one karma. I don't remember exactly when the new system came in, but I'd guess this is the cause of the sharp rise on your graph around December 2018. AFAIK, old votes were never re-weighted, which is why if you go back through comments on old posts you'll see a lot of things with e.g. +13 karma and 13 total votes, a pattern I don't recall ever seeing since.

Partly as a result, most of the karma old posts have will have been from people going back and upvoting them later once the new system was impemented, e.g. from memory my post from your list was around +10 for most of its life, and has drifted to its current +59 over the past couple of years.

This jumps out to me because I'm pretty sure that post was not a particularly high-engagement post even at the time it was written, but it's the second-highest 2015 post on your list. I think this is because it's been linked back to a fair amount and so can partially benefit from the karma inflation.

(None of which is meant to take away from the work you've done here, just providing some possibly-helpful context.)