How much does performance differ between people?

I tried to sum up the key messages in plain language in a Twitter thread, in case that helps clarify.

How much does performance differ between people?

I think that's a good summary, but it's not only winner-takes-all effects that generate heavy-tailed outcomes.

You can get heavy-tailed outcomes if performance is the product of two normally distributed factors (e.g. intelligence x effort).
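As a rough illustration, here is a minimal simulation sketch of that point. The means, standard deviations, sample size and function names are all illustrative assumptions, not estimates of real traits. It multiplies two independent normal factors and compares the skewness of one factor with the skewness of the product. How pronounced the upper tail is depends heavily on how large the spread is relative to the mean.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: mean cubed z-score (about 0 for a symmetric distribution)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def simulate(n=50_000, mean=1.0, sd=0.3, seed=0):
    """Draw two independent normal factors per person and multiply them.

    n, mean and sd are hypothetical values chosen only for illustration.
    """
    rng = random.Random(seed)
    intelligence = [rng.gauss(mean, sd) for _ in range(n)]
    effort = [rng.gauss(mean, sd) for _ in range(n)]
    performance = [i * e for i, e in zip(intelligence, effort)]
    return intelligence, performance


if __name__ == "__main__":
    intelligence, performance = simulate()
    print(f"skew of one factor: {skewness(intelligence):.2f}")
    print(f"skew of product:    {skewness(performance):.2f}")
```

Under these assumptions the single factor should come out roughly symmetric, while the product is clearly right-skewed, i.e. it has a longer upper tail than either input.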

They can also arise from the other factors that Max lists in another comment (e.g. scalable outputs, complex production).

Luck can also produce heavy-tailed outcomes if it amplifies outcomes or is itself heavy-tailed.

How much does performance differ between people?

This is cool.

One theoretical point in favour of complexity is that complex production often looks like an 'o-ring' process, which will create heavy-tailed outcomes.
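To make the 'o-ring' intuition concrete, here is a minimal sketch. It assumes a simplified version of the idea in which output is the product of per-task quality across all tasks, and each person performs every task at the same quality. The function names and numbers are hypothetical.

```python
def o_ring_output(quality, n_tasks):
    """Output when all n_tasks must go well, each at the same per-task quality (0 to 1)."""
    return quality ** n_tasks


def output_ratio(q_high, q_low, n_tasks):
    """How many times more output the higher-quality worker produces."""
    return o_ring_output(q_high, n_tasks) / o_ring_output(q_low, n_tasks)


if __name__ == "__main__":
    # Hypothetical per-task qualities of 0.99 and 0.90.
    for n_tasks in (1, 5, 20):
        print(n_tasks, round(output_ratio(0.99, 0.90, n_tasks), 2))
```

In this toy setup a 10% gap in per-task quality is only a 1.1x gap in output for a single task, but it compounds to roughly 6.7x across 20 tasks, which is one way complex production can stretch out the top of the distribution.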

How much does performance differ between people?

On your main point, this was the kind of thing we were trying to make clearer, so it's disappointing that it hasn't come through.

Just on the particular VC example:

I'm sceptical that you can do a good job of predicting outcomes ex ante. After all, that's what VCs would want to do, and they have enormous resources. Their strategy is basically to pick as many plausible winners as they can fund.

Most VCs only pick from the top 1-5% of startups. E.g. YC's acceptance rate is 1%, and very few startups they reject make it to series A. More data on VC acceptance rates here:

So while it's mostly luck once you get down to the top 1-5%, I think there are a lot of predictors before that.

Also see more on predictors of startup performance here:

What Makes Outreach to Progressives Hard

Thank you for this summary!

One thought that struck me is that most of the objections seem most likely to come up in response to 'GiveWell style EA'.

I expect the objections that would be raised to a longtermist-first EA would be pretty different, though with some overlap. I'd be interested in any thoughts on what they would be.

I also (speculatively) wonder if a longtermist-first EA might ultimately do better with this audience. You can do a presentation that starts with climate change, and then point out that the lack of political representation for future generations is a much more general problem.

In addition, longtermist EAs favour hits-based giving, which makes it clear that policy change is among the best interventions while acknowledging that its effects are very hard to measure. That seems more palatable than an approach highly focused on measuring narrow metrics.

Feedback from where?

I agree - but my impression is that they consider track record when making the forward-looking estimates, and they also update their recommendations over time, in part drawing on track record. I think "doesn't consider track record" is a straw man, though there could be an interesting argument about whether more weight should be put on track record as opposed to other factors (e.g. intervention selection, cause selection, team quality).

Feedback from where?

Impact = money moved * average charity effectiveness. FP tracks money to their recommended charities, and this is their published research on the effectiveness of those charities, and why they recommended them.
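A minimal sketch of that formula, under the assumption that "average charity effectiveness" means a money-weighted average. The grant figures and function names below are made up for illustration and are not FP's numbers.

```python
import math


def total_impact(grants):
    """Sum of money * effectiveness over (money_moved, effectiveness) pairs."""
    return sum(money * effectiveness for money, effectiveness in grants)


def money_weighted_effectiveness(grants):
    """Average effectiveness per dollar, weighted by money moved to each charity."""
    total_money = sum(money for money, _ in grants)
    return total_impact(grants) / total_money


if __name__ == "__main__":
    # Hypothetical (money moved, impact units per dollar) for two charities.
    grants = [(100.0, 2.0), (300.0, 1.0)]
    money_moved = sum(money for money, _ in grants)
    impact = money_moved * money_weighted_effectiveness(grants)
    assert math.isclose(impact, total_impact(grants))
    print(impact)
```

The point of the weighting is that multiplying total money moved by a simple unweighted average would misstate impact whenever donations are concentrated in charities of above or below average effectiveness.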

Feedback from where?

We make the impact evaluation I note above available to donors (and our donors also do their own version of it). We also publish top-line results publicly in our annual reviews (e.g. number of impact-adjusted plan changes), but don't publish the case studies since they involve a ton of sensitive personal information.

Feedback from where?

Just a quick comment that I don't think the above is a good characterisation of how 80k assesses its impact. Describing our whole impact evaluation would take a while, but some key elements are:

  • We think impact is heavy-tailed, so we try to identify the most high-impact 'top plan changes'. We do case studies of what impact they had and how we helped. This often involves interviewing the person, and also people who can assess their work. (Last year these interviews were done by a third party to reduce desirability bias.) We then do a rough Fermi estimate of the impact.

  • We also track the number of a wider class of 'criteria-based plan changes', but then take a random sample and make Fermi estimates of impact so we can compare their value to the top plan changes.
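The sampling step in the second bullet can be sketched roughly as follows. This is only an illustration of the general technique (estimate a total from a random sample), not 80k's actual procedure. The function names, sample size and values are hypothetical.

```python
import random
import statistics


def draw_sample(plan_change_ids, k, seed=0):
    """Pick k plan changes at random (without replacement) for Fermi estimates."""
    rng = random.Random(seed)
    return rng.sample(plan_change_ids, k)


def estimate_total_value(sample_estimates, population_size):
    """Scale the mean Fermi estimate in the sample up to the whole class."""
    return statistics.fmean(sample_estimates) * population_size


if __name__ == "__main__":
    # Hypothetical: 100 criteria-based plan changes, 3 of them sampled and estimated.
    sampled_ids = draw_sample(list(range(100)), k=3)
    fermi_estimates = [1.0, 2.0, 3.0]  # made-up impact units for the sampled changes
    print(sampled_ids, estimate_total_value(fermi_estimates, population_size=100))
```

Because impact is heavy-tailed, an estimate like this will be noisy for small samples, which is part of why the top plan changes get individual case studies instead.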

If we had to choose a single metric, it would be something closer to impact-adjusted years of extra labour added to top causes, rather than the sheer number of plan changes.

We also look at other indicators like:

  • There have been other surveys of the highest-impact people who entered EA in recent years, evaluating what fraction came from 80k, which lets us estimate the percentage of the EA workforce that came from 80k.

  • We look at the EA survey results, which let us track things like how many people are working at EA orgs and entered via 80k.

We use number of calls as a lead metric, not an impact metric. Technically it's the number of calls with people who made an application above a quality bar, rather than the raw number. We've checked and it seems to be a proxy for the number of impact-adjusted plan changes that result from advising.

This is not to deny that assessing our impact is extremely difficult, and ultimately involves a lot of judgement calls - we were explicit about that in the last review - but we've put a lot more work into it than the above implies – probably around 5-10% of team time in recent years.

I think similar comments could be made about several of the other examples, e.g. GWWC also tracks dollars donated each year to effective charities (now via the EA Funds) and total dollars pledged. They track the number of pledges as well since that's a better proxy for the community-building benefits.
