Dan_Keys

Comments

How accurate are Open Phil's predictions?

I haven't seen a rigorous analysis of this, but I like looking at the slope (of actual outcomes regressed on the stated probabilities), and I expect that it's best to include each resolved prediction as a separate data point. So there would be 743 data points, each with a y value of either 0 or 1.
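
Here's a minimal sketch of the kind of analysis I mean (the data and variable names are hypothetical; I'm assuming the 743 stated probabilities and their 0/1 resolutions are available as two arrays):

```python
import numpy as np

# Hypothetical data: one entry per resolved prediction.
# p[i] = stated probability, y[i] = 1 if the prediction came true, else 0.
rng = np.random.default_rng(0)
p = rng.uniform(0.05, 0.95, size=743)          # placeholder for the 743 stated probabilities
y = (rng.uniform(size=743) < p).astype(float)  # placeholder resolutions (perfectly calibrated here)

# Regress outcomes on stated probabilities.
# A slope near 1 (with intercept near 0) indicates good calibration overall;
# a slope noticeably below 1 suggests probabilities were too extreme.
slope, intercept = np.polyfit(p, y, 1)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```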

How accurate are Open Phil's predictions?

There are several different sorts of systematic errors that you could look for in this kind of data, although checking for them requires coding more features of each prediction than are included here.

For example, to check for optimism bias you'd want to code whether each prediction is of the form "good thing will happen", "bad thing will happen", or neither. Then you can check if probabilities were too high for "good thing will happen" predictions and too low for "bad thing will happen" predictions. (Most of the example predictions were "good thing will happen" predictions, and it looks like probabilities were not generally too high, so probably optimism bias was not a major issue.)
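
A rough sketch of that check (the dataframe and column names here are hypothetical, just to show the shape of the comparison):

```python
import pandas as pd

# Hypothetical data: one row per resolved prediction, with a hand-coded
# 'direction' column ("good", "bad", or "neither"), the stated probability,
# and the 0/1 resolution.
df = pd.DataFrame({
    "direction": ["good", "good", "bad", "neither", "good", "bad"],
    "stated_p":  [0.80,   0.60,   0.30,  0.50,      0.90,   0.10],
    "resolved":  [1,      1,      0,     1,          1,      0],
})

# Optimism bias would show up as mean stated_p above the resolution rate for
# "good thing will happen" predictions and below it for "bad thing will happen".
print(df.groupby("direction")[["stated_p", "resolved"]].mean())
```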

Some other things you could check for:

  • tracking what the "default outcome" would be, or whether there is a natural base rate, to see if there has been a systematic tendency to overestimate the chances of a non-default outcome (or to underestimate it)
  • dividing predictions up into different types, such as predictions about outcomes in the world (e.g. >20 new global cage-free commitments), predictions about inputs / changes within the organization (e.g. will hire a comms person within 9 months), and predictions about people's opinions (e.g. [expert] will think [the grantee’s] work is ‘very good’), to check for calibration & accuracy on each type of prediction
  • trying to distinguish the relative accuracy of different forecasters. If there are too few predictions per forecaster to assess individuals, you could check whether any forecaster-level features are correlated with overconfidence or with Brier score (e.g., experience within the org, experience making these predictions, some measure of quantitative skills). The aggregate pattern of overconfidence in the >80% and <20% bins can show up even if most forecasters are well-calibrated and only (say) 25% are overconfident, because overconfident predictions get averaged together with well-calibrated ones. And that 25% influences these sorts of results graphs more than it might seem, because well-calibrated forecasters use the extreme bins less often: even if only 25% of all predictions are made by overconfident forecasters, half of the predictions in the >80% bins might be from overconfident forecasters (a rough simulation of this is sketched below).
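
To illustrate that last point, here's a rough simulation (all of the numbers are made up): 25% of predictions come from forecasters who push their probabilities toward the extremes, and the rest come from well-calibrated forecasters. The overconfident minority ends up supplying a disproportionate share of the predictions in the extreme bins, so the aggregate calibration in those bins looks worse than most forecasters' calibration actually is.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Made-up setup: each prediction has a true probability drawn uniformly.
true_p = rng.uniform(size=n)
outcome = rng.uniform(size=n) < true_p

# 25% of predictions come from overconfident forecasters who push their stated
# probabilities toward the extremes; everyone else states the true probability.
overconfident = rng.uniform(size=n) < 0.25
stated = np.where(overconfident,
                  np.clip(true_p + 0.3 * np.sign(true_p - 0.5), 0.01, 0.99),
                  true_p)

# What share of the predictions in the extreme bins came from that 25%?
extreme = (stated > 0.8) | (stated < 0.2)
print(f"share of extreme-bin predictions from overconfident forecasters: "
      f"{overconfident[extreme].mean():.0%}")

# Aggregate calibration in the >80% bin: outcomes resolve true less often than
# the stated probabilities suggest, even though 75% of predictions come from
# perfectly calibrated forecasters.
high = stated > 0.8
print(f">80% bin: mean stated = {stated[high].mean():.2f}, "
      f"fraction resolved true = {outcome[high].mean():.2f}")
```
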
Announcing Impact Island: A New EA Reality TV Show

Pardon my negativity, but I get the impression that you haven't thought through your impact model very carefully.

In particular, the structure where

Every week, an anonymous team of grantmakers rank all participants, and whoever accomplished the least morally impactful work that week will be kicked off the island. 

is selecting for mediocrity.

Given fat tails, I expect more impact to come from the single highest impact week than from 36 weeks of not-last-place impact.
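
A quick made-up illustration of that fat-tails point (nothing here is calibrated to real grantee data; it just shows how heavy tails behave): draw 20 contestants x 36 weeks of "impact" from a Pareto distribution and compare the single best contestant-week to 36 weeks of median impact.

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up model: 20 contestants x 36 weeks of impact, drawn from a
# heavy-tailed (Pareto) distribution with tail index ~1.1.
impacts = rng.pareto(1.1, size=(20, 36)) + 1

best_single_week = impacts.max()          # the single highest-impact week
typical_season = 36 * np.median(impacts)  # 36 weeks of "not last place" impact
print(f"best single week: {best_single_week:,.1f}")
print(f"36 median weeks:  {typical_season:,.1f}")
```

With tails this heavy, the single best contestant-week typically comes out well ahead of 36 median weeks.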

Perhaps for the season finale you could bring back the contestant who had the highest impact week of the season and have them face off against the last survivor. That could also make for more exciting television than whatever you had planned for the 36th episode.

Announcing What The Future Owes Us

How much overlap is there between this book & Singer's forthcoming What We Owe The Past?

The State of the World — and Why Monkeys are Smarter than You

I got 13/13.

q11 (endangered species) was basically a guess. I thought that an extreme answer was more likely given how the quiz was set up to be counterintuitive/surprising. Also relevant: my sense is that we've done pretty well at protecting charismatic megafauna; the fact that I've heard about a particular species being at risk doesn't provide much information either way about whether things have gotten worse for it (me hearing about it is related to things being bad for it, and it's also related to successful efforts to protect it).

On q6 (age distribution of population increase) I figured that most people are age 15-74 and that group would increase roughly proportionally with the overall increase, which gives them the majority of the increase. The increase among the elderly will be disproportionately large, but that's not enough for it to be the biggest in absolute terms since they're only like 10% of the population.
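
Rough version of that arithmetic (the shares and growth multiples are my own ballpark guesses, not figures from the quiz): a group's absolute increase is roughly its population share times its growth rate, so the elderly would need to grow several times faster than everyone else to have the largest increase in absolute terms.

```python
# Ballpark figures, not from the quiz.
share_15_to_74, share_elderly = 0.75, 0.10
growth_15_to_74, growth_elderly = 1.0, 2.5   # elderly growing 2.5x as fast

# Absolute increase ~ population share x growth rate (arbitrary units).
print(share_15_to_74 * growth_15_to_74)   # ~0.75
print(share_elderly * growth_elderly)     # ~0.25
```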

On q7 (deaths from natural disaster) I wouldn't have been surprised if the drop in the death rate had been balanced out by the increase in population, but I had an inkling that the death rate had fallen faster. And the tenor of the quiz was that the surprisingly good answer was correct, so if population growth had balanced it out then probably it would've asked about deaths per capita rather than total deaths.

Getting money out of politics and into charity

For example: If there are diminishing returns to campaign spending, then taking equal amounts of money away from both campaigns would help the side which has more money.
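
A toy version of that point, with made-up numbers and a square-root spending curve standing in for diminishing returns: removing the same amount from both campaigns widens the gap in what their spending actually buys.

```python
import math

# Toy model: a campaign's persuasion "output" scales with the square root
# of its spending (i.e., diminishing returns). Numbers are illustrative.
rich, poor = 100.0, 25.0   # spending, in $M
removed = 20.0             # equal amount taken from both campaigns

gap_before = math.sqrt(rich) - math.sqrt(poor)                      # 10.00 - 5.00 = 5.00
gap_after = math.sqrt(rich - removed) - math.sqrt(poor - removed)   # 8.94 - 2.24 = 6.71
print(f"richer side's edge before: {gap_before:.2f}, after: {gap_after:.2f}")
```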

Michael_Wiebe's Shortform

If humanity goes extinct this century, that drastically reduces the likelihood that there are humans in our solar system 1000 years from now. So at least in some cases, looking at the effects 1000+ years in the future is pretty straightforward (conditional on the effects over the coming decades).

In order to act for the benefit of the far future (1000+ years away), you don't need to be able to track the far future effects of every possible action. You just need to find at least one course of action whose far future effects are sufficiently predictable to guide you (and good in expectation).

The Web of Prevention

The initial post by Eliezer on security mindset explicitly cites Bruce Schneier as the source of the term, and quotes extensively from this piece by Schneier.

[Link] Aiming for Moral Mediocrity | Eric Schwitzgebel

In most of his piece, by “aiming to be mediocre”, Schwitzgebel means that people’s behavior regresses to the actual moral middle of a reference class, even though they believe the moral middle is even lower.

This skirts close to a tautology. People's average moral behavior equals people's average moral behavior. The output that people's moral processes actually produce is the observed distribution of moral behavior.

The "aiming" part of Schwitzgebel's hypothesis that people aim for moral mediocrity gives it empirical content. It gets harder to pick out the empirical content when interpreting aim in the objective sense.

Public Opinion about Existential Risk

Unless a study is done with participants who are selected heavily for numeracy and fluency with probabilities, I would not interpret stated probabilities literally as a numerical representation of participants' beliefs, especially near the extremes of the scale. People give an answer that vaguely feels like it matches the degree of unlikeliness they feel, but they don't have a clear sense of what (e.g.) a probability of 1/100 means. That's why studies can get such drastically different answers depending on the response format, and why (I predict) effects like scope insensitivity are likely to show up.

I wouldn't expect the confidence question to pick up on this. For example, suppose that experts think that something has a 1 in a million chance, and a person basically agrees with the experts' viewpoint but hasn't heard/remembered that number. So they indicate "that's very unlikely" by entering "1%", which feels like it's basically the bottom of the scale. Then on the confidence question they say that they're very confident of that answer, because they feel sure that it's very unlikely.
