I haven't seen a rigorous analysis of this, but I like looking at the slope, and I expect that it's best to include each resolved prediction as a separate data point. So there would be 743 data points, each with a y value of either 0 or 1.
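A minimal sketch of that layout, with made-up numbers rather than the actual 743 predictions: each resolved prediction becomes one (stated probability, outcome) pair, and you can then compare stated probabilities to observed frequencies within bins.

```python
# Hypothetical data in the "one row per resolved prediction" layout:
# outcome is 1 if the predicted event happened, 0 if it did not.
predictions = [
    (0.9, 1), (0.9, 1), (0.9, 0),
    (0.6, 1), (0.6, 0),
    (0.2, 0), (0.2, 0), (0.2, 1),
]

def calibration_by_bin(pairs, edges=(0.0, 0.5, 1.0)):
    """Group (p, outcome) pairs into probability bins and compare the mean
    stated probability to the observed frequency within each bin."""
    rows = []
    for lo, hi in zip(edges, edges[1:]):
        in_bin = [(p, y) for p, y in pairs if lo <= p < hi or p == hi == edges[-1]]
        if in_bin:
            mean_p = sum(p for p, _ in in_bin) / len(in_bin)
            freq = sum(y for _, y in in_bin) / len(in_bin)
            rows.append((lo, hi, mean_p, freq, len(in_bin)))
    return rows

for lo, hi, mean_p, freq, n in calibration_by_bin(predictions):
    print(f"[{lo:.1f}, {hi:.1f}]: stated {mean_p:.2f}, observed {freq:.2f}, n={n}")
```

With the full data you'd fit a slope (e.g. a logistic regression of outcome on stated probability) rather than eyeballing bins, but the binned comparison shows the shape of the analysis.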
There are several different sorts of systematic error you could look for in this kind of data, although checking for them requires recording more features of each prediction than are included here.
For example, to check for optimism bias you'd want to code whether each prediction is of the form "good thing will happen", "bad thing will happen", or neither. Then you can check if probabilities were too high for "good thing will happen" predictions and too low for "bad thing will happen" predictions. (Most of the example predictions were "good thing will happen" predictions, and it looks like probabilities were not generally too high, so probably optimism bias was not a major issue.)
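The check could look something like the following sketch, with hypothetical records and valence codes (the actual data would need the good/bad/neither coding added by hand):

```python
# Hypothetical records: (stated probability, outcome, valence code), where
# the valence code says whether the prediction was "good thing will happen",
# "bad thing will happen", or neither.
records = [
    (0.8, 1, "good"), (0.7, 0, "good"), (0.9, 1, "good"),
    (0.3, 0, "bad"), (0.4, 1, "bad"),
    (0.5, 1, "neither"),
]

def gap_by_valence(rows):
    """Mean stated probability minus observed frequency, per valence.
    Optimism bias would show up as a positive gap on "good" predictions
    (probabilities too high) and a negative gap on "bad" ones (too low)."""
    gaps = {}
    for valence in sorted({v for _, _, v in rows}):
        sub = [(p, y) for p, y, v in rows if v == valence]
        mean_p = sum(p for p, _ in sub) / len(sub)
        freq = sum(y for _, y in sub) / len(sub)
        gaps[valence] = mean_p - freq
    return gaps
```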
Some other things you could check for:
Pardon my negativity, but I get the impression that you haven't thought through your impact model very carefully.
In particular, the structure where

> Every week, an anonymous team of grantmakers rank all participants, and whoever accomplished the least morally impactful work that week will be kicked off the island.

is selecting for mediocrity.
Given fat tails, I expect more impact to come from the single highest impact week than from 36 weeks of not-last-place impact.
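The fat-tail claim can be sanity-checked with a quick simulation. Here weekly impacts are drawn from a Pareto distribution as a stand-in fat-tailed distribution; the parameters are illustrative, not estimated from any real grant data.

```python
import random

def max_share(alpha, n_weeks=36, trials=2000, seed=0):
    """Average fraction of total impact contributed by the single best week,
    when weekly impacts are i.i.d. Pareto(alpha) draws.
    Smaller alpha means a fatter tail."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        weeks = [rng.paretovariate(alpha) for _ in range(n_weeks)]
        total += max(weeks) / sum(weeks)
    return total / trials

# With a fat tail (alpha < 1) the single best week carries a large share of
# the total; with a thin tail (alpha = 3) it carries only a small slice.
print(max_share(0.5), max_share(3.0))
```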
Perhaps for the season finale you could bring back the contestant who had the highest impact week of the season and have them face off against the last survivor. That could also make for more exciting television than whatever you had planned for the 36th episode.
How much overlap is there between this book & Singer's forthcoming What We Owe The Past?
I got 13/13.
q11 (endangered species) was basically a guess. I thought an extreme answer was more likely, given that the quiz was set up to be counterintuitive/surprising. Also relevant: my sense is that we've done pretty well at protecting charismatic megafauna, and the fact that I've heard about a particular species being at risk doesn't provide much information either way about whether things have gotten worse for it (my hearing about it is related both to things being bad for it and to successful efforts to protect it).
On q6 (age distribution of population increase) I figured that most people are age 15-74 and that group would increase roughly proportionally with the overall increase, which gives them the majority of the increase. The increase among the elderly will be disproportionately large, but that's not enough for it to be the biggest in absolute terms since they're only like 10% of the population.
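That back-of-envelope reasoning can be written out with assumed round numbers (not the quiz's actual data):

```python
# Suppose 75% of people are aged 15-74 and 10% are 75+, the total population
# grows by 10%, and the 75+ group grows at twice the overall rate.
pop = 1000.0
overall_growth = 0.10
mid_share, elderly_share = 0.75, 0.10

mid_increase = pop * mid_share * overall_growth                # 75 people
elderly_increase = pop * elderly_share * (2 * overall_growth)  # 20 people

# The elderly increase is disproportionately large in relative terms, but
# the 15-74 group still dominates the increase in absolute terms.
print(mid_increase, elderly_increase)
```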
On q7 (deaths from natural disaster) I wouldn't have been surprised if the drop in the death rate had been balanced out by the increase in population, but I had an inkling that the death rate had fallen faster than the population had grown. And the tenor of the quiz was that the surprisingly good answer was correct, so if population growth had balanced it out, the quiz probably would've asked about deaths per capita rather than total deaths.
For example: If there are diminishing returns to campaign spending, then taking equal amounts of money away from both campaigns would help the side which has more money.
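A toy concave-returns model makes this concrete; the square-root returns function and the budget figures are made up for illustration.

```python
import math

# Toy model: votes(x) = sqrt(x) for spending x (diminishing returns).
# Hypothetical budgets: the richer campaign has 100 units, the poorer 25.
def votes(spend):
    return math.sqrt(spend)

cut = 9.0
rich_loss = votes(100) - votes(100 - cut)   # 10 - sqrt(91), roughly 0.46
poor_loss = votes(25) - votes(25 - cut)     # 5 - 4 = 1.0

# Because marginal returns are higher at low spending, the poorer campaign
# loses more votes from the same-sized cut, so removing equal amounts from
# both campaigns helps the side that started with more money.
print(rich_loss, poor_loss)
```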
If humanity goes extinct this century, that drastically reduces the likelihood that there are humans in our solar system 1000 years from now. So at least in some cases, looking at the effects 1000+ years in the future is pretty straightforward (conditional on the effects over the coming decades).
In order to act for the benefit of the far future (1000+ years away), you don't need to be able to track the far future effects of every possible action. You just need to find at least one course of action whose far future effects are sufficiently predictable to guide you (and good in expectation).
The initial post by Eliezer on security mindset explicitly cites Bruce Schneier as the source of the term, and quotes extensively from this piece by Schneier.
In most of his piece, by “aiming to be mediocre”, Schwitzgebel means that people’s behavior regresses to the actual moral middle of a reference class, even though they believe the moral middle is even lower.
This skirts close to a tautology: people's average moral behavior equals people's average moral behavior, since the observed distribution of moral behavior just is whatever output people's moral processes actually produce.
The "aiming" part of Schwitzgebel's hypothesis, that people aim for moral mediocrity, is what gives it empirical content. That content becomes harder to pick out when "aim" is interpreted in the objective sense.
Unless a study is done with participants who are selected heavily for numeracy and fluency in probabilities, I would not interpret stated probabilities literally as a numerical representation of their beliefs, especially near the extremes of the scale. People are giving an answer that vaguely feels like it matches the degree of unlikeliness that they feel, but they don't have that clear a sense of what (e.g.) a probability of 1/100 means. That's why studies can get such drastically different answers depending on the response format, and why (I predict) effects like scope insensitivity are likely to show up.
I wouldn't expect the confidence question to pick up on this. For example, suppose that experts think something has a 1 in a million chance, and a person basically agrees with the experts' viewpoint but hasn't heard or doesn't remember that number. So they indicate "that's very unlikely" by entering "1%", which feels like it's basically the bottom of the scale. Then on the confidence question they say they're very confident of that answer, because they feel sure that it's very unlikely.