Does your model without log(GNI per capita) basically just include a proxy for log(GNI per capita), by including other predictor variables that, in combination, are highly predictive of log(GNI per capita)?
With a pool of 1058 potential predictor variables, many of which have some relationship to economic development or material standards of living, it wouldn't be surprising if you could build a model to predict log(GNI per capita) with a very good fit. If that is possible with this pool of variables, and if log(GNI per capita) is linearly predictive of lif...
It looks like the 3 articles are in the appendix of the dissertation, on pages 65 (fear, Study A), 72 (hope, Study B), and 73 (mixed, Study C).
The effect of health insurance on health, such as the old RAND study, the Oregon Medicaid expansion, the India study from a couple years ago, or whatever else is out there.
Robin Hanson likes to cite these studies as showing that more medicine doesn't improve health, but I'm skeptical of the inference from 'not statistically significant' to 'no effect' (I'm in the comments there as "Unnamed"). I would like to see them re-analyzed based on effect size (e.g. a probability distribution or confidence interval for DALY per $).
I'd guess that this is because an x-risk intervention might have on the order of a 1/100,000 chance of averting extinction. So if you run 150k simulations, you might get 0 or 1 or 2 or 3 simulations in which the intervention does anything. Then there's another part of the model for estimating the value of averting extinction, but you're only taking 0 or 1 or 2 or 3 draws that matter from that part of the model because in the vast majority of the 150k simulations that part of the model is just multiplied by zero.
And if the intervention sometimes increases e...
I believe the paper you're referring to is "Water Treatment And Child Mortality: A Meta-Analysis And Cost-effectiveness Analysis" by Kremer, Luby, Maertens, Tan, & Więcek (2023).
The abstract of this version of the paper (which I found online) says:
...We estimated a mean cross-study reduction in the odds of all-cause under-5 mortality of about 30% (Peto odds ratio, OR, 0.72; 95% CI 0.55 to 0.92; Bayes OR 0.70; 95% CrI 0.49 to 0.93). The results were qualitatively similar under alternative modeling and data inclusion choices. Taking into account heterogenei
Two thoughts on this paper:
More from Existential Risk Observatory (@XRobservatory) on Twitter:
...It was a landmark speech by @RishiSunak: the first real recognition of existential risk by a world leader. But even better are the press questions at the end:
@itvnews: "If the risks are as big as you say, shouldn't we at the very least slow down AI development, at least long enough to understand and control the risks."
@SkyNews: "Is it fair to say we know enough already to call for a moratorium on artificial general intelligence? Would you back a moratorium on AGI?"
Sky again: "Given th
One way to build risk decay into a model is to assume that the risk is unknown within some range, and to update on survival.
A very simple version of this is to assume an unknown constant per-century extinction risk, and to start with a uniform distribution on the size of that risk. Then the probability of going extinct in the first century is 1/2 (by symmetry), and the probability of going extinct in the second century conditional on surviving the first is smaller than that (since the higher-risk worlds have disproportionately already gone extinct) - with ...
Why are these expected values finite even in the limit?
It looks like this model is assuming that there is some floor risk level that the risk never drops below, which creates an upper bound for survival probability through n time periods based on exponential decay at that floor risk level. With the time of perils model, there is a large jolt of extinction risk during the time of perils, and then exponential decay of survival probability from there at the rate given by this risk floor.
The Jupyter notebook has this value as r_low=0.0001 per time period. If a...
(Commenting on mobile, so excuse the link formatting.)
See also this comment and thread by Carl Shulman: https://forum.effectivealtruism.org/posts/zLZMsthcqfmv5J6Ev/the-discount-rate-is-not-zero?commentId=Nr35E6sTfn9cPxrwQ
Including his estimate (guess?) of 1 in a million risk per century in the long run:
In general, even assigning a low but non-tiny probability to low long run risks can allow huge expected values.
See also Tarsney's The Epistem...
Thank you very much Dan for your comments and for looking into the ins and outs of the work and highlighting various threads that could improve it.
There are two quite separate issues that you brought up here. First about infinite value, which can be recovered with new scenarios and, second, the specific parameter defaults used. The parameters the report used could be reasonable but also might seem over-optimistic or over-pessimistic, depending on your background views.
I totally agree that we should not anchor on any particular set of parameters, including ...
One way to build risk decay into a model is to assume that the risk is unknown within some range, and to update on survival.
A very simple version of this is to assume an unknown constant per-century extinction risk, and to start with a uniform distribution on the size of that risk. Then the probability of going extinct in the first century is 1/2 (by symmetry), and the probability of going extinct in the second century conditional on surviving the first is smaller than that (since the higher-risk worlds have disproportionately already gone extinct) - with ...
You can get a sense for these sorts of numbers just by looking at a binomial distribution.
e.g., Suppose that there are n events which each independently have a 45% chance of happening, and a noisy/biased/inaccurate forecaster assigns 55% to each of them.
Then the noisy forecaster will look more accurate than an accurate forecaster (who always says 45%) if >50% of the events happen, and you can use the binomial distribution to see how likely that is to happen for different values of n. For example, according to this binomial calculator, with n=51 there is...
Seems like a question where the answer has to be "it depends".
There are some questions which have a decomposition that helps with estimating them (e.g. Fermi questions like estimating the mass of the Earth), and there are some decompositions that don't help (for one thing, decompositions always stop somewhere, with components that aren't further decomposed).
Research could help add texture to "it depends", sketching out some generalizations about which sorts of decompositions are helpful, but it wouldn't show that decomposition is just generally good or just generally bad or useless.
However, an absolute reduction of cumulative risk by 10-8 requires (by definition) driving cumulative risk at least below 1-10-8. Again, you say, that must be easy. Not so. Driving cumulative risk this low requires driving per-century risk to about 1.6*10-6, barely one in a million.
I'm unclear on what this means. I currently think that humanity has better than a 10-8 chance of surviving the next billion years, so can I just say that "driving cumulative risk at least below 1-10-8" is already done? Is the 1.6*10-6 per-century risk some sort of average of 10 ...
Constant per-century risk is implausible because these are conditional probabilities, conditional on surviving up to that century, which means that they're non-independent.
For example, the probability of surviving the 80th century from now is conditioned on having survived the next 79 centuries. And the worlds where human civilization survives the next 79 centuries are mostly not worlds where we face a 10% chance of extinction risk each century and keep managing to stumble along. Rather, they’re worlds where the per-century probabilities of extinction over...
GiveWell has a 2021 post Why malnutrition treatment is one of our top research priorities, which includes a rough estimate of "a cost of about $2,000 to $18,000 per death averted" through treating "otherwise untreated episodes of malnutrition in sub-Saharan Africa." You can click through to the footnotes and the spreadsheets for more details on how they calculated that.
Is this just showing that the predictions were inaccurate before updating?
I think it's saying that predictions over the lifetime of the market are less accurate for questions where early forecasters disagreed a lot with later forecasters, compared to questions where early forecasters mostly agreed with later forecasters. Which sounds unsurprising.
That improvement of the Metaculus community prediction seems to be approximately logarithmic, meaning that doubling the number of forecasters seems to lead to a roughly constant (albeit probably diminishing) relative improvement in performance in terms of Brier Score: Going from 100 to 200 would give you a relative improvement in Brier score almost as large as when going from 10 to 20 (e.g. an improvement by x percent).
In some of the graphs it looks like the improvement diminishes more quickly than the logarithm, such that (e.g.) going from 100 to 200 give...
I think the correct adjustment would involve multiplying the effect size by something like 1.1 or 1.2. But figuring out the best way to deal with it should involve some combination of looking into this issue in more depth and/or consulting with someone with more expertise on this sort of statistical issue.
This sort of adjustment wouldn't change your bottom-line conclusions that this point estimate for deworming is smaller than the point estimate for StrongMinds, and that this estimate for deworming is not statistically significant, but it would shift some of the distributions & probabilities that you discuss (including the probability that StrongMinds has a larger well-being effect than deworming).
A low reliability outcome measure attenuates the measured effect size. So if researchers measure the effect of one intervention on a high-quality outcome measure, and they measure the effect of another intervention on a lower-quality outcome measure, the use of different measures will inflate the apparent relative impact of the intervention that got higher-quality measurement. Converting different scales into number of SDs puts them all on the same scale, but doesn't adjust for this measurement issue.
For example, if you have a continuous outcome measure an...
I don't see why you used a linear regression over time. It seems implausible that the trend over time would be (non-flat) linear, and the three data points have enough noise to make the estimate of the trend extremely noisy.
Intelligence 1: Individual cognitive abilities.
Intelligence 2: The ability to achieve a wide range of goals.
Eliezer Yudkowsky means Intelligence 2 when he talks about general intelligence. e.g., He proposed "efficient cross-domain optimization" as the definition in his post by that name. See the LW tag page for General Intelligence for more links & discussion.
The model assumes gradually diminishing returns to spending within the next year, but the intuitions behind your third voice think that much higher spending would involve marginal returns that are a lot smaller OR ~zero OR negative?
Could you post something closer to the raw survey data, in addition to the analysis spreadsheet linked in the summary section? I'd like to see something that:
The numbers that you get from this sort of exercise will depend heavily on which people you get estimates from. My guess is that which people you include matters more than what you do with the numbers that they give you.
If the people who you survey are more like the general public, rather than people around our subcultural niche where misaligned AI is a prominent concern, then I expect you'll get smaller numbers.
Whereas, in Rob Bensinger's 2021 survey of "people working on long-term AI risk", every one of the 44 people who answered the survey gave an estim...
If the estimates for the different components were independent, then wouldn't the distribution of synthetic estimates be the same as the distribution of individual people's estimates?
Multiplying Alice's p1 x Bob's p2 x Carol's p3 x ... would draw from the same distribution as multiplying Alice's p1 x Alice's p2 x Alice's p3 ... , if estimates to the different questions are unrelated.
So you could see how much non-independence affects the bottom-line results just by comparing the synthetic distribution with the distribution of individual estimates (treating ...
Does the table in section 3.2 take the geometric mean for each of the 6 components?
From footnote 7 it looks like it does, but if it does then I don't see how this gives such a different bottom line probability from the synthetic method geomean in section 4 (18.7% vs. 1.65% for all respondents). Unless some probabilities are very close to 1, and those have a big influence on the numbers in the section 3.2 table? Or my intuitions about these methods are just off.
Have you looked at how sensitive this analysis is to outliers, or to (say) the most extreme 10% of responses on each component?
The recent Samotsvety nuclear risk estimate removed the largest and smallest forecast (out of 7) for each component before aggregating (the remaining 5 forecasts) with the geometric mean. Would a similar adjustment here change the bottom line much (for the single probability and/or the distribution over "worlds")?
The prima facie case for worrying about outliers actually seems significantly stronger for this survey than for an org l...
I had not thought to do that, and it seems quite sensible (I agree with your point about prima facie worry about low outliers). The results are below.
To my eye, the general mechanism I wanted to defend about is preserved (there is an asymmetric probability of finding yourself in a low-risk world), but the probability of finding yourself in an ultra-low-risk world has significantly lowered, with that probability mass roughly redistributing itself around the geometric mean (which itself has gone up to 7%-ish)
In some sense this isn't totally surprising - remo...
A passage from Superforecasting:
...Flash back to early 2012. How likely is the Assad regime to fall? Arguments against a fall include (1) the regime has well-armed core supporters; (2) it has powerful regional allies. Arguments in favor of a fall include (1) the Syrian army is suffering massive defections; (2) the rebels have some momentum, with fighting reaching the capital. Suppose you weight the strength of t
Two empirical reasons not to take the extreme scope neglect in studies like the 2,000 vs 200,000 birds one as directly reflecting people's values.
First, the results of studies like this depend on how you ask the question. A simple variation which generally leads to more scope sensitivity is to present the two options side by side, so that the same people would be asked both about 2,000 birds and about the 200,000 birds (some call this "joint evaluation" in contrast to "separate evaluation"). Other variations also generally produce more scope sensitive resu...
It would be interesting whether the forecasters with outlier numbers stand by those forecasts on reflection, and to hear their reasoning if so. In cases where outlier forecasts reflect insight, how do we capture that insight rather than brushing them aside with the noise? Checking in with those forecasters after their forecasts have been flagged as suspicious-to-others is a start.
The p(month|year) number is especially relevant, since that is not just an input into the bottom line estimate, but also has direct implications for individual planning. The plan ...
These numbers seem pretty all-over-the-place. On nearly every question, the odds given by the 7 forecasters span at least 2 orders of magnitude, and often substantially more. And the majority of forecasters (4/7) gave multiple answers which seem implausible (details below) in ways that suggest that their numbers aren't coming from a coherent picture of the situation.
I have collected the numbers in a spreadsheet and highlighted (in red) the ones that seem implausible to me.
Odds span at least 2 orders of magnitude:
Another commenter noted that the answers to ...
The Less Wrong posts Politics as Charity from 2010 and Voting is like donating thousands of dollars to charity from November 2012 have similar analyses to the 2020 80k article.
Agreed that there are some contexts where there's more value in getting distributions, like with the Fermi paradox.
Or, before the grants are given out, you could ask people to give an ex ante distribution for "what will be your ex post point estimate of the value of this grant?" That feeds directly into VOI calculations, and it is clearly defined what the distribution represents. But note that it requires focusing on point estimates ex post.
I think it would've been better to just elicit point estimates of the grants' expected value, rather than distributions. Using distributions adds complexity, for not much benefit, and it's somewhat unclear what the distributions even represent.
Added complexity: for researchers giving their elicitations, for the data analysis, for readers trying to interpret the results. This can make the process slower, lead to errors, and lead to different people interpreting things differently. e.g., For including both positive & negative numbers in the distributions...
In the table with post-discussion distributions, how is the lower bound of the aggregate distribution for the Open Phil AI Fellowship -73, when the lowest lower bound for an individual researcher is -2.4? Also in that row, Researcher 3's distribution is given as "250 to 320", which doesn't include their median (35) and is too large for a scale that's normalized to 100.
I haven't seen a rigorous analysis of this, but I like looking at the slope, and I expect that it's best to include each resolved prediction as a separate data point. So there would be 743 data points, each with a y value of either 0 or 1.
There are several different sorts of systematic errors that you could look for in this kind of data, although checking for them requires including more features of each prediction than the ones that are here.
For example, to check for optimism bias you'd want to code whether each prediction is of the form "good thing will happen", "bad thing will happen", or neither. Then you can check if probabilities were too high for "good thing will happen" predictions and too low for "bad thing will happen" predictions. (Most of the example predictions were "good thing...
Pardon my negativity, but I get the impression that you haven't thought through your impact model very carefully.
In particular, the structure where
Every week, an anonymous team of grantmakers rank all participants, and whoever accomplished the least morally impactful work that week will be kicked off the island.
is selecting for mediocrity.
Given fat tails, I expect more impact to come from the single highest impact week than from 36 weeks of not-last-place impact.
Perhaps for the season finale you could bring back the contestant who had the highest imp...
I got 13/13.
q11 (endangered species) was basically a guess. I thought that an extreme answer was more likely given how the quiz was set up to be counterintuitive/surprising. Also relevant: my sense is that we've done pretty well at protecting charismatic megafauna; the fact that I've heard about a particular species being at risk doesn't provide much information either way about whether things have gotten worse for it (me hearing about it is related to things being bad for it, and it's also related to successful efforts to protect it).
On q6 (age distributi...
For example: If there are diminishing returns to campaign spending, then taking equal amounts of money away from both campaigns would help the side which has more money.
If humanity goes extinct this century, that drastically reduces the likelihood that there are humans in our solar system 1000 years from now. So at least in some cases, looking at the effects 1000+ years in the future is pretty straightforward (conditional on the effects over the coming decades).
In order to act for the benefit of the far future (1000+ years away), you don't need to be able to track the far future effects of every possible action. You just need to find at least one course of action whose far future effects are sufficiently predictable to guide you (and good in expectation).
The initial post by Eliezer on security mindset explicitly cites Bruce Schneier as the source of the term, and quotes extensively from this piece by Schneier.
In most of his piece, by “aiming to be mediocre”, Schwitzgebel means that people’s behavior regresses to the actual moral middle of a reference class, even though they believe the moral middle is even lower.
This skirts close to a tautology. People's average moral behavior equals people's average moral behavior. The output that people's moral processes actually produce is the observed distribution of moral behavior.
The "aiming" part of Schwitzgebel's hypothesis that people aim for moral mediocrity gives it empirical content. It gets harder to pick out the empirical content when interpreting aim in the objective sense.
Unless a study is done with participants who are selected heavily for numeracy and fluency in probabilities, I would not interpret stated probabilities literally as a numerical representation of their beliefs, especially near the extremes of the scale. People are giving an answer that vaguely feels like it matches the degree of unlikeliness that they feel, but they don't have that clear a sense of what (e.g.) a probability of 1/100 means. That's why studies can get such drastically different answers depending on the response format, and why (I predict) eff...
That can be tested on these data, just by looking at the first of the 3 questions that each participant got, since the post says that "Participants were asked about the likelihood of humans going extinct in 50, 100, and 500 years (presented in a random order)."
I expect that there was a fair amount of scope insensitivity. e.g., That people who got the "probability of extinction within 50 years" question first gave larger answers to the other questions than people who got the "probability of extinction within 500 years" question first.
I agree that asking about 2016 donations in early 2017 is an improvement for this. If future surveys are just going to ask about one year of donations then that's pretty much all you can do with the timing of the survey.
In the meantime, it is pretty easy to filter the data accordingly -- if you look only at donations made by EAs who stated that they joined on 2014 or before, the median donation is $1280.20 for 2015 and $1500 for 2016.
This seems like a better way to do the analyses. I think that the post would be more informative & easier to inter...
It is also worth noting that the survey was asking people who identify as EA in 2017 how much they donated in 2015 and 2016. These people weren't necessarily EAs in 2015 or 2016.
Looking at the raw data of when respondents said that they first became involved in EA, I'm getting that:
7% became EAs in 2017
28% became EAs in 2016
24% became EAs in 2015
41% became EAs in 2014 or earlier
(assuming that everyone who took the "Donations Only" survey became an EA before 2015, and leaving out everyone else who didn't answer the question about when they bec...
This year, a “Donations Only” version of the survey was created for respondents who had filled out the survey in prior years. This version was shorter and could be linked to responses from prior years if the respondent provided the same email address each year.
Are these data from prior surveys included in the raw data file, for people who did the Donations Only version this year? At the bottom of the raw data file I see a bunch of entries which appear not to have any data besides income & donations - my guess is that those are either all the people who took the Donations Only version, or maybe just the ones who didn't provide an email address that could link their responses.
https://delib.zendesk.com/hc/en-us/articles/205061169-Creating-footnotes-HTML-anchors
It might be possible to fix in a not-too-tedious way, by using find-replace in the source code to edit all of the broken links (and anchors?) at once.
It appears that this analysis did not account for when people became EAs. It looked at donations in 2014, among people who in November 2015 were nonstudent EAs on an earning to give path. But less than half of those people were nonstudent EAs on an earning to give path at the start of 2014.
In fact, less than half of the people who took the Nov 2015 survey were EAs at the start of 2014. I've taken a look at the dataset, and among the 1171 EAs who answered the question about 2014 donations:
40% first got involved in EA in 2013 or earlier
21% first got involved...
I've raised related points here, and also here with followup, about how exponential decay with a fixed decay rate is not a good model to use for estimating long-term survival probability.