All of Michael_Wiebe's Comments + Replies

Should you "trust literatures, not papers"?
I replicated the literature on meritocratic promotion in China, and found that the evidence is not robust.

https://twitter.com/michael_wiebe/status/1750572525439062384

Do vaccinated children have higher income as adults?
I replicate a paper on the 1963 measles vaccine, and find that it is unable to answer the question.

https://twitter.com/michael_wiebe/status/1750197740603367689

I've written up my replication of Cook (2014) on racial violence and patenting by Black inventors.

Bottom line: I believe the conclusions, but I don't trust the results.

https://twitter.com/michael_wiebe/status/1749831822262018476

New replication: I find that the results in Moretti (AER 2021) are caused by coding errors.
The paper studies agglomeration effects for innovation, but the results supporting a causal interpretation don't hold up.

https://twitter.com/michael_wiebe/status/1749462957132759489

7
Karthik Tadepalli
3mo
You might want to add what the subject of Moretti (2021) is, and what the result is, just so people know if they're interested in learning more.

Angus Deaton writes that in academia and policy circles, “Past development practice is seen as a succession of fads, with one supposed magic bullet replacing another—from planning to infrastructure to human capital to structural adjustment to health and social capital to the environment and back to infrastructure—a process that seems not to be guided by progressive learning.”

This framing is weird. Obviously these factors have a positive causal effect on growth. But why would you expect a silver bullet? Conditions change over time, so the constraints on growth will change as well.

I'd expect this article to be pretty solid, but errors in top journals do happen.

1
Seth Ariel Green
5mo
Yep, I recall this case from Bryan Caplan as well: https://betonit.substack.com/p/a-correction-on-housing-regulation
I happen to think Johannes is unusually careful about this stuff (per the original UCT evaluation), so I assume a similar level of care in Egger et al., on which he is a coauthor.
  • person-affecting views
  • supporting a non-zero pure discount rate

I think non-longtermists don't hold these premises; rather, they object to longtermism on tractability grounds.
 

What AI safety work should altruists do? For example, AI companies are self-interestedly working on RLHF, so there's no need for altruists to work on it. (And even stronger, working on RLHF is actively harmful because it advances capabilities.)

Tweet-thread promoting Rotblat Day on Aug. 31, to commemorate the spirit of questioning whether a dangerous project should be continued.

On Rotblat day, people post what signs they would look for to determine if their work was being counterproductive.

How about May 7, the day of the German surrender?

I found the opening paragraph a bit confusing. Suggested edits:

  • Last week a young man named Onekalit turned up 
  • The dry cough wore him down for over a month, but last week he finally managed to cough a bit of sputum
    • Why 'finally'? This makes it sound like the dry cough was preventing the collection of sputum.
  • The incredible Gates-Foundation-funded GeneXpert test [add hyphens]
2
NickLaing
10mo
Thanks so much, appreciate it; I have made the edits! And yes, the dry cough was indeed the thing preventing the collection of sputum.

Side note: what's up with "model evals"? Seems like a jargony term that excludes outsiders.

An evaluation by the National Academy of Sciences estimates PEPFAR has saved millions of lives (PEPFAR itself claims 25 million).

 

The dominant conceptual apparatus economists use to evaluate social policies—comparative cost-effectiveness analysis, which focuses on a specific goal like saving lives, and ranks policies by lives saved per dollar—suggested America’s foreign aid budget could’ve been better spent on condoms and awareness campaigns, or even malaria and diarrheal diseases.

As already mentioned by others, these two claims are consistent.

Related:

The Think Tank Initiative was dedicated to strengthening the capacity of independent policy research institutions in the developing world. Launched in 2008 and managed by IDRC, TTI was a partnership between five donors. The program ended in 2019.

Potential topic: state governments enforcing housing plans on municipalities.

Has anyone thought about the effects of air pollution on animal welfare?

Has anyone looked at the effect of air pollution on animal welfare (farmed or wild)?

This is a relevant question if you're thinking about how hard you should try to drive engagement on a forecasting question.

What is the 'policy relevance' of answering the title question? Ie. if the answer is "yes, forecaster count strongly increases accuracy", how would you go about increasing the number of forecasters?

1
nikos
1y
For Metaculus there are lots of ways to drive engagement: prioritise making the platform easier to use, increase cash prizes, community building and outreach, etc. But as mentioned in the article, the problem in practice is that the bootstrap answer is probably misleading, as increasing the number of forecasters likely changes forecaster composition. However, one specific example where the analysis might actually be applicable is when you're thinking about how many Pro Forecasters to hire for a job.

Yes, this is the main difference compared to forecasters being randomly assigned to a question.

I don't think you can learn much from observational data like this about the causal effect of the number of forecasters on performance. Do you have any natural experiments that you could exploit? (ie. some 'random' factor affecting the number of forecasters, that's not correlated with forecaster skill.) Or can you run a randomized experiment?

3
David Rhys Bernard
1y
Can you explain more why the bootstrapping approach doesn't give a causal effect (or something pretty close to one) here? The aggregate approach is clearly confounded since questions with more answers are likely easier. But once you condition on the question and directly control the number of forecasters via bootstrapping different sample sizes, it doesn't seem like there are any potential unobserved confounders remaining (other than the time issue Nikos mentioned). I don't see what a natural experiment or RCT would provide above the bootstrapping approach.

It sounds like you're doing subsampling. Bootstrapping is random sampling with replacement.

If, for example, we kept increasing the size of the sample we draw, then eventually the variance would be guaranteed to go to zero (when the sample size equals the total number of forecasters and there is only one possible sample we can draw).

With bootstrapping, there are n^n possible (ordered) draws when the bootstrap sample size is equal to the actual sample size n. (And you could choose a bootstrap sample size m ≠ n.)
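To make the subsampling/bootstrapping distinction concrete, here's a minimal sketch using simulated forecasts and a Brier-score aggregate (the data and function names are made up for illustration, not taken from the analysis in the post):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: 200 probability forecasts on one question that resolved "yes".
forecasts = rng.uniform(0.05, 0.95, size=200)
outcome = 1

def brier(prob_mean, outcome):
    # Brier score of the aggregate (mean) forecast.
    return (prob_mean - outcome) ** 2

def subsample_score(forecasts, k, reps=1000):
    # Draw k forecasts WITHOUT replacement (subsampling).
    return np.mean([brier(rng.choice(forecasts, size=k, replace=False).mean(), outcome)
                    for _ in range(reps)])

def bootstrap_score(forecasts, k, reps=1000):
    # Draw k forecasts WITH replacement (bootstrapping).
    return np.mean([brier(rng.choice(forecasts, size=k, replace=True).mean(), outcome)
                    for _ in range(reps)])

n = len(forecasts)
# At k = n, subsampling has exactly one possible sample (zero variance across reps),
# while bootstrapping still has n**n possible ordered draws.
for k in [10, 50, n]:
    print(k, subsample_score(forecasts, k), bootstrap_score(forecasts, k))
```

At k = n the subsampled aggregate is just the full-sample aggregate, which is the variance-collapse issue described above; the bootstrap aggregate still varies across draws.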

3
nikos
1y
Ah snap! I forgot to remove that paragraph... I did subsampling initially, then switched to bootstrapping. Results remained virtually unchanged. Thanks for pointing that out, will update the text.

Imagine two cities. In one, it is safe for women to walk around at night and in the second it is not. I think the former city is better even if women don’t want to walk around at night, because I think that option is valuable to people even if they do not take it. Preference-satisfaction approaches miss this.

Don't people also have preferences for having more options?

1
ryancbriggs
1y
I believe that’s generally outside the model. It’s like asking if people have preferences about the ranking of their preferences.

I'm surprised the Nigerian business plan competition was not included. (Chris Blattman writeup from 2015 here: "Is this the most effective development program in history?".)

I say "They were arguably right, ex ante, to advocate for and participate in a project to deter the Nazi use of nuclear weapons." Actions in 1939-42 or around 1957-1959 are defensible.

Given this, is it accurate to call Einstein's letter a 'tragedy'? The tragic part was continuing the nuclear program after the German program was shut down.

  • 2 August 1939: Einstein-Szilárd letter to Roosevelt advocates for setting up a Manhattan Project. [...]
  • June 1942: Hitler decides against an atomic program for practical reasons.

Is it accurate to say that the US and Germans were in a nuclear weapons race until 1942? So perhaps the takeaway is "if you're in a race, make sure to keep checking that the race is still on".

2
HaydnBelfield
1y
I think the crucial thing is funding levels.  It was only by October 1941 (after substantial nudging from the British) that Roosevelt approved serious funding. As a reminder, I'm particularly interested in ‘sprint’ projects with substantial funding: for example those in which the peak year funding reached 0.4% of GDP (Stine, 2009, see also Grace, 2015). So to some extent they were in a race 1939-1942, but I would suggest it wasn't particularly intense, it wasn't a sprint race.

How much would I personally have to reduce X-risk to make this the optimal decision? Well, that’s simple. We just calculate: 

  • 25 billion * X = 20,000 lives saved
  • X = 20,000 / 25 billion
  • X = 0.0000008 
  • That’s 0.00008% in x-risk reduction for a single individual.

I'm not sure I follow this exercise. Here's how I'm thinking about it:

Option A: spend your career on malaria.

  • Cost: one career
  • Payoff: save 20k lives with probability 1.

Option B: spend your career on x-risk.

  • Cost: one career
  • Payoff: save 25B lives with probability p (=P(prevent extinction)), save 0
...
1
Phosphorous
1y
I think that the expected payoff and the reduction in P(extinction) are just equivalent. Like, a 1% chance of saving 25b is the same as reducing P(extinction) from 7% to 6%; that's what a "1% chance of saving" means, because:

p(extinction) = 1 - p(extinction reduction from me) - p(extinction reduction from all other causes)

So, if I had a 100% chance of saving 25b lives, then that'd be a 100% reduction in extinction risk.

Of course, what we care about is the counterfactual. So if there's already only a 50% chance of extinction, then you could say colloquially that I brought P(extinction) from 0.5 to 0, and there I had a "100% chance of saving 25b lives", but that's not quite right, because I should only get credit for reducing it from 0.5 to 0. So it would be better in that scenario to say that I had a 50% chance of saving 25b, and that's just as high as that can get.

How much would I personally have to reduce X-risk to make this the optimal decision?

Shouldn't this exercise start with the current P(extinction), and then calculate how much you need to reduce that probability? I think your approach is comparing two outcomes: save 25B lives with probability p, or save 20,000 lives with probability 1. Then the first option has higher expected value if p>20000/25B. But this isn't answering your question of personally reducing x-risk.

Also, I think you should calculate marginal expected value, ie., the value of additional resources conditional on the resources already allocated, to account for diminishing marginal returns.
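To spell out the comparison I have in mind, using the post's own numbers (my restatement, not the post's derivation):

```latex
\begin{aligned}
\text{EV(A)} &= 1 \times 20{,}000 \text{ lives} \\
\text{EV(B)} &= p \times 25 \text{ billion lives} \\
\text{EV(B)} > \text{EV(A)} &\iff p > \frac{20{,}000}{25 \times 10^{9}} = 8 \times 10^{-7}
\end{aligned}
```

Here p is the probability that your career is what prevents extinction; my question is how that maps onto "reducing x-risk by 0.00008%" without first specifying the baseline P(extinction).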

3
Phosphorous
1y
Hey, thank you for this comment. We actually started by thinking about P(extinction) but came to believe that it wasn't relevant, because in terms of expected value, reducing P(extinction) from 95% to 94% is equivalent to reducing it from 3% to 2%, or from any other amount to any other amount (keeping the difference the same). All that matters is the change in P(extinction).

Also, in terms of marginal expected value, that would be the next step in this process. I'm not saying with this post "Go work on X-Risk because its marginal EV is likely to be X"; I'm rather saying, "You should go work on X-Risk if its marginal EV is above X." But to be honest, I have no idea how to figure the first question out. I'd really like to, but I don't know of anyone who has even attempted to give an estimate of how much a particular intervention might reduce x-risk (please, forum, tell me where I can find this).

Adding to the causal evidence, there's a 2019 paper that uses wind direction as an instrumental variable for PM2.5. They find that IV > OLS, implying that observational studies are biased downwards:

Comparing the OLS estimates to the IV estimates in Tables 2 and 3 provides strong evidence that observational studies of the relationship between air pollution and health outcomes suffer from significant bias: virtually all our OLS estimates are smaller than the corresponding IV estimates. If the only source of bias were classical measurement error, whi

...
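A toy simulation of the attenuation-bias point (simulated data, not the paper's estimates; the true effect is set to 2.0):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5000

# Simulated setup: wind direction shifts true PM2.5; measured PM2.5 has classical error.
wind = rng.normal(size=n)                      # instrument
pm_true = 0.8 * wind + rng.normal(size=n)      # true exposure
pm_measured = pm_true + rng.normal(size=n)     # measured with classical error
health = 2.0 * pm_true + rng.normal(size=n)    # true effect of exposure is 2.0

# OLS of health on measured PM2.5 is attenuated toward zero by the measurement error.
ols = sm.OLS(health, sm.add_constant(pm_measured)).fit()

# Manual 2SLS: first stage predicts PM2.5 from wind; second stage uses the fitted values.
# (Standard errors from this manual second stage aren't valid; use an IV routine for inference.)
first = sm.OLS(pm_measured, sm.add_constant(wind)).fit()
second = sm.OLS(health, sm.add_constant(first.fittedvalues)).fit()

print("OLS slope: ", ols.params[1])     # well below 2.0
print("2SLS slope:", second.params[1])  # close to 2.0
```

This reproduces the IV > OLS pattern: classical measurement error biases OLS toward zero, and the instrument recovers the larger effect.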

Related, John von Neumann on x-risk:

Finally and, I believe, most importantly, prohibition of technology (invention and development, which are hardly separable from underlying scientific inquiry), is contrary to the whole ethos of the industrial age. It is irreconcilable with a major mode of intellectuality as our age understands it. It is hard to imagine such a restraint successfully imposed in our civilization. Only if those disasters that we fear had already occurred, only if humanity were already completely disillusioned about technological civilization

...
2
Maxwell Tabarrok
2y
Yes, this paper is great and it was an inspiration for my piece. I found his answer here pretty unsatisfying though so hopefully I was able to expand on it well.

It sounds like you're arguing that we should estimate 'good done/additional resources' directly (via Fermi estimates), instead of indirectly using the ITN framework. But shouldn't these give the same answer?

1
Karthik Tadepalli
2y
I don't think OP is opposed to multiplying them together.

And even when you can multiply the three quantities together, I feel like speaking in terms of importance, neglectedness and tractability might make you feel that there is no total ordering of intervention (“some have higher importance, some have higher tractability, whether you prefer one or the other is a matter a personal taste”)

I don't follow this. If you multiply I*T*N and get 'good done/additional resources', how is that not an ordering?
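For reference, the factorization I have in mind (roughly the 80,000 Hours version, with the units written out):

```latex
\underbrace{\frac{\text{good done}}{\text{\% of problem solved}}}_{\text{importance}}
\times
\underbrace{\frac{\text{\% of problem solved}}{\text{\% increase in resources}}}_{\text{tractability}}
\times
\underbrace{\frac{\text{\% increase in resources}}{\text{extra dollar}}}_{\text{neglectedness}}
=
\frac{\text{good done}}{\text{extra dollar}}
```

Since "good done per extra dollar" is a single number per intervention, sorting on it gives a total ordering.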

3
frib
2y
That's an ordering! It's mostly analyses like the ones of 80k Hours, which do not multiply the three together, which might let you think there is no ordering. Is there a way I can make that more precise?

There seems to be an "intentions don't matter, results do" lesson that's relevant here. Intending to solve AI alignment is secondary, and doesn't mean that you're making progress on the problem.

And we don't want people saying "I'm working on AI" just for the social status, if that's not their comparative advantage and they're not actually being productive.

Yes, that's exactly it! Even if a lot of people think that AI is the most important problem to work on, I would expect only a small minority to have a comparative advantage. I worry that students are setting themselves up for burnout and failure by feeling obligated to work on what's been billed by some as the most pressing/impactful cause area, and I worry that it's getting in the way of people exploring different roles and figuring out and building their actual comparative advantage.

Hm, then I find necessitarianism quite strange. In practice, how do we identify people who exist regardless of our choices?

5
EJT
2y
I think in ordinary cases, necessitarianism ends up looking a lot like presentism. If someone presently exists, then they exist regardless of my choices. If someone doesn't yet exist, their existence likely depends on my choices (there's probably something I could do to prevent their existence). Necessitarianism and presentism do differ in some contrived cases, though. For example, suppose I'm the last living creature on Earth, and I'm about to die. I can either leave the Earth pristine or wreck the environment. Some alien will soon be born far away and then travel to Earth. This alien's life on Earth will be much better if I leave the Earth pristine. Presentism implies that it doesn't matter whether I wreck the Earth, because the alien doesn't exist yet. Necessitarianism implies that it would be bad to wreck the Earth, because the alien will exist regardless of what I do.

The longtermist claim is that because humans could in theory live for hundreds of millions or billions of years, and we have the potential to get the risk of extinction very nearly to 0, the biggest effects of our actions are almost all in how they affect the far future. Therefore, if we can find a way to predictably improve the far future, this is likely to be, certainly from a utilitarian perspective, the best thing we can do.

I don't find this framing very useful. The importance-tractability-crowdedness framework gives us a sophisticated method for evaluating...

Because of this heavy tailed distribution of interventions

Is it actually heavy-tailed? It looks like an ordered bar chart, not a histogram, so it's hard to tell what the tails are like.

1
Lorenzo Buonanno
2y
What do you mean? It looks like a histogram to me. Oh, never mind, I see what you mean: indeed, the y-axis seems to indicate the intervention, not the number of interventions. Still, wouldn't a histogram be very similar?

Zach and Kelly Weinersmith are writing a book on space settlement. Might be worth reaching out to them.

What do you think of the Bayesian solution, where you shrink your EV estimate towards a prior (thereby avoiding the fanatical outcomes)?
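A minimal sketch of the shrinkage I mean, with a normal prior and normal "measurement error" on the EV estimate (made-up numbers, not GiveWell's actual model):

```python
def shrunk_ev(prior_mean, prior_var, estimate, estimate_var):
    # Posterior mean under a normal prior and normal noise on the estimate:
    # a precision-weighted average of the prior and the naive EV estimate.
    w = (1 / estimate_var) / (1 / prior_var + 1 / estimate_var)
    return w * estimate + (1 - w) * prior_mean

# A fanatical claim with huge EV but huge uncertainty barely moves a modest prior:
print(shrunk_ev(prior_mean=10, prior_var=100, estimate=1e9, estimate_var=1e16))  # ~10.0
```

The noisier the EV estimate, the more weight stays on the prior, which is what damps the fanatical outcomes.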

5
Derek Shiller
2y
Thanks for sharing this. My (quick) reading is that the idea is to treat expected value calculations not as gospel, but as if they are experiments with estimated error intervals. These experiments should then inform, but not totally supplant, our prior. That seems sensible for GiveWell's use cases, but I don't follow the application to Pascal's mugging cases or better-supported fanatical projects. The issue is that they don't have expected value calculations that make sense to regard as experiments.

Perhaps the proposal is that we should have a gut estimate and a gut confidence based on not thinking through the issues much, and another estimate based on making some guesses and plugging in the numbers, and we should reconcile those. I think this would be wrong. If anything, we should take our Bayesian prior to be our estimate after thinking through all the issues (but perhaps before plugging in all of the exact numbers). If you've thought through all the issues above, I think it is appropriate to allow an extremely high expected value for fanatical projects even before trying to make a precise calculation. Or at least it is reasonable for your prior to be radically uncertain.
4
Thomas Kwa
2y
There are ways to deal with Pascal's Mugger with leverage penalties, which IIRC deal with some problems but are not totally satisfying in extremes.

The three groups have completely converged by the end of the 180 day period

I find this surprising. Why don't the treated individuals stay on a permanently higher trajectory? Do they have a social reference point, and since they're ahead of their peers, they stop trying as hard?

Is the difference between actualism and necessitarianism that actualism cares about both (1) people who exist as a result of our choices, and (2) people who exist regardless of our choices; whereas necessitarianism cares only about (2)?

3
EJT
2y
Yup!

I wonder if we can back out what assumptions the 'peace pact' approach is making about these exchange rates. They are making allocations across cause areas, so they are implicitly using an exchange rate.

I get the weak impression that worldview diversification (partially) started as an approximation to expected value, and ended up being more of a peace pact between different cause areas. This peace pact disincentivizes comparisons between giving in different cause areas, which then leads to getting their marginal values out of sync. 

Do you think there's an optimal 'exchange rate' between causes (eg. present vs future lives, animal vs human lives), and that we should just do our best to approximate it? 

4
NunoSempere
2y
Yes. To elaborate on this, I think that agents should converge on such an exchange as they become more wise and understand the world better. Separately, I think that there are exchange rates that are inconsistent with each other, and I would already consider it a win to have a setup where the exchange rates aren't inconsistent.

If we don't kill ourselves in the next few centuries or millennia, almost all humans that will ever exist will live in the future.

The idea is that, after a few millennia, we'll have spread out enough to reduce extinction risks to ~0?

Even without considering that, if we stay at ~140 million births per year, in 800 years 50% of all humans will have been born in our future.
And in ~7 millennia 90% of all humans will have been born in our future.
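Spelling out the arithmetic behind those figures (assuming roughly 117 billion humans born to date, a standard demographic estimate):

```latex
\begin{aligned}
\frac{117\text{ billion}}{140\text{ million/year}} &\approx 840 \text{ years} &&\text{(future births = past births, i.e. 50\%)} \\
\frac{9 \times 117\text{ billion}}{140\text{ million/year}} &\approx 7{,}500 \text{ years} &&\text{(90\%)}
\end{aligned}
```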

3
Sharmake
2y
Basically, yes. Assuming civilization survives the Singularity, existential risks are effectively zero thanks to the fact that it's almost impossible to destroy an interstellar civilization.

Nice work! Sounds like movement building is very important.

Do you disagree with FTX funding lead elimination instead of marginal x-risk interventions?

4
Zach Stein-Perlman
2y
Not actively. I buy that doing a few projects with sharper focus and tighter feedback loops can be good for community health & epistemics. I would disagree if it took a significant fraction of funding away from interventions with a more clear path to doing an astronomical amount of good. (I almost added that it doesn't really feel like lead elimination is competing with more longtermist interventions for FTX funding, but there probably is a tradeoff in reality.)