All of David Johnston's Comments + Replies

Why aren't you freaking out about OpenAI? At what point would you start?

I think this post and Yudkowski's Twitter thread that started it are probably harmful to the cause of AI safety.

OpenAI is one of the top AI labs worldwide, and the difference between their cooperation and antagonism to the AI safety community means a lot for the overall project. Elon Musk might be one of the top private funders of AI research, so his cooperation is also important.

I think that both this post and the Twitter thread reduce the likelihood of cooperation without accomplishing enough in return. I think that the potential to do harm to potential ... (read more)

Thanks for the recommendation. I spent about an hour looking for contact info, but was only able to find 5 public addresses of ex-OpenAI employees involved in the recent exodus. I emailed them all, and provided an anonymous Google Form as well. I'll provide an update if I do hear back from anyone.


Unfortunately we may be unlikely to get a statement from a departed safety researcher beyond mine (, at least currently.

How would you run the Petrov Day game?

It seems like the game would better approximate the game of mutually assured destruction if the two sides had unaligned aims somehow, and destroying the page could impede "their" ability to get in "our" way.

Maybe the site that gets more new registrations on Petrov day has the right to demand that the loser advertise something of their choice for 1 month after Petrov day. Preferably, make the competition something that will be close to 50/50 beforehand.

The two communities could try to negotiate an outcome acceptable to everyone or nuke the other to try to avoid having to trust them or do what they want.

3BenMillwood1moLike Sanjay's answer, I think this is a correct diagnosis of a problem, but I think the advertising solution is worse than the problem. * A month of harm seems too long to me, * I can't think of anything we'd want to advertise on LW that we wouldn't already want to advertise on EAF, and we've chosen "no ads" in that case.
The motivated reasoning critique of effective altruism

Here's one possible way to distinguish the two: Under the optimizer's curse + judgement stickiness scenario retrospective evaluation should usually take a step towards the truth, though it could be a very small one if judgements are very sticky! Under motivated reasoning, retrospective evaluation should take a step towards the "desired truth" (or some combination of truth an desired truth, if the organisation wants both).

The motivated reasoning critique of effective altruism

I like this post. Some ideas inspired by it:

If "bias" is pervasive among EA organisations, the most direct implication of this seems to me to be that we shouldn't take judgements published by EA organisations at face value. That is, if we want to know what is true we should apply some kind of adjustment to their published judgements.

It might also be possible to reduce bias in EA organisations, but that depends on other propositions like how effective debiasing strategies actually are.

A question that arises is "what sort of adjustment should be applied?". T... (read more)

5Linch1moThanks for your extensions! Worth pondering more. I think this is first-order correct (and what my post was trying to get at). Second-order, I think there's at least one important caveat (which I cut from my post) with just tallying total number (or importance-weighted number of) errors towards versus away from the desired conclusion as a proxy for motivated reasoning. Namely, you can't easily differentiate "motivated reasoning" biases from perfectly innocent traditional optimizer's curse [] . Suppose an organization is considering 20 possible interventions and do initial cost-effectiveness analyses for each of them. If they have a perfectly healthy and unbiased epistemic process, then the top 2 interventions that they've selected from that list would a) in expectation be better than the other 18 and b) in expectation will have more errors slanted towards higher impact rather than lower impact. If they then implement the top 2 interventions and do an impact assessment 1 year later, then I think it's likely the original errors (not necessarily biases) from the initial assessment will carry through. External red-teamers will then discover that these errors are systematically biased upwards, but at least on first blush "naive optimizer's curse issues" looks importantly different in form, mitigation measures, etc, from motivated reasoning concerns. I think it's likely that either formal Bayesian modeling or more qualitative assessments can allow us to differentiate the two hypotheses.