Thomas Kwa

Research contractor @ MIRI
2633 · Berkeley, CA, USA · Joined Feb 2020



Doing alignment research with Vivek Hebbar's team at MIRI.


What do you mean by "compassionate"?

Should the EA Forum team stop optimizing for engagement?
I heard that the EA Forum team tries to optimize the forum for engagement (i.e., tests features to see whether they improve engagement). There are positives to this, but on net it worries me. Taken to the extreme, this is a destructive practice, as it would

  • normalize and encourage clickbait;
  • cause thoughtful comments to be replaced by louder and more abundant voices (for a constant time spent thinking, you can post either 1 thoughtful comment or several hasty comments. Measuring session length fixes this but adds more problems);
  • cause people with important jobs to spend more time on EA Forum than is optimal;
prevent community members and "EA" itself from keeping their identities small, as politics is an endless source of engagement;
  • distract from other possible directions of improvement, like giving topics proportionate attention, adding epistemic technology like polls and prediction market integration, improving moderation, and generally increasing quality of discussion.

I'm not confident that EA Forum is getting worse, or that tracking engagement is currently net negative, but we should at least avoid failing this exercise in Goodhart's Law.

I was thinking of reasons why I feel like I get less value from EA Forum. But this is not the same as reasons EAF might be declining in quality. So the original list would miss more insidious (to me) mechanisms by which EAF could actually be getting worse. For example I often read something like "EA Forum keeps accumulating more culture/jargon; this is questionably useful, but posts not using the EA dialect are received increasingly poorly." There are probably more that I can't think of, and it's harder for me to judge these...

Yeah, I don't think it's possible for controlled substances due to the tighter regulation.

Note that people in US/UK and presumably other places can buy drugs on the grey market (e.g. here) for less than standard prices. Although I wouldn't trust these 100%, they should be fairly safe because they're certified in other countries like India; gwern wrote about this here for modafinil and the basic analysis seems to hold for many antidepressants. The shipping times advertised are fairly long but potentially still less hassle than waiting for a doctor's appointment for each one.

Thanks. It's reassuring that the correlations aren't as large as I thought. (How much of the variance is in the first principal component in log-odds space, though?) And yes, I now think the arguments I had weren't so much for the arithmetic mean as against total independence / the geometric mean, so I'll edit my comment to reflect that.

The main assumption of this post seems to be that not only the true values of the parameters, but also a given person's estimates of the stages, are independent. This is a judgment call I'm weakly against.

Suppose you put equal weight on the opinions of Aida and Bjorn. Aida gives 10% for each of the 6 stages, and Bjorn gives 99%, so that Aida has an overall x-risk probability of 10^-6 and Bjorn has around 94%.

  • If you just take the arithmetic mean between their overall estimates, it's like saying "we might be in worlds where Aida is correct, or worlds where Bjorn is correct".
  • But if you take the geometric mean or decompose into stages, as in this post, it's like saying "we're probably in a world where each of the bits of evidence Aida and Bjorn have towards each proposition are independently 50% likely to be valid, so Aida and Bjorn are each more correct about 2-4 stages".
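The two aggregation methods can be sketched in a few lines. This is my own illustration, not from the post; I'm assuming the 0.4% figure comes from taking the geometric mean of the odds of the two overall estimates (the standard way to make a geometric mean well-behaved for probabilities):

```python
import math

# Aida: 10% per stage; Bjorn: 99% per stage; six independent stages.
aida = 0.1 ** 6    # overall estimate = 1e-6
bjorn = 0.99 ** 6  # overall estimate ~ 0.941

# Method 1: arithmetic mean of the overall probabilities.
arith = (aida + bjorn) / 2  # ~ 0.47, i.e. 47%

# Method 2: geometric mean of odds, converted back to a probability.
def odds(p):
    return p / (1 - p)

geo_odds = math.sqrt(odds(aida) * odds(bjorn))
geo = geo_odds / (1 + geo_odds)  # ~ 0.004, i.e. 0.4%

print(f"arithmetic mean: {arith:.2%}")
print(f"geometric mean of odds: {geo:.2%}")
```

The arithmetic mean is dominated by Bjorn's near-certainty, while the geometric mean of odds splits the difference in log-odds space and lands near zero.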

These give you vastly different results, 47% vs 0.4%. Which one is right? I think there are two related arguments to be made against the geometric mean, although they don't push me all the way towards using the arithmetic mean:

  • Aida and Bjorn's wildly divergent estimates probably come from some underlying difference in their models of the world, not from independent draws. In this case, where Aida is more optimistic than Bjorn on each of the 6 stages, it is unlikely that this is due to independent draws. I think this kind of multidimensional difference in optimism between alignment researchers is actually happening, so any model should take this into account.
  • If we learn that Bjorn was wrong about stage 1, then we should put less weight on his estimates for stages 2-6. (My guess is there's some copula that corresponds to a theoretically sensible way to update away from Bjorn's position, treating his opinions as partially correlated, but I don't know enough statistics.)