Sorry, I neglected to say thank you for this previously!
This idea sounds really cool. Brainstorming: a variant could be several people red teaming the same paper and not conferring until the end.
Viewership as in YouTube viewers? Where are you getting that stat from?
Looks like this already happened, in March 2020: https://lexfridman.com/william-macaskill/
It looks like Sam Harris interviewed Will MacAskill this year. He also interviewed Will in 2016. How might we tell if the previous interview created a similar number of new EA-survey-takers, or if this year's was particularly successful? The data from that year https://forum.effectivealtruism.org/posts/Cyuq6Yyp5bcpPfRuN/ea-survey-2017-series-how-do-people-get-into-ea doesn't seem to include a "podcast" option.
Quick take is this sounds like a pretty good bet, mostly for the indirect effects. You could do it with a 'contest' framing instead of a 'I pay you to produce book reviews' framing; idk whether that's meaningfully better.
Yeah, I agree this is unclear. But, staying away from the word 'intention' entirely, I think we can & should still ask: what is the best explanation for why this model is the one that minimizes the loss function during training? Does that explanation involve this argument about changing user preferences, or not?
One concrete experiment that could feed into this: if it were the case that feeding users extreme political content did not cause their views to become more predictable, would training select a model that didn't feed people as much extreme political content? I'd guess training would select the same model anyway, because extreme political content gets clicks in the short-term too. (But I might be wrong.)
I was surprised to see that this Gallup poll found no difference between college graduates and college nongraduates (in the US).
Younger people and more liberal people are much more likely to identify as not-straight, and EAs are generally young and liberal. I wonder how far this gets you to explaining this difference, which does need a lot of explaining since it's so big. Some stats on this (in the US).
Thanks for this work!
I'm wondering about "crazy teenager builds misaligned APS system in a basement" scenarios and to what extent you see the considerations in this report as bearing on those.
To be a bit more precise: I'm thinking about worlds where "alignment is easy" for society at large (i.e. your claim 3 is not true), but building powerful AI is feasible even for people who are not interested in taking the slightest precautions, even those that would be recommended by ordinary self-interest. I think mostly about individuals or small groups rather than organizations.
I think these scenarios are distinct from misuse scenarios (which you mention below your report is not intended to cover), though the line is blurry. If someone who wanted to see enormous damage to the world built an AI with the intent of causing such damage, and was successful, I'd call that "misuse." But I'm interested more in "crazy" than "omnicidal" here, where I don't think it's clear whether to call this "misuse" or not.
Maybe you see this as a pretty separate type of worry than what the report is intended to cover.