Most of my stuff (even the stuff of interest to EAs) can be found on LessWrong: https://www.lesswrong.com/users/daniel-kokotajlo
One way in which this paper (or the things policymakers and CEOs might do if they read it & like it) might be net-negative:
Maybe by default AIs will mostly be trained to say whatever maximizes engagement/clicks/etc., and so they'll say all sorts of stuff and people will quickly learn that a lot of it is bullshit and only fools will place their trust in AI. In the long run, AIs will learn to deceive us, or actually come to believe their own bullshit. But at least we won't trust them.
But if people listen to this paper they might build all sorts of prestigious Ministries of Truth that work hard to train AIs to be truthful, where "truthful" in practice means Sticks to the Party Line. And so the same thing happens -- AIs learn to deceive us (because there will be cases where the Party Line just isn't true, and obviously so) or else actually come to believe their own bullshit (which would arguably be worse? Hard to say.) But it happens faster, because Ministries of Truth are accelerating the process. Also, and more importantly, more humans will trust the AIs more, because they'll be saying all the right things and they'll be certified by the right Ministries.
(Crossposted from LW)
However, a disadvantage of having many truthfulness-evaluation bodies is that it increases the risk that one or more of these bodies is effectively captured by some group. Consequently, an alternative would be to use decentralised evaluation bodies, perhaps modelled on existing decentralised systems like Wikipedia, open-source software projects, or prediction markets. Decentralised systems might be harder to capture because they rely on many individuals who can be both geographically dispersed and hard to identify. Overall, both the existence of multiple evaluation bodies and of decentralised bodies might help to protect against capture and allow for a nimble response to new evidence.
The first sentence suggests that by default evaluation bodies will not be captured by some biased group or other. (Why else focus on the probability that at least one body will be captured, rather than the probability that at least one will not be captured?)
Instead, when I look around me today, I see a world in which almost all evaluation bodies are captured by some biased group or other (to varying degrees) and in general the more important and influential a body is, the more likely it is to be captured. Wikipedia is the shining beacon of exception that proves the rule -- and even Wikipedia has indeed been captured to a not-yet-appreciated extent by biased groups (talk to e.g. Gwern about this if you want more details and examples).
I would say it's good to have multiple evaluation bodies because that increases the chance that maybe, just maybe, there will be one which is not captured by some biased group pushing an agenda. (I don't mean to be dumping on this paper, by the way -- I think it's very important work pushing in the right direction, and I'm heartened that you wrote it.)
FWIW, my gut says this is unlikely to work but better than doing nothing and hoping for the best.
My main objection is to number 5: Wisdom and intelligence interventions are promising enough to justify significant work in prioritization.
The objection is a combination of:
--Changing societal values/culture/habits is hard. Society is big and many powerful groups are already trying to change it in various ways.
--When you try, often people will interpret that as a threatening political move and push back.
--We don't have much time left.
Overall I still think this is promising, I just thought I'd say what the main crux is for me.
Not sure, but it seems like a somewhat important variable in my mental model of how the future will go.
My hot take is that this seems valuable to me! Potentially harmful if you publish it on the internet, but you've probably thought of that concern already.
Update: A friend of mine read this as me endorsing doing PhDs and was surprised. I do not generally endorse doing PhDs at this late hour. (However, there are exceptions.) What I meant to say is that skilling up / learning is what you should be doing, for now at least. Maybe a PhD is the best way to do that, but maybe not -- it depends on what you are trying to learn. I think working as a research assistant at an EA org would probably be a better way to learn than doing a PhD, for example. If you aren't trying to do research, but instead are trying to contribute by e.g. building a movement, maybe you should be out of academia entirely and instead gaining practical experience building movements or running political campaigns.
I do think it is probably crunch time, but I agree with what Rohin said here about what you should do for now (and about my minority status). Skilling up (not just in technical specialist stuff, also in your understanding of the problem we face, the literature, etc.) is what you should be doing. For what I think should be done by the community as a whole, see this comment.
Is this the sort of thing where if we had, say, 10-100 EAs and a budget of a billion dollars per year, we could use that money to basically buy the eyeballs of a significant fraction of the US population? Are they for sale?
Update: I thought about it a bit more & asked this question & got some useful feedback, especially from tin482 and vladimir_nesov. I am now confused about what people mean when they say current AI systems are much less sample-efficient than humans. On some interpretations, GPT-3 is already about as sample-efficient as humans. My guess is it's something like: "Sure, GPT-3 can see a name or fact once in its dataset and then remember it later & integrate it with the rest of its knowledge. But that's because it's part of the general skill/task of predicting text. For new skills/tasks, GPT-3 would need huge amounts of fine-tuning data to perform acceptably."