(Note: This essay was largely written by Rob, based on notes from Nate. It’s formatted as Rob-paraphrasing-Nate because (a) Nate didn’t have time to rephrase everything into his own words, and (b) most of the impetus for this post came from Eliezer wanting MIRI to praise a recent OpenAI post and Rob wanting to share more MIRI-thoughts about the space of AGI organizations, so it felt a bit less like a Nate-post than usual.)
Nate and I have been happy about the AGI conversation seeming more honest and “real” recently. To contribute to that, I’ve collected some general Nate-thoughts in this post, even though they’re relatively informal and disorganized.
AGI development is a critically important topic, and the world should obviously be able to hash out such topics in conversation. (Even though it can feel weird or intimidating, and even though there’s inevitably some social weirdness in sometimes saying negative things about people you like and sometimes collaborate with.) My hope is that we'll be able to make faster and better progress if we move the conversational norms further toward candor and substantive discussion of disagreements, as opposed to saying everything behind a veil of collegial obscurity.
Capabilities work is currently a bad idea
Nate’s top-level view is that ideally, Earth should take a break on doing work that might move us closer to AGI, until we understand alignment better.
That move isn’t available to us, but individual researchers and organizations who choose not to burn the timeline are helping the world, even if other researchers and orgs don't reciprocate. You can unilaterally lengthen timelines, and give humanity more chances of success, by choosing not to personally shorten them.
Nate thinks capabilities work is currently a bad idea for a few reasons:
- He doesn’t buy that current capabilities work is a likely path to ultimately solving alignment.
- Insofar as current capabilities work does seem helpful for alignment, it strikes him as helping with parallelizable research goals, whereas our bottleneck is serial research goals. (See A note about differential technological development.)
- Nate doesn’t buy that we need more capabilities progress before we can start finding a better path.
This is not to say that capabilities work is never useful for alignment, or that alignment progress is never bottlenecked on capabilities progress. As an extreme example, having a working AGI on hand tomorrow would indeed make it easier to run experiments that teach us things about alignment! But in a world where we build AGI tomorrow, we're dead, because we won't have time to get a firm understanding of alignment before AGI technology proliferates and someone accidentally destroys the world. Capabilities progress can be useful in various ways, while still being harmful on net.
(Also, to be clear: AGI capabilities are obviously an essential part of humanity's long-term path to good outcomes, and it's important to develop them at some point — the sooner the better, once we're confident this will have good outcomes — and it would be catastrophically bad to delay realizing them forever.)
On Nate’s view, the field should do experiments with ML systems, not just abstract theory. But if he were magically in charge of the world's collective ML efforts, he would put a pause on further capabilities work until we've had more time to orient to the problem, consider the option space, and think our way to some sort of plan-that-will-actually-probably-work. It’s not as though we’re hurting for ML systems to study today, and our understanding already lags far behind today’s systems' capabilities.
Publishing capabilities advances is even more obviously bad
For researchers who aren't willing to hit the pause button, an even more obvious (and cheaper) option is to avoid publishing any capabilities research (including results of the form "it turns out that X can be done, though we won't say how we did it").
Information can leak out over time, so "do the work but don't publish about it" still shortens AGI timelines in expectation. However, it can potentially shorten them a lot less.
In an ideal world, the field would currently be doing ~zero publishing of capabilities research — and marginal action to publish less is beneficial even if the rest of the world continues publishing.
Thoughts on the landscape of AGI organizations
With those background points in hand:
Nate was asked earlier this year whether he agrees with Eliezer's negative takes on OpenAI. There's also been a good amount of recent discussion of OpenAI on LessWrong.
Nate tells me that his headline view of OpenAI is mostly the same as his view of other AGI organizations, so he feels a little odd singling out OpenAI. That said, here are his notes on OpenAI anyway:
- On Nate’s model, the effect of OpenAI is almost entirely dominated by its capabilities work (and sharing of its work), and this effect is robustly negative. (This is true for DeepMind, FAIR, and Google Brain too.)
- Nate thinks that DeepMind, OpenAI, Anthropic, FAIR, Google Brain, etc. should hit the pause button on capabilities work (or failing that, at least halt publishing). (And he thinks any one actor can unilaterally do good in the process, even if others aren't reciprocating.)
- On Nate’s model, OpenAI isn't close to operational adequacy in the sense of the Six Dimensions of Operational Adequacy write-up — which is another good reason to hold off on doing capabilities research. But this is again a property OpenAI shares with DeepMind, Anthropic, etc.
Insofar as Nate or I think OpenAI is doing the wrong thing, we’re happy to criticize it. But, while this doesn't change the fact that we view OpenAI's effects as harmful on net currently, Nate does want to acknowledge that OpenAI seems to him to be doing better than some other orgs on a number of fronts:
- Nate liked a lot of things about the OpenAI Charter. (As did Eliezer, though compared to Eliezer, Nate saw the Charter as a more important positive sign about OpenAI's internal culture.)
- Nate would suspect that OpenAI is much better than Google Brain and FAIR (and comparable with DeepMind, and maybe a bit behind Anthropic? it's hard to judge these things from the outside) on some important adequacy dimensions, like research closure and operational security. (Though Nate worries that, e.g., he may hear more about efforts in these directions made by OpenAI than about those made by DeepMind, just by virtue of spending more time in the Bay.)
- Nate is also happy that Sam Altman and others at OpenAI talk to EAs/rationalists and try to resolve disagreements, and he’s happy that OpenAI has had people like Holden and Helen on their board at various points.
- Also, obviously, OpenAI (along with DeepMind and Anthropic) has put in a much clearer AGI alignment effort than Google, FAIR, etc. (Albeit Nate thinks the absolute amount of "real" alignment work is still small.)
- Most recently, Nate and Eliezer both think it’s great that OpenAI released a blog post that states their plan going forward, and we want to encourage DeepMind and Anthropic to do the same.
Comparatively, Nate thinks of OpenAI as being about on par with DeepMind, maybe a bit behind Anthropic (who publish less), and better than most of the other big names, in terms of attempts to take not-killing-everyone seriously. But again, Nate and I think that the overall effect of OpenAI (and DeepMind, FAIR, etc.) is bad, because we think it's dominated by "shortens AGI timelines". And we’re a little leery of playing “who's better on [x] dimension” when everyone seems to be on the floor of the logistic success curve.
We don't want "here are a bunch of ways OpenAI is doing unusually well for its reference class" to be treated as encouragement for those organizations to stay in the pool, or encouragement for others to join them in the pool. Outperforming DeepMind, FAIR, and Google on one or two dimensions is a weakly positive sign about the future, but on my model and Nate’s, it doesn't come close to outweighing the costs of "adding another capabilities org to the world".
Post summary (feel free to suggest edits!):
Rob paraphrases Nate’s thoughts on capabilities work and the landscape of AGI organisations. Nate thinks:
(If you'd like to see more summaries of top EA and LW forum posts, check out the Weekly Summaries series.)
Thanks, Zoe! This is great. :) Points 1 and 2 of your summary are spot-on.
Point 3 is a bit too compressed:
"This applies to all of OpenAI, DeepMind, FAIR, Google Brain" makes it sound like "this organization does well for the reference class 'AI capabilities org'" applies to all four orgs; whereas actually we think OpenAI, DeepMind, and Anthropic are doing well for that class, and FAIR and Google Brain are not.
"Even if an organisation does well for the reference class 'AI capabilities org', it’s better for it to stop" also makes it sound like Nate endorses this as true for all possible capabilities orgs in all contexts. Rather, Nate thinks it could be good to do capabilities work in some contexts; it just isn't good right now. The intended point is more like:
OpenAI, Anthropic, and DeepMind are unusually safety-conscious AI capabilities orgs (e.g., much better than FAIR or Google Brain). But reality doesn't grade on a curve, there's still a lot to improve, and they should still call a halt to mainstream SotA-advancing potentially-AGI-relevant ML work, since the timeline-shortening harms currently outweigh the benefits.
Thanks, that makes sense. Edited to reflect your suggestion.