Hi, I'm Rohin Shah! I work as a Research Scientist on the technical AGI safety team at DeepMind. I completed my PhD at the Center for Human-Compatible AI at UC Berkeley, where I worked on building AI systems that can learn to assist a human user, even if they don't initially know what the user wants.
I'm particularly interested in big picture questions about artificial intelligence. What techniques will we use to build human-level AI systems? How will their deployment affect the world? What can we do to make this deployment go better? I write up summaries and thoughts about recent work tackling these questions in the Alignment Newsletter.
In the past, I ran the EA groups at UC Berkeley and the University of Washington.
Oh sorry, I missed the weights on the factors, and thought you were taking an unweighted average.
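For anyone skimming, here is a minimal sketch of how the weighting changes the bottom line; the factor names, scores, and weights below are made up for illustration, not the numbers from the post being discussed:

```python
# Illustrative only: factor names, scores, and weights are hypothetical,
# not the actual values from the post under discussion.
factors = {
    # factor: (score, weight)
    "task length":                      (6.0, 1.0),
    "feedback quality / verifiability": (4.0, 1.0),
    "scheming risk":                    (2.0, 0.5),  # lower weight than the others
}

scores = [s for s, _ in factors.values()]
weights = [w for _, w in factors.values()]

# Unweighted average treats every factor equally.
unweighted = sum(scores) / len(scores)

# Weighted average discounts factors with lower weights.
weighted = sum(s * w for s, w in zip(scores, weights)) / sum(weights)

print(f"unweighted average: {unweighted:.2f}")
print(f"weighted average:   {weighted:.2f}")
```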
Is it because you have to run large evals or do pre-training runs? Do you think this argument applies to all areas of capabilities research?
All capabilities tasks are ultimately trying to optimize the capability-cost frontier, which usually benefits from measuring capability.
If you have an AI that will do well at most tasks you give it that take (say) a week, then you have the problem that the naive way of evaluating the AI (run it on some difficult tasks and see how well it does) now takes a very long time to give you useful signal. So you now have two options:
This doesn't apply to training / inference efficiency (since you hold the AI, and thus its capabilities, constant, so you don't need to measure capability). And there is already a good proxy for pretraining improvements, namely perplexity. But all the other areas will increasingly need to solve this problem.
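As a concrete illustration of the perplexity proxy, here is a minimal sketch; the per-token losses are hypothetical placeholders, whereas in practice they would come from evaluating a model's log-probabilities on held-out text:

```python
import math

# Hypothetical per-token cross-entropy losses (in nats) from evaluating a
# language model on a held-out validation set. In practice these come from
# the model's log-probabilities on real text.
per_token_losses = [2.91, 3.40, 2.05, 3.78, 2.66]

# Perplexity is the exponential of the mean per-token cross-entropy loss.
# Lower perplexity means the model assigns higher probability to held-out
# text, which is the standard cheap proxy for pretraining improvements.
mean_loss = sum(per_token_losses) / len(per_token_losses)
perplexity = math.exp(mean_loss)

print(f"mean loss: {mean_loss:.3f} nats, perplexity: {perplexity:.2f}")
```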
On reflection this is probably not best captured by your "task length" criterion, but rather by the "feedback quality / verifiability" criterion.
Great analysis of factors impacting automatability.
Looking at your numbers though, I feel like you didn't really need this; you could have just said "I think scheming risk is by far the most important factor in automatability of research areas, therefore capabilities will come first". EDIT: Overstated; I missed the fact that the scheming risk factor had a lower weight than the others.
I don't agree with that conclusion for two main reasons:
I don’t know how much the FTX collapse is responsible for our current culture. FTX did cause unbelievable damage, acting extremely unethically, unilaterally, and recklessly in destructive ways. But they did have this world-scale ambition, and urgency, and proclivity to actually make things happen in the world, that I think central EA orgs and the broader EA community sorely lack in light of the problems we’re hoping to solve.
But this is exactly why I don't want to encourage heroic responsibility (despite the fact that I often take on that mindset myself). Empirically, its track record seems quite bad, and I'd feel that way even if you ignore FTX.
Like, my sense is that something along the lines of heroic responsibility causes people to:
To be clear, in some sense these are all failures of epistemics: with sufficiently good epistemics you wouldn't make any of these mistakes even while taking on heroic responsibility. But in practice humans are enough of an epistemic mess that I think it's better to just not adopt heroic responsibility and instead err more in the direction of "the normal way to do things".
In fact, all of the top 7 most sought-after skills were related to management or communications.
"Leadership / strategy" and "government and policy expertise" are emphatically not management or communications. There's quite a lot of effort on building a talent pipeline for "government and policy expertise". There isn't one for "leadership / strategy" but I think that's mostly because no one knows how to do it well (broadly speaking, not just limited to EA).
If you want to view things through the lens of status (imo often a mistake), I think "leadership / strategy" is probably the highest status role in the safety community, and "government and policy expertise" is pretty high as well. I do agree that management / communications are not as high status as the chart would suggest they should be, though I suspect this is mostly due to tech folks consistently underestimating the value of these fields.
Applicant A started out wanting to be a researcher. They did MATS before becoming an AI Safety researcher. After gaining enough research experience, they were promoted to research manager.
Applicant B always wanted to be a manager. They got an MBA from a competitive business school and worked their way up to people manager at a tech company. Midway through their career, they discovered AI Safety and decided to make a career transition.
If I were hiring for a manager and somehow had to choose between only these two applicants with only this information, I would choose applicant A. (Though of course the actual answer is to find better applicants and/or get more information about them.)
I can always train applicant A to be an adequate people manager (and have done so in the past). I can't train applicant B to have enough technical understanding to make good prioritization decisions.
(Relatedly, at tech companies, the people managers often have technical degrees, not MBAs.)
in many employers’ eyes they would not look as value aligned as someone who did MATS, something which is part of a researcher’s career path anyway.
I've done a lot of hiring, and I suppose I do look for "value alignment" in the sense of "are you going to have the team's mission as a priority", but in practice I have a hard time imagining how any candidate who actually was mission aligned could somehow fail to demonstrate it. My bar is not high, and I care way more about other factors. (And in fact I've hired multiple people who looked less "EA-value aligned" than either applicant A or B; I can think of four off the top of my head.)
It's possible that other EA hiring cares more about this, but I'd weakly guess that this is a mostly-incorrect community-perpetuated belief.
(There is another effect which does advantage e.g. MATS -- we understand what MATS is, and what excellence at it looks like. Of the four people I thought of above, I think we plausibly would have passed over 2-3 of them in a nearby world where the person reviewing their resume didn't realize what made them stand out.)
I think that most of classic EA vs the rest of the world is a difference in preferences / values, rather than a difference in beliefs.
I somewhat disagree but I agree this is plausible. (That was more of a side point, maybe I shouldn't have included it.)
most people really really don't want to die in the next ten years
Is your claim that they really really don't want to die in the next ten years, but they are fine dying in the next hundred years? (Else I don't see how you're dismissing the anti-aging vs sports team example.)
So, for x-risk to be high, many people (e.g. lab employees, politicians, advisors) have to catastrophically fail at pursuing their own self-interest.
Sure, I mostly agree with this (though I'd note that it can be a failure of group rationality, without being a failure of individual rationality for most individuals). I think people frequently do catastrophically fail to pursue their own self-interest when that requires foresight.
Most people really don’t want to die, or to be disempowered in their lifetimes. So, for existential risk to be high, there has to be some truly major failure of rationality going on.
... What is surprising about the world having a major failure of rationality? That's the default state of affairs for anything requiring a modicum of foresight. A fairly core premise of early EA was that there is a truly major failure of rationality going on in the project of trying to improve the world.
Are you surprised that ordinary people spend more money and time on, say, their local sports team, than on anti-aging research? For most of human history, aging had a ~100% chance of killing someone (unless something else killed them first).
If you think the following claim is true - 'non-AI projects are never undercut but always outweighed'
Of course I don't think this. AI definitely undercuts some non-AI projects. But "non-AI projects are almost always outweighed in importance" seems very plausible to me, and I don't see why anything in the piece is a strong reason to disbelieve that claim, since this piece is only responding to the undercutting argument. And if that claim is true, then the undercutting point doesn't matter.
We are disputing a general heuristic that privileges the AI cause area and writes off all the others.
I think the most important argument towards this conclusion is "AI is a big deal, so we should prioritize work that makes it go better". But it seems you have placed this argument out of scope:
[The claim we are interested in is] that the coming AI revolution undercuts the justification for doing work in other cause areas, rendering work in those areas useless, or nearly so (for now, and perhaps forever).
[...]
AI causes might be more cost-effective than projects in other areas, even if AI doesn’t undercut those projects’ efficacy. Assessing the overall effectiveness of these broad cause areas is too big a project to take on here.
I agree that lots of other work looks about as valuable as it did before, and isn't significantly undercut by AI. This seems basically irrelevant to the general heuristic you are disputing, whose main argument is "AI is a big deal, so it's way more important".
It is more like this stronger claim.
I might not use "inherently" here. A core safety question is whether an AI system is behaving well because it is aligned, or because it is pursuing convergent instrumental subgoals until it can take over. The "natural" test is to run the AI until it has enough power to easily take over, at which point you observe whether it does so, which is extremely long-horizon. But obviously this was never an option for safety anyway, and many of the proxies we think about are shorter-horizon.