Scrappy note on the AI safety landscape. Very incomplete, but probably a good way to get oriented to (a) some of the orgs in the space, and (b) how the space is carved up more generally.
(A) Technical
(i) A lot of the safety work happens in the scaling-based AGI companies (OpenAI, GDM, Anthropic, and possibly Meta, xAI, Mistral, and some Chinese players). Some of it is directly useful, some of it is indirectly useful (e.g. negative results, datasets, open-source models, position pieces etc.), and some is not useful and/or a distraction. It's worth developing good assessment mechanisms/instincts about these.
(ii) A lot of safety work happens in collaboration with the AGI companies, but by individuals/organisations with some amount of independence and/or different incentives. Some examples: METR, Redwood, UK AISI, Epoch, Apollo. It's worth understanding what they're doing with AGI cos and what their theories of change are.
(iii) Orgs that don't seem to work directly with AGI cos but engage deeply and technically with frontier models and their relationship to catastrophic risk: places like Palisade, FAR AI, CAIS. These orgs maintain even more independence, and are able to do/say things which the previous tier maybe can't. A recent cool example was CAIS finding that models don't do well on remote work tasks -- succeeding on only 2.5% of them -- in contrast to OpenAI's GDPval findings, which suggest models have an almost 50% win rate against industry professionals on a suite of "economically valuable, real-world tasks".
(iv) Orgs that are pursuing other technical AI safety bets, different from the AGI cos: FAR AI, ARC, Timaeus, Simplex AI, AE Studio, LawZero, many independents, some academics at e.g. CHAI/Berkeley, MIT, Stanford, MILA, Vector Institute, Oxford, Cambridge, UCL and elsewhere. It's worth understanding why they want to make these bets, including whether it's their comparative advantage, an alignment with their incentives/grants, or something else.
Not sure who needs to hear this, but Hank Green has published two very good videos about AI safety this week: an interview with Nate Soares and a SciShow explainer on AI safety and superintelligence.
Incidentally, he appears to have also come up with the ITN framework from first principles (h/t @Mjreard).
Hopefully this is auspicious for things to come?
ChatGPT’s usage terms now forbid it from giving legal and medical advice.
Some users are reporting that ChatGPT refuses to give certain kinds of medical advice. I can’t figure out if this also applies to API usage.
It sounds like the regulatory threats and negative press may be working. It'll be interesting to see whether other model providers follow suit, and whether jurisdictions formally regulate this (I can see the EU doing so, but not the U.S.).
In my opinion, the upshot of this is probably that OpenAI are ceding this market to specialised providers who can afford the higher marginal costs of moderation, safety, and regulatory compliance (or black-market-style providers who refuse to put these safeguards on and don’t bow to regulatory pressure). This is probably a good thing—the legal, medical, and financial industries have clearer, industry-specific regulatory frameworks that can more adequately monitor for and prevent harm.
FYI: METR is actively fundraising!
METR is a non-profit research organization. We prioritise independence and trustworthiness, which shapes both our research process and our funding options. To date, we have not accepted payment from frontier AI labs for running evaluations. ^[1]
Part of METR's role is to independently assess the arguments that frontier AI labs put forward about the safety of their models. These arguments are becoming increasingly complex and dependent on nuances of how models are trained and how mitigations were developed.
For this reason, it's important that METR has its finger on the pulse of frontier AI safety research. This means hiring and paying for staff that might otherwise work at frontier AI labs, requiring us to compete with labs directly for talent.
The central constraint to our publishing more and better research, and scaling up our work aimed at monitoring the AI industry for catastrophic risk, is growing our team with excellent new researchers and engineers.
And our recruiting is, to some degree, constrained by our fundraising - especially given the skyrocketing comp that AI companies are offering.
To donate to METR, click here: https://metr.org/donate
If you’d like to discuss giving with us first, or receive more information about our work for the purpose of informing a donation, reach out to giving@metr.org
[1] However, we are definitely not immune from conflicting incentives. Some examples:
- We are open to taking donations from individual lab employees (subject to some constraints, e.g. excluding senior decision-makers, constituting <50% of our funding)
- Labs provide us with free model access for conducting our evaluations, and several labs also provide us ongoing free access for research even if we're not conducting a specific evaluation.
PSA: If you're doing evals things, every now and then you should look back at OpenPhil's page on capabilities evals to check against their desiderata and questions in sections 2.1-2.2, 3.1-3.4, 4.1-4.3 as a way to critically appraise the work you're doing.
If the people arguing that there is an AI bubble turn out to be correct and the bubble pops, to what extent would that change people's minds about near-term AGI?
I strongly suspect there is an AI bubble because the financial expectations around AI seem to be based on AI significantly enhancing productivity, and the evidence seems to show it doesn't do that yet. This could change (and I think that's what a lot of people in the business world are thinking and hoping), but my view is that (a) LLMs have fundamental weaknesses that make this unlikely, and (b) scaling is running out of steam.
Scaling running out of steam actually means three things:
1) Each new 10x increase in compute is less practically or qualitatively valuable than previous 10x increases in compute (see the sketch after this list).
2) Each new 10x increase in compute is getting harder to pull off because the amount of money involved is getting unwieldy.
3) There is an absolute ceiling to the amount of data LLMs can train on that they are probably approaching.
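To make point (1) concrete, here's a minimal sketch in Python, assuming a Kaplan/Chinchilla-style power law for loss as a function of compute. The functional form is a standard assumption from the scaling-laws literature rather than something claimed in this note, and the constants (L0, A, ALPHA) are made up purely for illustration:

```python
# Toy illustration of point (1): under an assumed power law loss(C) = L0 + A * C**(-ALPHA),
# each additional 10x of compute buys a smaller absolute improvement than the previous one.
# All constants here are hypothetical.

L0 = 1.7      # hypothetical irreducible loss
A = 10.0      # hypothetical scale constant
ALPHA = 0.05  # hypothetical compute exponent

def loss(compute: float) -> float:
    """Hypothetical pretraining loss as a function of training compute (arbitrary units)."""
    return L0 + A * compute ** (-ALPHA)

prev = None
for exp in range(20, 27):  # compute budgets from 1e20 to 1e26, arbitrary units
    current = loss(10.0 ** exp)
    if prev is not None:
        print(f"1e{exp - 1} -> 1e{exp} compute: loss drops by {prev - current:.3f}")
    prev = current
# Each printed drop is smaller than the last, even though every successive 10x costs
# roughly ten times more -- the combination that points (1) and (2) gesture at.
```

Of course, whether frontier models actually follow a curve like this, and whether loss reductions translate into practical value, is exactly what's in dispute.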
So, AI investment depends on financial expectations that in turn depend on LLMs enhancing productivity, which isn't happening and probably won't happen, both because of fundamental problems with LLMs and because scaling is becoming less valuable and less feasible. This implies an AI bubble, which implies the bubble will eventually pop.
So, if the bubble pops, will that lead people who currently have a much higher estimation than I do of LLMs' current capabilities and near-term prospects to lower that estimation? If AI investment turns out to be a bubble, and it pops, would you change your mind about near-term AGI? Would you think it's much less likely? Would you think AGI is probably much farther away?
Ajeya Cotra writes:
Like Ajeya, I haven't thought about this a ton. But I do feel quite confident in recommending that generalist EAs — especially the "get shit done" kind — at least strongly consider working on biosecurity if they're looking for their next thing.