Model evals are eating AI safety hiring (and why the "AI safety" title is disappearing)

AI Safety Careers

I’ve been spending a lot of time digging through lab career pages, think tank sites, and research orgs lately for a small project I’m running. Looking at the raw data from the last few months, there’s a really obvious shift in what teams are actually hiring for right now: model evaluations, red teaming, and preparedness are exploding.

It seems like labs and governments (like the various AI Safety Institutes) are hitting the exact same wall at the same time. Everyone is realizing that standard benchmarks like MMLU don't actually tell you if a frontier model is safe to deploy in the real world.

Because of that, the money and headcounts are moving toward practical, adversarial testing. It’s becoming much less about speculative philosophy and much more about concrete engineering.

From what I see in recent job specs, teams aren't looking for vague existential worries anymore. They want people who can build test suites to answer very specific questions:

Can the model help users with dangerous, dual-use tasks?
What happens when you give an autonomous agent a bash terminal and code execution? Does it break things?
Does the model maintain its alignment under heavy optimization pressure, or does it start playing double games?
Can it subtly manipulate or flatter a user over a long, multi-turn conversation?

The "AI Safety" job title is disappearing

If you're currently looking for work in this space, here is the most important takeaway: stop searching only for "AI Safety Engineer". You’re going to miss 90% of the actual market.

Labs are professionalizing and breaking down safety into highly specialized sub-units. The broad umbrella of "safety" is fracturing into roles that sit somewhere between security engineering, product safety, and policy:

Evals Infrastructure (building the actual software to test models at scale)
Adversarial Red Teaming (literally trying to break the model before it ships)
Preparedness / Frontier Risk Assessment (mapping raw capabilities to institutional thresholds)
Trust & Safety / Misuse Prevention (handling mitigation at the deployment/API level)
AI Assurance & Compliance (auditing and dealing with emerging regulations)

A note on mapping this space

Because these roles are scattered across 50 different corporate career pages, policy centers, and tiny non-profits, keeping track of them is a massive pain.

I got tired of checking dozens of bookmarks every week, so I built a dedicated tracker to aggregate them: AI Safety Careers.

Right now, the toughest part of running this is classification. It’s easy to classify a pure alignment researcher. But I'm constantly wrestling with borderline roles—things like RLHF data platform engineering, evals infrastructure, or model security.

If anyone here is currently hiring for these types of teams, or recently went through the hiring loop, I’d love to hear your thoughts:

What technical skillsets are your eval teams actually missing the most right now?
Is the term "AI safety" actively being phased out in your org's internal taxonomy, or is that just an external hiring trend?

Effective Altruism Forum
EA Forum

Model evals are eating AI safety hiring (and why the "AI safety" title is disappearing)

1

1

Reactions