TL;DR
AI safety organizations are capacity-constrained by a lack of senior researchers who can mentor and supervise junior talent. This bottleneck drives hyper-selective hiring, blocks promising candidates from entering the field, and is being exacerbated by AI automation, which is raising the bar for human contribution rather than lowering it. Based on 23 interviews with hiring managers, research leads, and funders across the ecosystem, we identify six concrete actions field-builders can take to improve talent pipelines, including standardized mentor evaluations, direct policy embedding, and production codebase access.
Study Context
In Q4 2025, MATS Research conducted 23 interviews with hiring managers, research leads, and funders across the AI safety ecosystem to understand talent needs, hiring constraints, and skills gaps. This analysis focuses on how field-building organizations like MATS Research can better prepare candidates for employment in this rapidly evolving field. We are sharing it more broadly in the hope that it helps organizations design programs that better prepare people to contribute to AI safety work.
Methodology and Limitations
Organizations were selected based on MATS Research’s understanding of the AI safety ecosystem, drawing from a range of categories: funders, think tanks, frontier labs, nonprofit and for-profit safety organizations, and government bodies. This is a synthesis of interview impressions, not a quantitative study. The focus was predominantly technical AI safety, but we also interviewed organizations working in policy and governance. This approach provides a broad picture of AI safety talent needs, but governance interviews were fewer, so observations in that category draw on a smaller sample. Interviewees were typically team leads, program leads, founders, and hiring managers.
A Rapidly Growing Field Induces Hyper-Selectivity
As with any rapidly growing field, natural constraints create bottlenecks for growth. In the case of AI safety, many talented junior researchers are entering the field, but organizations lack the capacity to absorb and develop them. A major bottleneck for technical organizations is the scarcity of senior people who can effectively supervise and mentor others. Even when funding exists, each senior researcher can only mentor a limited number of juniors before quality degrades, and growing an organization too quickly risks overwhelming existing capacity.
This bottleneck forces organizations into hyper-selectivity when choosing talent. Unable to invest time in training and managing junior talent, they must hire only those who need minimal supervision: typically experienced, senior researchers who can set research agendas, operate autonomously, and contribute meaningfully right away.
This constraint makes direct collaboration experience with a calibrated reference the dominant hiring signal. Having worked directly with someone, or receiving strong endorsements from trusted colleagues with established reputations, provides far richer evidence than credentials, interviews, or even publications. Organizations need this signal because they cannot afford hiring mistakes when supervision capacity is so scarce. This explains why fellowship programs play a crucial role as hiring pipelines: they provide extended observation periods and opportunities for candidates to develop research taste, execution speed, communication quality, and independence.
In contrast to this pervasive hyper-selectivity, some larger, well-established organizations faced internal bureaucratic constraints on headcount allocation that prevented them from hiring even when they clearly had the capacity to do so.
Two Distinct Tracks at Frontier AI Companies
At frontier AI companies, technical safety research and production implementation are distinct roles. Research roles explore and test new ideas and approaches to AI safety challenges, while implementation roles translate those ideas into production systems operating within many competing constraints on model behavior. Safety interventions must improve one dimension (like reducing harmful outputs) without degrading others (instruction following, coding ability, factuality, etc.). This work cannot be studied outside frontier labs and requires senior software engineering skills typically developed through years of industry experience.
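To make the multi-constraint nature of this work concrete, here is a minimal sketch of a regression gate of the kind such teams might run, assuming a hypothetical evaluation harness; the metric names, scores, and tolerance below are illustrative assumptions, not any lab’s actual tooling.

```python
# Illustrative sketch only: a regression gate that accepts a safety
# intervention if it improves the target metric without degrading other
# capability metrics beyond a tolerance. All names and values are hypothetical.

BASELINE = {"harmful_outputs": 0.12, "instruction_following": 0.91,
            "coding": 0.78, "factuality": 0.85}
CANDIDATE = {"harmful_outputs": 0.07, "instruction_following": 0.90,
             "coding": 0.77, "factuality": 0.85}

TARGET = "harmful_outputs"  # lower is better for this metric
TOLERANCE = 0.02            # maximum allowed drop on any other metric

def passes_gate(baseline: dict, candidate: dict) -> bool:
    # The intervention must actually reduce the harm metric...
    if candidate[TARGET] >= baseline[TARGET]:
        return False
    # ...while every other dimension stays within tolerance of baseline.
    return all(candidate[m] >= baseline[m] - TOLERANCE
               for m in baseline if m != TARGET)

print(passes_gate(BASELINE, CANDIDATE))  # True: harm reduced, others held
```

In practice the tradeoff surface is far richer than a scalar gate like this, which is part of why the judgment involved develops only inside production settings.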
Labs rely heavily on internal hiring for implementation roles, drawing from existing engineering workforces that already understand company systems and production standards. This creates structural challenges for external fellowship candidates, though those with specific expertise (like large language model evaluations) or strong mission alignment can be considered for roles in AI safety research.
The Automation Paradox
AI automation is already reshaping workflows, with some frontier AI company teams reporting that their processes have “completely changed” in just three years. Teams are now moving toward AI agents that autonomously implement entire features. However, senior engineers prove far more effective at using these tools than junior engineers: experienced engineers identify when AI-generated code is poorly designed and provide corrective feedback, while junior engineers often accept AI output uncritically, building unsustainable “spaghetti code” that must be rewritten.
This suggests automation may raise rather than lower the bar for human contribution by making engineering taste, architectural judgment, and the ability to validate AI-generated work even more valuable. Within 2-5 years, organizations expect demand for junior technical execution roles to shrink, while roles that emphasize research taste, strategic judgment, human-facing capabilities, and the ability to orchestrate AI systems become increasingly important.
Evolving Demand Patterns
Current demand for technical research shows a strong preference for Iterators: researchers who can quickly execute experiments and implement ideas on established research agendas. However, from a field-building perspective, producing research agenda-setters may hold higher expected value than producing more people to work on existing agendas. Within two years, a majority of the technical organizations interviewed expect Connectors, researchers who create entirely new research paradigms, to become more valuable as AI handles more of the day-to-day iterative execution work.
At the same time, demand is growing for Amplifier roles that support field expansion by facilitating research, managing cross-organizational projects, and translating technical work into action, especially in governance and policy contexts.
Critical Skills Gaps
Technical Tracks
For technical AI safety roles, organizations identify several critical gaps. Deeper technical training through ML bootcamp-style content remains important, but frontier AI companies and AI safety organizations also identify as essential the ability to write code appropriate to its time horizon (quick throwaway experiments versus maintainable systems), navigate large production codebases, understand existing infrastructure quickly, and coordinate and communicate across teams.
Perhaps the most essential and elusive skill to cultivate is research taste: the ability to identify promising research directions, scope projects appropriately, and make good prioritization decisions without constant guidance.
Governance Tracks
For governance roles, substantive policy context paired with direct government experience was identified as the key skills gap. Many technically strong candidates don’t understand how government institutions actually work: which agencies exist, how Congress differs from the executive branch, what regulatory mechanisms exist, or how policy is actually made. Much of the knowledge about institutional dynamics, political timing, relationship building across partisan divides, and navigating bureaucratic processes is tacit and cannot be transmitted through reading alone; it requires direct experience through mentorships and internships in government settings.
Strong communication skills matter enormously for governance work, which consists largely of explaining technical issues to non-experts via meetings, memos, briefs, and reports. The ability to work in complex political environments was also identified as essential: success requires building relationships with people who disagree, accepting partial wins, and persisting through frustrating processes. These social skills were identified as what separates successful policy candidates from those who struggle despite strong analytical capabilities.
Actionable Recommendations
Organizations offered several concrete recommendations for improving talent pipelines:
Extended Evaluation Mechanisms: Organizations want ways to assess candidates over longer periods in realistic contexts before making hiring decisions. Current three-month fellowship projects often prove too short to distinguish truly independent researchers from those who execute well only under close supervision.
(Paradoxically) Shorter Work Trials: Some organizations suggested 4-week “work trial” formats specifically for hiring assessment, allowing much higher throughput (screening 4-5x more candidates) at the cost of research output. The value proposition is purely identification: quickly determining who can work effectively in the organization’s specific context.
Standardized Mentor Testimonials: Generic academic recommendations provide little value (“every professor says their students are brilliant”), but calibrated evaluations from known AI safety researchers carry substantial weight. If fellows left with a structured evaluation from their mentor that covers research taste, execution speed, communication quality, independence, and other key dimensions, it could dramatically reduce hiring uncertainty.
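As a sketch of what such a structured evaluation could look like, here is a hypothetical schema; the field names, 1-5 scales, and cohort-percentile field are our illustrative assumptions, not an existing MATS instrument.

```python
# Hypothetical schema for a standardized mentor evaluation; the dimensions
# mirror those named above, but the fields and scales are illustrative.
from dataclasses import dataclass

@dataclass
class MentorEvaluation:
    fellow: str
    mentor: str
    research_taste: int          # 1-5, calibrated against prior fellows
    execution_speed: int         # 1-5
    communication_quality: int   # 1-5
    independence: int            # 1-5
    percentile_vs_cohort: int    # e.g., 90 means "top 10% of cohort"
    narrative: str = ""          # free-text context behind the scores

evaluation = MentorEvaluation(
    fellow="A. Fellow", mentor="B. Mentor",
    research_taste=4, execution_speed=5,
    communication_quality=4, independence=3,
    percentile_vs_cohort=85,
    narrative="Strong executor; still needs guidance scoping new directions.",
)
```

The value would come less from any particular schema than from mentors calibrating their scores against prior cohorts, which is exactly what generic academic references lack.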
Production Codebase Experience: For technical tracks, providing access to shared computing clusters and large model codebases where fellows practice production-like collaborative engineering could address the codebase navigation gap. Researchers could contribute through pull requests, navigate existing infrastructure, and experience the coordination challenges of multi-person engineering projects.
Direct Policy Embedding: For governance tracks, direct embedding in real multi-stakeholder policy processes, whether through partnerships with congressional offices, think tanks, government agencies, or initiatives like the Frontier Model Forum, would build relationships and tacit institutional knowledge that translate directly to employment. This proves far more valuable than standalone policy projects completed in isolation.
Broad Strategic Perspective: Well-rounded researchers would benefit from broad exposure to the AI safety landscape, including forecasting, compute economics, first-principles modeling of AI development, and strategic thinking about which threat models are plausible, which research agendas address them, and what assumptions underlie different approaches.
The Path Forward
Given rapid change and limited senior bandwidth, the priority is building durable capacity: developing researchers and practitioners who can contribute with increasing autonomy and eventually mentor others. If we get this right, we increase the field’s ability to absorb talent, set better agendas, and respond faster as the technical and governance landscape shifts—improving our odds of building AI systems that are safe, reliable, and aligned with human interests.
Acknowledgements
This report was produced by MATS Research as an update to the Talent Needs of Technical AI Safety Teams post published in May 2024. John Teichman conducted all interviews, analyzed the data, and authored the report. Ryan Kidd provided directional guidance and editorial support, and a number of people on the MATS team contributed helpful reviews.
Thanks to our interviewees for their time and support, without which this work wouldn’t have been possible. We also thank Coefficient Giving, the Survival and Flourishing Fund, and Longview Philanthropy, without whose donations we would be unable to run upcoming programs or retain team members essential to this report.
To learn more about MATS, please visit our website. We are currently accepting donations for our Summer 2026 Program and beyond!
