This is a cross-post of our newly updated article with advice on working at frontier AI companies if you want to help with AI safety.
The original article was published in June 2023. We decided to update it in May 2024, as events in the intervening year caused us to be more concerned about working for frontier AI companies, and we also wanted to explain why we still thought it made sense for some people to take roles at AI companies. Our overall view is that “it’s complicated”: working at a frontier AI company is sometimes a good idea, and sometimes a bad idea; it depends on the role and type of work, the company, and your individual case.
The new article tries to more clearly lay out the case in favour and against working at frontier AI companies, give tips for mitigating the downsides, and give more information on alternatives.
Summary
In a nutshell: If you want to help reduce catastrophic risks from AI, working at a frontier AI company is an important option to consider, but the impact is hard to assess. These roles often come with great potential for career growth, and many could be (or lead to) highly impactful ways of reducing the chances of an AI-related catastrophe. However, there's also a risk of doing substantial harm, and there are roles you should probably avoid.
Pros
- Some roles have high potential for a big positive impact via reducing risks from AI
- Among the best and most robust ways to gain AI-specific career capital
- Highly compensated and prestigious
Cons
- Risk of contributing to or accelerating AI systems that could cause extreme harm
- Financial and social incentives might make it harder to think objectively about risks
- Stress, especially because of a need to carefully and repeatedly assess whether your role is harmful
Recommendation: it’s complicated
We think there are people in our audience for whom working in a role at a frontier AI company is their highest-impact option. But some of these roles might also be extremely harmful. This means it's important to be discerning and vigilant when thinking about taking a role at a frontier AI company — both about the role and the company overall — and to actually be willing to leave if you end up thinking the work is harmful.
Review status: Based on an in-depth investigation
This review is informed by three surveys of people we regard as having expertise: one survey on whether you should be open to roles that advance AI capabilities (written up here), and two follow-up surveys conducted over the past two years. It's likely there are still gaps in our understanding, as many of these considerations remain highly debated. The domain is also fast moving, so this article may become outdated quickly.
This article was originally published in June 2023. It was substantially updated in August 2024 to reflect more recent developments and thinking.
Introduction
We think AI is likely to have transformative effects over the coming decades, and that reducing the chances of an AI-related catastrophe is one of the world’s most pressing problems.
So it’s natural to wonder whether you should try to work at one of the companies that are doing the most to build and shape these future AI systems.
As of summer 2024, OpenAI, Google DeepMind, Meta, and Anthropic seem to be the leading frontier AI companies — meaning they have produced the most capable models so far and seem likely to continue doing so. Mistral and xAI are contenders as well — and others may enter the industry from here.1
Why might it be high impact to work for a frontier AI company?
Some roles at these companies might be among the best for reducing risks
We suggest working at frontier AI companies in several of our career reviews because a lot of important safety, governance, and security work is done in them. In these reviews, we highlight:
- Technical AI safety research and engineering roles — both in alignment and areas like interpretability or threat modelling
- Some AI governance research (including technical governance research) and advocacy roles [2]
- Roles aimed at designing and implementing company policies and processes for safely rolling out systems (e.g. "responsible scaling policies")
- Information security roles, e.g. for protecting model weights from being stolen.
These roles are aimed at some of the biggest issues increasing catastrophic risks from AI: the lack of scalable enough approaches for aligning powerful AI systems, governance and policy structures (both inside and outside companies) that could fail to contain risks, and the vulnerability of increasingly powerful AI systems to theft and misuse. Importantly, because some roles specifically target these issues, we think you can do risk-reducing work in them even if the company you're working for overall is making things more dangerous.
Moreover, because frontier AI companies are where so much AI work is being done right now — on the models themselves, processes, and safety research — the resources available to you there might help you do much better work. For example, from inside one of these companies, you might have access to stronger models, more compute, more experimental infrastructure, more information, and stronger collaborators for safety research than you would if you were on the outside. And if you're working on processes and policies to improve safety, you'll have more access to the people and systems that you're designing those processes and policies for. In many cases, it seems like this will make your work better and more effective.
These roles can also be very hard to hire for, because they require a lot of skill, sometimes many years of experience, and a thorough and nuanced understanding of the risk landscape and the technology.
All in all, we think that if you're a good fit, some of these roles are probably among the highest-impact jobs out there.
A frontier AI company could be a huge force for good
We'll talk below about how frontier AI companies might contribute to an AI-related catastrophe. But it's also possible that a responsible frontier AI company could be crucial to preventing an AI-related catastrophe. It could, for example:
- Lead the industry in the development of wise safety practices, protocols, and frameworks — setting examples for other projects on governance, security, and adherence to standards. For example, Anthropic's creation of a 'responsible scaling policy' in the summer of 2023 seems to have influenced OpenAI and Google DeepMind to create analogous structures in the following year: 'Preparedness' for OpenAI and 'Frontier Safety Framework' for DeepMind.
- Do valuable technical safety and governance research using state-of-the-art systems (possibly far more advanced than those available outside a frontier AI project) — especially if the results are shared.
- Engage in defensive deployment by using early, safe, but powerful AI systems to make the overall situation with AI safer. For example, it could use AI systems to contribute to AI safety research, produce evidence and demonstrations of risks, contribute to information security, and help with monitoring the risks. (Note that whether this will be possible is debated.)
- Put huge effort into designing tests for danger, and credibly warn others if it detects danger in its own systems.
- Coordinate effectively with other AI companies and projects — for example, by sharing important safety findings and techniques, and possibly, if needed, acquiring or merging with other projects or otherwise gaining visibility, influence, and control with which to prevent the deployment of dangerous systems.
- Effectively lobby the government for helpful measures to reduce risk — or help develop them. For example, our impression is that AI companies were instrumental in achieving measures like the US's executive order on safe AI.
(Read about what AI companies can do to reduce risks.)[3]
These actions require, or are made more effective by, being on the cutting edge. This means that to successfully be a force for good in these ways, a frontier AI company needs to balance the continued development of powerful AI (including possibly retaining a leadership position) with appropriately prioritising actions that reduce the risk overall.
This tightrope seems difficult to walk. But if a company is responsible enough to manage it, and successfully acts as a huge force for good, then boosting them generally would be very helpful — even if your role is not specifically focused on reducing catastrophic risks.
We give a few brief thoughts on which companies appear to be acting responsibly below, but ultimately, if you're considering working for a frontier AI company, you should investigate for yourself and decide if you trust the company's actions and leadership enough to want to help it succeed.
It's also worth noting that it might be possible to help a company walk this tightrope better as an employee by generally becoming part of the organisation's 'conscience,' especially if the company is on the smaller side. If you see something really harmful occurring, you can consider organising internal complaints, whistleblowing, or even resigning (as some safety-focused OpenAI employees did in spring 2024). In many cases, this probably won't make a difference (or could backfire),[4] but by taking these issues seriously yourself and discussing them with your colleagues, you might be able to help foster a culture that values safety, responsibility, and cooperative relationships with other actors and the public.
Wanting to influence the culture or decision making of a frontier AI company probably shouldn't be a big reason you take a role there — but it can be a bonus if you plan to have a positive impact in other ways as well.
It could be very good career capital (though there may also be career capital downsides)
Frontier AI companies are (perhaps obviously) among the best places in the world to learn about AI technology, its trajectory, and the overall industry. These companies are where state-of-the-art models are being developed, and they employ many of the people who will most influence how AI plays out.
Working at a frontier AI company might also help you gain a much better understanding of risks from AI specifically, because it puts you on the front lines. It's much easier to grasp the power, weaknesses, and vulnerabilities of AI systems, as well as the social structures and dynamics that drive and constrain the work of frontier companies, if you're working with them closely.
Frontier AI companies are also just high-performing, rapidly growing organisations. In general, one of the best ways to gain career capital is to work with any high-performing team — you can just learn a huge amount that way about getting stuff done, form valuable connections, and gain an impressive credential.
We think that career capital like this can be a very big deal, especially if you're relatively early in your career or switching careers. In fact, career capital can be even more important than direct impact when you're just getting started, because the direct impact of your work is likely much lower in a junior role, and you will have more time to make use of the career capital you gain.[5]
But there might also be some career capital downsides to working at a frontier AI company.
In general, your character, attitudes, and beliefs are shaped by the jobs you take and the company you keep, and they matter a lot for your long-term impact. This is why we list character as a component of career capital.
Some people we spoke to while preparing this article warned against working at frontier AI companies due to the risk of changing your mind, not for reasons you would endorse, but simply because of the influence of those around you. Our impression is that leading companies are at least somewhat concerned about risks, which makes it unlikely you'll do a total 180 on your views — but we still think this should be taken into account in any decision you make.
Of course, your coworkers might change your mind with good arguments too, so it's possible your beliefs could improve. But if you're going in thinking that the company doesn't take AI risk seriously enough — and we think many of our readers will — you should be aware that your working environment could lead you to take the risks less seriously too, in part due to social factors. If you're currently right about the company not taking the risks seriously enough, that's probably a bad thing.
Also, working at these companies might create a conflict of interest for you or otherwise constrain your actions after you leave the role.
For example, prior to May 2024, OpenAI used financial incentives to get employees to sign non-disparagement agreements upon leaving. These prevented employees from criticising the company's actions and from acknowledging the non-disparagement agreements' existence until a public outcry caused OpenAI to stop the practice. It seems possible that even now, after this particular practice has ended, a frontier AI company could pressure you to not take actions that would go against its interests, including after you leave the job.
Also, some future employers that particularly value independence — such as governments or independent third-party evaluators — might be concerned about your history, especially if you still have a financial stake in the company, which will probably be part of your compensation package.
Finally, compensation at these companies is often high enough, especially for technical talent, that some people find it hard to decide to leave. And there is often a compensation structure that rewards you for staying longer, at least for the first few years. This is obviously a big plus financially, and might also allow you to have more impact depending on how you use the money (for example, you could donate a lot of it). But in practice some people experience this as a constraint that causes them to not leave as soon as they think they should — hence the term "golden handcuffs."
Overall, we think that for many people the career capital from working at a frontier AI company is among the best they can get for working in AI safety, despite these downsides. That said, there are a lot of alternative roles where you can also gain valuable experience and connections in the field, as well as do impactful work.
What's the case against working at a frontier AI company?
You might increase the risk of an AI-related catastrophe
The main reason not to work at a frontier AI company is that you might increase the risk of an AI-related catastrophe, by reducing the amount of time we have to decrease risks or by directly contributing to building dangerous systems (or both).
The leading frontier AI companies are all directly aimed at creating highly powerful, general artificial intelligence, and it seems plausible that they could succeed fairly soon (potentially in the next few decades). Moreover, because they face competition, they have incentives to move as quickly as they can.
Our view is that these powerful, general systems would pose very substantial risks, even though they might also be used to achieve a lot of good if the risks are avoided.
The overwhelming majority of roles at frontier AI companies seem likely to accelerate progress towards the goal of building the systems these companies are aiming at — either directly through research and engineering, or indirectly via activities like promoting products, trying to attract investors, or just generally making the company more successful.
So if you work in one of these roles, a large effect of your work — maybe the main effect — will be increasing this acceleration, giving everyone less time to mitigate risks, and so potentially making things worse.[6]
We think this simple argument is very powerful, and a big reason against working in these companies.
However, there are also a lot of caveats we think are important here.
First, roles aimed directly at decreasing catastrophic risks like the ones we highlighted above seem less likely than most roles at these companies to accelerate progress toward transformative AI. Some seem like they could even decrease acceleration.
And when the roles we highlighted do add to acceleration, we think they often have larger risk-reducing effects. Take AI alignment research, which might accelerate progress toward transformative AI, because systems are much more useful if they do what people want them to do, and more useful models are more likely to be adopted and built upon. For example, the invention of reinforcement learning from human feedback has been helpful for controlling models and is widely considered an alignment technique. It has also arguably sped up the field in general, and so reduced the time we have to more fully mitigate risks. But we'd guess a lot of alignment research is still risk-reducing on net, even when it also accelerates AI as a field. (Though it's also very controversial which alignment research has been risk-reducing vs risk-increasing overall.)
Second, even with research and engineering work that specifically focuses on increasing the capabilities of AI models, there are big differences in how much the work will help accelerate the creation of transformative AI.
Our impression is that scaling up foundation models, i.e. creating larger models that use more compute and are trained on more data, is likely to contribute the most to acceleration toward transformative AI. Massive increases in scale have driven much of the progress in model performance in recent years, and some people believe further scaling represents the shortest path to transformatively powerful AI. Improving model architectures, other algorithmic improvements, and increasing data quality might also play substantial roles.
On the other hand, work that happens post-training, such as fine-tuning and creating applications, is generally seen as less likely to directly contribute to acceleration toward transformative AI. This is because the performance of the next generation of models will be driven more by greater scale, better data, and algorithmic improvements. (That said, post-training enhancements do contribute to performance and making good products, which increases investment, which in turn can be used to drive faster fundamental progress.)
Finally, accelerating progress isn't quite the same thing as just reducing the time we have to do safety work while holding everything else equal. Accelerating progress also has other effects, some of which might reduce risks.
In particular, it's possible that the later we develop transformative AI, the faster (and therefore more dangerously) everything will play out, because other currently-constraining factors, such as the amount of compute available in the world, could continue to grow independently of other progress. Slowing down advances now could increase the rate of development in the future, when we're much closer to being able to build transformative AI systems. This would give the world less time to adapt to and conduct safety research with models that are very similar to ones we should be concerned about but aren't dangerous themselves. (When this is caused by a growth in the amount of compute, the scenario is often referred to as a hardware overhang.)[7]
Finally, it's possible that an unusually responsible AI company (or coalition) moving faster could reduce the risk of less cautious AI projects entering the field and causing problems. If this is right, boosting the unusually responsible frontier AI company would be beneficial even when it reduces total time to transformative AI. (Though you'd have to think there is a particularly responsible AI company for this reasoning to apply.)
We think these nuances are important. Overall, we think the rule of thumb don't do work where a primary effect is to accelerate progress to transformative AI is pretty reasonable while we haven't yet figured out how to create transformative AI safely, and it applies in many cases. But our view is that it isn't always decisive, and there are cases where the risk-reducing effects of your work likely outweigh the risk-increasing effects. Whether you're in one of those cases will depend on lots of considerations, some of which we discuss later in this article (like the type of role and how responsible the company is overall), as well as more individual factors like the particular research you're doing and how widely it's shared.
Some people disagree with this and would think that it's never OK to do work that contributes to accelerating AI progress while we don't have a plan to make it safe. If you think this, you might also favour a 'pause' on advancing AI while the world develops such a plan. See the arguments for a pause here.
It does seem worth emphasising that, ultimately, one of these frontier AI companies could produce a system that causes an existential catastrophe. If you work there, you might contribute to that. That in itself is a strong reason to be very careful.
Does replaceability mean this doesn't matter?
We've heard the argument that because so many people apply to work at AI companies, even if you end up doing work that's a bit harmful overall it won't make a real difference, because someone else would have done it anyway.
We think this argument isn't that strong. While hiring rounds don't perfectly select for performance, being offered a role is still a reason to believe you'll do the work better than the next person they would have hired. If the work is harmful, that's a bad thing. The high salaries and fast hiring at these companies also suggest that talent is a substantial bottleneck for them: at a basic level, increasing the supply of talent to the company will increase the amount of work that gets done. And replaceability effects are usually too hard to model for them to be treated as an important factor anyway.
Also, if you actually believe the work you'd do would be harmful overall, then even if it's true that it would get done either way, there's a strong common-sense moral case for avoiding doing it yourself.
There are great alternatives that lack many of the downsides
There are many more promising ways to contribute to reducing catastrophic risks from AI outside frontier AI companies than there used to be, such as through government institutes, nonprofit research organisations, and academic groups. These often also present good career capital opportunities, meaning you don't have to work at a frontier AI company to gain credibility, skills, and connections in AI safety. This isn't to say that you can always do work that's just as effective outside a frontier AI company as you would be able to do within it — as we discussed above, frontier companies' control over and access to the technology give them a lot of advantages for doing high-impact work. But the field has grown a lot in the last few years, and there are many other places to work where you can also make a big positive difference while running less risk of doing harm or changing your values and views in ways you wouldn't endorse.
See some ideas below.
Things to consider for your particular case
It's complicated to assess overall whether it's good to work at a frontier AI company, not least because whether it's a good idea will vary from person to person, and from role to role.
Here are some considerations you can use to assess your particular situation.
(You can also speak to us one-on-one for free about your options, and we can help introduce you to others working in AI safety. If you're considering taking a job at an AI company, we think it's probably worth discussing the choice with others for more personalised advice.)
There are very different kinds of roles
As we outlined above, there are roles at AI companies aimed specifically at reducing the chance of a catastrophic outcome from AI. These often seem like really good opportunities for impact.[8]
But for the vast majority of jobs at frontier AI companies, their main effect is to boost the success of the company generally and to accelerate progress toward transformative AI systems.
These jobs include technical research and engineering roles focused on making more powerful models (for example, via scaling, architectural improvements, optimiser improvements, or improved hardware), as well as roles in business, product, marketing, design, communications, fundraising, HR, sales, and operations.
We'd recommend steering clear of these roles unless you trust the leadership and structures at a frontier AI company enough to feel awesome about them being a force for good.
Some companies are probably much more responsible than others
Whether your work will be harmful or beneficial might depend a lot on how responsible the company you work at is overall, as well as how good their plans are for making their systems safe.
Not only is an irresponsible or misguided company more likely to produce a dangerous system, but it can also normalise disregard for governance, standards, and security across the industry, causing secondary harm.
This is important to consider, particularly if the positive impact of the role you're considering depends on the company's overall direction, which is more likely for roles outside the specifically targeted roles we highlighted at the beginning of this article. For example, we'd guess that securing powerful systems is beneficial regardless of how responsible the company is overall, but that the value of working in communications very much depends on the company's direction.
But it's not necessarily easy to know how positive you should be on a given company's plans and leadership.
Among the four leading frontier labs, OpenAI, Anthropic, and to some extent Google DeepMind have all expressed concern about catastrophic and even existential risks from AI, created policies aimed at reducing risks, and have internal teams focused on research into AI alignment and other ways to further safety.
On the other hand, Meta has not. The Chief AI Scientist at Meta, Yann LeCun, is also a vocal and prominent sceptic of AI safety.
As a rule of thumb, if you are trying to reduce catastrophic or existential risks from AI, it seems much better to work for a company that acknowledges the risks you are concerned about and has teams to address them than one that doesn't.
Among the labs that do publicly prioritise AI safety, it's worth being aware of concerns raised about OpenAI due to recent events:
- In fall 2023, OpenAI's nonprofit board voted to fire their CEO Sam Altman for allegedly being dishonest. He resumed control after employees threatened to leave and then replaced the board. During the power struggle, Altman appeared willing to take the employee base to OpenAI's partner, Microsoft, where they would be unbound by OpenAI's nonprofit mission to ensure that artificial general intelligence benefits all of humanity. This casts doubt on whether OpenAI's leadership or structure would reliably prioritise safety. (See a timeline of these events.)
- In spring 2024, several senior employees in safety-focused roles on the 'superalignment' team left OpenAI, citing a lack of support, particularly due to reportedly not being granted access to the 20% of OpenAI's compute that had been publicly promised to their work. After the resignations, OpenAI dissolved the team. These events suggest less investment in safety than many had previously thought and than some of OpenAI's public statements imply. (You can see the leadership's response here.)[9]
In general, we'd expect a lot of variation in how seriously companies take catastrophic risks, because the arguments for doing so aren't universally accepted, and the tradeoffs companies make will be influenced by the views of employees and especially leadership.
That said, all these companies face the same huge commercial pressure to develop more and more powerful AI, quickly — and their social impact missions are dependent on them doing so. This will push their behaviour closer together.
If you are thinking about working at one of these companies, you should seriously ask yourself whether you buy the statements and arguments they make about safety. Your managers and other employees at the company will be excited to tell you all about the positive impact of the work. Most will be honest, but they are selected for their belief in the company's decisions, so listening to them without question is likely to give you a biased picture.[10]
As much as you can, look for credible signs of the company's plans and mission, such as big expenditures, corporate deals, and other costly investments, and not just company statements or what people working at the company think.
We also recommend checking out AI Lab Watch to compare the safety-relevant public behaviours of AI companies. AI Lab Watch continually assesses frontier AI companies' behaviour on several criteria including how well they appear able to assess the riskiness of their models, and internal governance structures.
At the time of this writing, AI Lab Watch gives Anthropic the highest "overall score" on avoiding extreme risks from AI of all the frontier companies, followed by OpenAI and then DeepMind, with Meta and Microsoft far behind.
The broader AI safety community can also help you reason through some of these questions, and provide you with a firehose of news, impressions, and discussion that is hard to summarise here.
It matters who you are
Putting aside differences between roles and companies, there's an additional question of whether it's a good idea for you to work at one of these companies. In general, we think you'll be a better fit and make a more positive difference working at an AI company if:
- You have an excellent understanding of risks from AI.
- You are vigilant and inclined to question and think for yourself about the impact of your work.
- You can follow tight security practices. AI companies sometimes deal with sensitive information that needs to be handled responsibly.
- You are able to keep your options open — that is, you are financially and psychologically prepared to leave if it starts to seem like the right idea.
- You have good social skills (if the positive impact of your role comes from being able to persuade others to make better decisions).
- You're less sensitive than average to incentives and social pressure, so you can stick to your principles and change your beliefs based on good arguments.
How can you reduce the downsides of working at a frontier AI company?
If you do take a role at a frontier AI company, here are some things you can do to mitigate the downsides:
- Continue to engage with the broader safety community. To reduce the chance that your opinions or values will drift just because of the people you're socialising with, try to find a way to spend time with people who share your values and who are actively trying to have accurate beliefs about the risks. For example, if you're a researcher or engineer, you may be able to spend some of your working time with a safety-focused research group. In general, keep friends outside your company — don't get into a situation where the incomes of the only people you ever talk to depend on the company succeeding.
- Be an employee who pays attention. Take the time to think carefully about the work you're doing, e.g. whether it'll lead to harmful hype. If something makes you uncomfortable, consider speaking up about it if you're in a position to. You'll need good social skills to navigate this effectively though, and you should probably pick your battles to focus on the most important issues, as some things you could do here might hurt your ability to advance in the company. Also, make sure that you notice and take action if your role is changed so that it's no longer having the effect you wanted, or if you're being guided to a new role that seems harmful to you. (For example, we've heard of companies suggesting to people who've applied for roles aimed at reducing catastrophic risks that they try out for other roles during the application process.)
- Be ready to quit. Avoid being in a financial or psychological situation where it's just going to be really hard for you to switch jobs into something more clearly focused on doing good, if that ends up being the right thing. Instead, constantly ask yourself whether you'd be able to make that switch, and whether you're making decisions that could make it harder to do so in the future. This also means keeping financial resources and intellectual influences outside the company.
In general, our top piece of advice would be to keep in close contact with people who are also focused on reducing catastrophic risks from AI and who can help you get perspective and make difficult decisions. This can help you think through things so that you stay if you should stay, leave if you should leave, or change the nature of your work if you need to.
Where might you work instead?
There are many promising opportunities to reduce AI risk that, in some circumstances, could be as good as or better than working at frontier AI companies, and which generally have fewer of the downsides we’ve discussed.
Some top options:
- Top government initiatives working to reduce catastrophic risks from AI — especially the US AI Safety Institute and the UK AI Safety Institute. These initiatives have both research and governance roles, so it's possible to develop and use many of the same skills while having a big direct impact. Government initiatives also face a recruitment disadvantage compared with AI companies, because they can't offer comparable compensation.
- Think tanks that do governance research (including technical governance research), such as RAND, The Centre for the Governance of AI, and the Center for Security and Emerging Technology (CSET).
- Nonprofit research organisations like METR and Apollo Research, which work on advancing evaluations that can be used to detect dangerous capabilities, security risks, and other catastrophic-risk-relevant features in models, or Redwood Research, the Alignment Research Center (ARC), CAIS, or FAR AI.
- Academic research groups — e.g. those at MIT, Cambridge, Carnegie Mellon University, NYU, and UC Berkeley. See the list of PhD supervisors on our job board.
- AI companies that build AI products, and might do some AI research, but which aren't causing the world to make progress toward transformative AI. You can gain ML skills at these companies, and some are even building socially beneficial applications — like in medicine or decision making.
Keep in mind that you can also learn many relevant skills via education (e.g. a bootcamp or a degree), and in roles that have nothing to do with AI, like software engineering at a large tech firm. See our technical AI safety career review and our AI governance and policy career review for more.
Explore jobs
If you've considered all the above and you're ready to start looking at some job opportunities in frontier AI companies, see our list.
Learn more about working at AI companies
Learn more about making career decisions where there's a risk of harm:
- Anonymous advice about advancing AI capabilities
- Is it ever OK to take a harmful job in order to do more good?
- Ways people trying to do good accidentally make things worse, and how to avoid them
Relevant career reviews (for more specific and practical advice):
Podcast episodes:
- Zvi Mowshowitz on sleeping on sleeper agents, and the biggest AI updates since ChatGPT
- Nick Joseph on whether Anthropic's AI safety policy is up to the task
- Nathan Labenz on the final push for AGI, understanding OpenAI's leadership drama, and red-teaming frontier models
Acknowledgements
Thanks to Benjamin Hilton, Benjamin Todd, Jonas Vollmer, Rohin Shah, Zach Stein-Perlman, and the 80,000 Hours team for helpful feedback on this piece. And thanks to Benjamin Hilton for writing the previous version of this article.
- ^
Many of the considerations in this article will also apply to working on AI at big tech companies like Apple and Amazon, which have the resources to potentially become rising stars in AI, as well as upstarts like SSI Inc. (if they become industry leaders) and AI hardware companies like Nvidia.
- ^
We think that governance staff at AI companies often do useful work, including contributing to policies like the US's executive order on safe AI. However, it's worth being concerned about conflicts of interest here, as this work essentially involves companies shaping policy to govern their own (and their competitors') behaviour.
- ^
This article is by Holden Karnofsky. Karnofsky cofounded Open Philanthropy, 80,000 Hours’ largest funder. Karnofsky is also married to Anthropic President and former OpenAI employee Daniela Amodei, who owns equity in both Anthropic and OpenAI.
- ^
To improve a company's culture to be more oriented to safety and responsibility, you'd need the sorts of social skills and understanding of the area that will help you discuss catastrophic risks from AI productively with your colleagues, including those from different intellectual backgrounds. Arguing badly about these topics could accidentally cause harm by making people think that arguments for caution are unconvincing. We’d also guess that you should spend almost all of your work time focused on doing your job well; criticism and suggestions for change are usually far more powerful coming from a high performer.
- ^
This raises the question of whether it can ever be good to take a role that is a bit harmful for a while if you think it'd allow you to do more good in your career overall.
One version of this argument says that because there are so many people already working on advancing the base capabilities of AI, your work won't contribute much on the margin. And because of the connections you'll gain and everything you'll learn by doing that work, you'll be able to contribute much more to safety work in the future — in a way that makes taking the job good overall.
This response from our article "Anonymous advice: If you want to reduce AI risk, should you take roles that advance AI capabilities?" fleshes out this argument with some useful detail:
"Capabilities work is already highly incentivised to the tune of billions of dollars and will become more so in the future, so I don’t think on the margin AI risk motivated individuals working in these spaces would boost capabilities much. To try to quantify things, there were around 6,000 authors attending NeurIPS in 2021. Increasing that number by 1 represents an increase of 1/6,000. By contrast, I think the above benefits to safety of having an individual learn from other fields, potentially be a leader of a new critical area in AI safety, and otherwise be in a potentially better position to shape norms and organisational decisions, are likely to be much larger. (A relevant belief in my thinking is that I don’t believe shortening timelines today costs us that much safety, relative to getting us in a better position closer to the critical period.) Note that this last argument doesn’t apply to big actors, like significant labs or funders."
We're wary of this kind of argument because of the concern that it can be used to justify unethical actions, and we generally recommend people try hard to find other ways to have more impact, e.g. work somewhere else to build up the career capital they need. But we don't want to say this sort of argument is never right — it seems too extreme and dogmatic to say you can never do anything that causes harm, or that might be causing harm, as part of a plan to do more good. Harm is everywhere, many actions carry risks and unintended consequences, and arguments like the one articulated above do seem to us to have something to them. We wrote some more about the topic of harm/benefit tradeoffs here, but unfortunately we don't have fully general or final guidance on the question. If you're facing a choice like this, it seems best to just think very carefully and talk to others about your choice to make sure you're making reasonable tradeoffs.
- ^
This might not be limited to just the company you work at, either. It also seems possible that accelerating progress toward transformative AI at one company will indirectly speed up progress or make things more dangerous at another, for example by incentivising racing, or through leaks or public sharing of insights. On the other hand, different companies compete for finite resources like compute, talent, and investment — so this wouldn’t necessarily be the case.
- ^
Indeed, OpenAI has argued that developing AI quickly and deploying it via commercial projects as it develops makes the world overall safer by giving society more ability to adapt as the technology develops — rather than having to deal with extremely powerful and general AI suddenly. Though, in OpenAI's particular case, this logic seems undermined by Sam Altman's pursuit of large ventures in compute. This casts doubt on Altman's sincerity, but it doesn't invalidate the general argument that moving more slowly on non-hardware AI progress can be harmful.
- ^
Keep in mind there can also be helpful roles not specifically or directly focused on catastrophic risk reduction — for example, an operations role aimed at boosting the coordination of a safety team with the rest of the organisation, or new initiatives aimed at increasing the chance of a beneficial outcome for society. We haven't written as much about these, but we’d guess they’re often helpful.
- ^
There was also an earlier wave of departures in 2021, when seven OpenAI staff left to start Anthropic, a move related to differences in views on safety.
- ^
Though OpenAI's previous practice of preventing former employees from criticising them on pain of losing their equity highlights that there are sometimes more sinister dynamics at play in these situations as well.
When I was 23 (35 now) I worked at News Corporation. At the time there was a General Manager who bullied many of her direct reports.
One day she decided to start bullying me. I decided to push back, while trying to get support from others around me. No one wanted to put their neck out. Eventually I discovered she was fudging financial numbers, so I went to HR, which led to both of us going to the state-level General Manager. She was sent home while a week-long investigation was conducted. Many people then came forward regarding the bullying, and she was fired.
Being 23 this was the hardest thing I’ve ever done. I quit immediately after the investigation was over.
I write this as a warning: doing “work” from the inside can be incredibly stressful for young people and dear reader you should be very careful about assuming you’re the one to do it.
Yeah, when I was younger I successfully represented ~100 of my colleagues in an informal pay dispute at a software company. It's really, really hard to prove retaliation, but I found myself on the receiving end of a few very intimidating meetings with HR over trivial internal comments (where, when other people had made similar comments, they had not received this treatment). I was also told that there wasn't enough budget to promote me or even give me 'exceeds expectations' in my performance reviews, when colleagues in other teams had no issue. Even if this wasn't retaliation, speaking out gave me a paranoia that lasted the remainder of my time there and led me to hold my tongue in the future.
I'm the sort of person that tries to stick up for people when I see them getting fucked over, and perhaps the average EA also has this strength of will. But I agree with Yanni that whether this 'infiltration' approach works depends on this being one of your primary goals in joining the company, and a personality with very strong will & resilience. I don't think that it's a nice side-effect or valuable bonus in someone's personal calculation to join such a firm.
I'm sorry to hear you experienced that man, it sounds very familiar :(
Going through what I did represents the proudest moment of my life, but I wouldn't wish it on anyone.
Maybe someone from animal welfare with experience doing "inside" investigations could provide useful (albeit extreme) insight into this problem?
Another thought - I think it is possible that a bunch of EAs have just never worked with really shitty people before? Like, maybe it is taken for granted that everyone is just kinda nice and not actively trying to fuck you over?
I do wonder if the naïveté with which the OpenAI board coup was approached is a result of this. It did not sound like something organised by people who were used to operating in a highly political, cut-throat environment, and they seemed surprised to find that they were in one.
Zvi on the 80k podcast:
The transcript is from the 80k website. The episode is also linked to in the post. It also continues to Rob replying that the 80k view is "it's complicated" and Zvi replying to that.
I also think that a lot of work that is branded as safety (for example, that is developed in a team called the safety-team or alignment-team) could reasonably be considered to be advancing "capabilities" (as the topic is often divided).
My main point is that I recommend checking the specific project you'd work on, and not only what it's branded as, if you think advancing AI capabilities could be dangerous (which I do think).
I personally think that "does this advance capabilities" is the wrong question to ask, and instead you should ask "how much does this advance capabilities relative to safety". Safer models are just more useful, and more profitable a lot of the time! Eg I care a lot about avoiding deception. But honest models are just generally more useful to users (beyond white lies I guess). And I think it would be silly for no one to work on detecting or reducing deception. I think most good safety work will inherently advance capabilities in some sense, and this is a sign that it's actually doing anything real. I struggle to think of any work I think is both useful and doesn't advance capabilities at all
My frank opinion is that the solution to not advancing capabilities is keeping the results private, and especially not sharing them with frontier labs.
(Making sure I'm not missing our crux completely: do you agree:)
1 is very true, 2 I agree with apart from the word main, it seems hard to label any factor as "the main" thing, and there's a bunch of complex reasoning about counterfactuals - eg if GDM stopped work that wouldn't stop Meta, so is GDM working on capabilities actually the main thing?
I'm pretty unconvinced that not sharing results with frontier labs is tenable - leaving aside that these labs are often the best places to do certain kinds of safety work, if our work is to matter, we need the labs to use it! And you often get valuable feedback on the work by seeing it actually used in production. Having a bunch of safety people who work in secret and then unveil their safety plan at the last minute seems very unlikely to work to me
If anyone reading this ever takes a job inside a lab and has concerns about their mental health or navigating the challenges outlined in this article, I'd be happy to (privately) mentor / support them through the experience. E.g. take regular phone / zoom calls. DM if interested.
Appreciate this a lot. Based on priors I think this might even be the most important trait if you're considering this work - given the extreme value drift as you surround yourself with a whole lot of powerful and smart people who are bent on accelerating AI.
"You're less sensitive than average to incentives and social pressure, so you can stick to your principles and change your beliefs based on good arguments."
At the risk of sounding naive: I'd like to point out you can go work for a frontier AI company and give lots of money to AI safety (or indeed any other cause area you believe in).
If nothing else, if you give at least the salary difference between a frontier job and a lower-paying non-frontier AI safety job, this prevents you from lying to yourself: thinking you're working at a frontier company because you believe it's good, while actually doing it because of the benefits to you.
Thanks for taking a balanced view, but I would have liked to see more discussion of the replaceability argument which really is pivotal here.
You say that whoever is hired into a progress-accelerating role, even if they are safety-conscious, will likely be most effective in the role and so will accelerate progress more than an alternative candidate. This is fair but may not be the whole story. Could the fact that they are safety-conscious mean they can develop the AI in a safer way than the alternative candidate? Maybe they would be inclined to communicate and cooperate more with the safety teams than an alternative candidate. Maybe they would be more likely to raise concerns to leadership etc.
If these latter effects dominate it could be worth suggesting that people in the EA community apply even for progress-accelerating roles, and it could be more important for them to take roles at less reliable places like OpenAI than slightly more reliable like Anthropic.
I think the summary at the start of this post is too easy to misinterpret as "if you think of yourself as a smart and moral person, it's ok to go for these companies".
(None of the things the summary says seem false. But the overall impression seems too vulnerable to rationalisation along the lines of "surely I would not fall prey to these bad incentives", when in reality most people probably do fall prey to them. So at a minimum, it might be fairer to change the recommendation to something like "it's complicated, but err on the side of not joining" or "it's complicated, but we wouldn't recommend this for 95% of people who can get a job at these companies"[1].)
Or whatever qualifier you think is fair. The main point is to make it clear that the warnings apply to the reader as well, not just to "all the other people".