Jobs that can help with the most important century

Holden Karnofsky

This is a linkpost for https://www.cold-takes.com/jobs-that-can-help-with-the-most-important-century/

Let’s say you’re convinced that AI could make this the most important century of all time for humanity. What can you do to help things go well instead of poorly?

I think the biggest opportunities come from a full-time job (and/or the money you make from it). I think people are generally far better at their jobs than they are at anything else.

This piece will list the jobs I think are especially high-value. I expect things will change (a lot) from year to year - this is my picture at the moment.

Here’s a summary:

Role	Skills/assets you'd need
Research and engineering on AI safety	Technical ability (but not necessarily AI background)
Information security to reduce the odds powerful AI is leaked	Security expertise or willingness/ability to start in junior roles (likely not AI)
Other roles at AI companies	Suitable for generalists (but major pros and cons)
Govt and govt-facing think tanks	Suitable for generalists (but probably takes a long time to have impact)
Jobs in politics	Suitable for generalists if you have a clear view on which politicians to help
Forecasting to get a better handle on what’s coming	Strong forecasting track record (can be pursued part-time)
"Meta" careers	Misc / suitable for generalists
Low-guidance options	These ~only make sense if you read & instantly think "That's me"

A few notes before I give more detail:

These jobs aren’t the be-all/end-all. I expect a lot to change in the future, including a general increase in the number of helpful jobs available.
Most of today’s opportunities are concentrated in the US and UK, where the biggest AI companies (and AI-focused nonprofits) are. This may change down the line.
Most of these aren’t jobs where you can just take instructions and apply narrow skills.
- The issues here are tricky, and your work will almost certainly be useless (or harmful) according to someone.
- I recommend forming your own views on the key risks of AI - and/or working for an organization whose leadership you’re confident in.
Staying open-minded and adaptable is crucial.
- I think it’s bad to rush into a mediocre fit with one of these jobs, and better (if necessary) to stay out of AI-related jobs while skilling up and waiting for a great fit.
- I don’t think it’s helpful (and it could be harmful) to take a fanatical, “This is the most important time ever - time to be a hero” attitude. Better to work intensely but sustainably, stay mentally healthy and make good decisions.

The first section of this piece will recap my basic picture of the major risks, and the promising ways to reduce these risks (feel free to skip if you think you’ve got a handle on this).

The next section will elaborate on the options in the table above.

After that, I’ll talk about some of the things you can do if you aren’t ready for a full-time career switch yet, and give some general advice on avoiding doing harm and avoiding burnout.

Recapping the major risks, and some things that could help

This is a quick recap of the major risks from transformative AI. For a longer treatment, see How we could stumble into an AI catastrophe, and for an even longer one see the full series.

The backdrop: transformative AI could be developed in the coming decades. If we develop AI that can automate all the things humans do to advance science and technology, this could cause explosive technological progress that could bring us more quickly than most people imagine to a radically unfamiliar future.

Such AI could also be capable of defeating all of humanity combined, if it were pointed toward that goal.

The most important century

In the most important century series, I argued that the 21st century could be the most important century ever for humanity, via the development of advanced AI systems that could dramatically speed up scientific and technological advancement, getting us more quickly than most people imagine to a deeply unfamiliar future.

I focus on a hypothetical kind of AI that I call PASTA, or Process for Automating Scientific and Technological Advancement. PASTA would be AI that can essentially automate all of the human activities needed to speed up scientific and technological advancement.

Using a variety of different forecasting approaches, I argue that PASTA seems more likely than not to be developed this century - and there’s a decent chance (more than 10%) that we’ll see it within 15 years or so.

I argue that the consequences of this sort of AI could be enormous: an explosion in scientific and technological progress. This could get us more quickly than most imagine to a radically unfamiliar future.

I’ve also argued that AI systems along these lines could defeat all of humanity combined, if (for whatever reason) they were aimed toward that goal.

For more, see the most important century landing page. The series is available in many formats, including audio; I also provide a summary, and links to podcasts where I discuss it at a high level.

How could AI systems defeat humanity?

A previous piece argues that AI systems could defeat all of humanity combined, if (for whatever reason) they were aimed toward that goal.

By defeating humanity, I mean gaining control of the world so that AIs, not humans, determine what happens in it; this could involve killing humans or simply “containing” us in some way, such that we can’t interfere with AIs’ aims.

One way this could happen would be via “superintelligence.” It’s imaginable that a single AI system (or set of systems working together) could:

Do its own research on how to build a better AI system, which culminates in something that has incredible other abilities.
Hack into human-built software across the world.
Manipulate human psychology.
Quickly generate vast wealth under the control of itself or any human allies.
Come up with better plans than humans could imagine, and ensure that it doesn't try any takeover attempt that humans might be able to detect and stop.
Develop advanced weaponry that can be built quickly and cheaply, yet is powerful enough to overpower human militaries.

But even if “superintelligence” never comes into play - even if any given AI system is at best as capable as a highly capable human - AI could collectively defeat humanity. The piece explains how.

The basic idea is that humans are likely to deploy AI systems throughout the economy, such that they have large numbers and access to many resources - and the ability to make copies of themselves. From this starting point, AI systems with human-like (or greater) capabilities would have a number of possible ways of getting to the point where their total population could outnumber and/or out-resource humans.

More: AI could defeat all of us combined

Misalignment risk: AI could end up with dangerous aims of its own.

If this sort of AI is developed using the kinds of trial-and-error-based techniques that are common today, I think it’s likely that it will end up “aiming” for particular states of the world, much like a chess-playing AI “aims” for a checkmate position - making choices, calculations and plans to get particular types of outcomes, even when doing so requires deceiving humans.
I think it will be difficult - by default - to ensure that AI systems are aiming for what we (humans) want them to aim for, as opposed to gaining power for ends of their own.
If AIs have ambitious aims of their own - and are numerous and/or capable enough to overpower humans - I think we have a serious risk that AIs will take control of the world and disempower humans entirely.

Why would AI "aim" to defeat humanity?

A previous piece argued that if today’s AI development methods lead directly to powerful enough AI systems, disaster is likely by default (in the absence of specific countermeasures).

In brief:

Modern AI development is essentially based on “training” via trial-and-error.
If we move forward incautiously and ambitiously with such training, and if it gets us all the way to very powerful AI systems, then such systems will likely end up aiming for certain states of the world (analogously to how a chess-playing AI aims for checkmate).
And these states will be other than the ones we intended, because our trial-and-error training methods won’t be accurate. For example, when we’re confused or misinformed about some question, we’ll reward AI systems for giving the wrong answer to it - unintentionally training deceptive behavior.
We should expect disaster if we have AI systems that are both (a) powerful enough to defeat humans and (b) aiming for states of the world that we didn’t intend. (“Defeat” means taking control of the world and doing what’s necessary to keep us out of the way; it’s unclear to me whether we’d be literally killed or just forcibly stopped ^[1] from changing the world in ways that contradict AI systems’ aims.)

More: Why would AI "aim" to defeat humanity?

Competitive pressures, and ambiguous evidence about the risks, could make this situation very dangerous. In a previous piece, I lay out a hypothetical story about how the world could stumble into catastrophe. In this story:

There are warning signs about the risks of misaligned AI - but there’s a lot of ambiguity about just how big the risk is.
Everyone is furiously racing to be first to deploy powerful AI systems.
We end up with a big risk of deploying dangerous AI systems throughout the economy - which means a risk of AIs disempowering humans entirely.
And even if we navigate that risk - even if AI behaves as intended - this could be a disaster if the most powerful AI systems end up concentrated in the wrong hands (something I think is reasonably likely due to the potential for power imbalances). There are other risks as well.

Why AI safety could be hard to measure

In previous pieces, I argued that:

If we develop powerful AIs via ambitious use of the “black-box trial-and-error” common in AI development today, then there’s a substantial risk that:

These AIs will develop unintended aims (states of the world they make calculations and plans toward, as a chess-playing AI "aims" for checkmate);
These AIs could deceive, manipulate, and even take over the world from humans entirely as needed to achieve those aims.
People today are doing AI safety research to prevent this outcome, but such research has a number of deep difficulties:

“Great news - I’ve tested this AI and it looks safe.” Why might we still have a problem?
Problem	Key question	Explanation
The Lance Armstrong problem	Did we get the AI to be actually safe or good at hiding its dangerous actions?	When dealing with an intelligent agent, it’s hard to tell the difference between “behaving well” and “appearing to behave well.” When professional cycling was cracking down on performance-enhancing drugs, Lance Armstrong was very successful and seemed to be unusually “clean.” It later came out that he had been using drugs with an unusually sophisticated operation for concealing them.
The King Lear problem	The AI is (actually) well-behaved when humans are in control. Will this transfer to when AIs are in control?	It's hard to know how someone will behave when they have power over you, based only on observing how they behave when they don't. AIs might behave as intended as long as humans are in control - but at some future point, AI systems might be capable and widespread enough to have opportunities to take control of the world entirely. It's hard to know whether they'll take these opportunities, and we can't exactly run a clean test of the situation. Like King Lear trying to decide how much power to give each of his daughters before abdicating the throne.
The lab mice problem	Today's "subhuman" AIs are safe. What about future AIs with more human-like abilities?	Today's AI systems aren't advanced enough to exhibit the basic behaviors we want to study, such as deceiving and manipulating humans. Like trying to study medicine in humans by experimenting only on lab mice.
The first contact problem	Imagine that tomorrow's "human-like" AIs are safe. How will things go when AIs have capabilities far beyond humans'?	AI systems might (collectively) become vastly more capable than humans, and it's ... just really hard to have any idea what that's going to be like. As far as we know, there has never before been anything in the galaxy that's vastly more capable than humans in the relevant ways! No matter what we come up with to solve the first three problems, we can't be too confident that it'll keep working if AI advances (or just proliferates) a lot more. Like trying to plan for first contact with extraterrestrials (this barely feels like an analogy).

Power imbalances, and other risks beyond misaligned AI

I’ve argued that AI could cause a dramatic acceleration in the pace of scientific and technological advancement.

One way of thinking about this: perhaps (for reasons I’ve argued previously) AI could enable the equivalent of hundreds of years of scientific and technological advancement in a matter of a few months (or faster). If so, then developing powerful AI a few months before others could lead to having technology that is (effectively) hundreds of years ahead of others’.

Because of this, it’s easy to imagine that AI could lead to big power imbalances, as whatever country/countries/coalitions “lead the way” on AI development could become far more powerful than others (perhaps analogously to when a few smallish European states took over much of the rest of the world).

I think things could go very badly if the wrong country/countries/coalitions lead the way on transformative AI. At the same time, I’ve expressed concern that people might overfocus on this aspect of things vs. other issues, for a number of reasons including:

I think people naturally get more animated about "helping the good guys beat the bad guys" than about "helping all of us avoid getting a universally bad outcome, for impersonal reasons such as 'we designed sloppy AI systems' or 'we created a dynamic in which haste and aggression are rewarded.'"
I expect people will tend to be overconfident about which countries, organizations or people they see as the "good guys."

(More here.)

There are also dangers of powerful AI being too widespread, rather than too concentrated. In The Vulnerable World Hypothesis, Nick Bostrom contemplates potential future dynamics such as “advances in DIY biohacking tools might make it easy for anybody with basic training in biology to kill millions.” In addition to avoiding worlds where AI capabilities end up concentrated in the hands of a few, it could also be important to avoid worlds in which they diffuse too widely, too quickly, before we’re able to assess the risks of widespread access to technology far beyond today’s.

I discuss these and a number of other AI risks in a previous piece: Transformative AI issues (not just misalignment): an overview.

I’ve laid out several ways to reduce the risks (color-coded since I’ll be referring to them throughout the piece):

Alignment research. Researchers are working on ways to design AI systems that are both (a) “aligned” in the sense that they don’t have unintended aims of their own; (b) very powerful, to the point where they can be competitive with the best systems out there.

I’ve laid out three high-level hopes for how - using techniques that are known today - we might be able to develop AI systems that are both aligned and powerful.
These techniques wouldn’t necessarily work indefinitely, but they might work long enough that we can use early safe AI systems to make the situation much safer (by automating huge amounts of further alignment research, by helping to demonstrate risks and make the case for greater caution worldwide, etc.)
(A footnote explains how I’m using “aligned” vs. “safe.”¹)

High-level hopes for AI alignment

A previous piece goes through what I see as three key possibilities for building powerful-but-safe AI systems.

It frames these using Ajeya Cotra’s young businessperson analogy for the core difficulties. In a nutshell, once AI systems get capable enough, it could be hard to test whether they’re safe, because they might be able to deceive and manipulate us into getting the wrong read. Thus, trying to determine whether they’re safe might be something like “being an eight-year-old trying to decide between adult job candidates (some of whom are manipulative).”

Key possibilities for navigating this challenge:

Digital neuroscience: perhaps we’ll be able to read (and/or even rewrite) the “digital brains” of AI systems, so that we can know (and change) what they’re “aiming” to do directly - rather than having to infer it from their behavior. (Perhaps the eight-year-old is a mind-reader, or even a young Professor X.)
Limited AI: perhaps we can make AI systems safe by making them limited in various ways - e.g., by leaving certain kinds of information out of their training, designing them to be “myopic” (focused on short-run as opposed to long-run goals), or something along those lines. Maybe we can make “limited AI” that is nonetheless able to carry out particular helpful tasks - such as doing lots more research on how to achieve safety without the limitations. (Perhaps the eight-year-old can limit the authority or knowledge of their hire, and still get the company run successfully.)
AI checks and balances: perhaps we’ll be able to employ some AI systems to critique, supervise, and even rewrite others. Even if no single AI system would be safe on its own, the right “checks and balances” setup could ensure that human interests win out. (Perhaps the eight-year-old is able to get the job candidates to evaluate and critique each other, such that all the eight-year-old needs to do is verify basic factual claims to know who the best candidate is.)

These are some of the main categories of hopes that are pretty easy to picture today. Further work on AI safety research might result in further ideas (and the above are not exhaustive - see my more detailed piece, posted to the Alignment Forum rather than Cold Takes, for more).

Standards and monitoring. I see some hope for developing standards that all potentially dangerous AI projects (whether companies, government projects, etc.) need to meet, and enforcing these standards globally.

Such standards could require strong demonstrations of safety, strong security practices, designing AI systems to be difficult to use for overly dangerous activity, etc.
We don't need a perfect system or international agreement to get a lot of benefit out of such a setup. The goal isn’t just to buy time – it’s to change incentives, such that AI projects need to make progress on improving security, alignment, etc. in order to be profitable.

How standards might be established and become national or international

I previously laid out a possible vision on this front, which I’ll give a slightly modified version of here:

Today’s leading AI companies could self-regulate by committing not to build or deploy a system that they can’t convincingly demonstrate is safe (e.g., see Google’s 2018 statement, "We will not design or deploy AI in weapons or other technologies whose principal purpose or implementation is to cause or directly facilitate injury to people”).
- Even if some people at the companies would like to deploy unsafe systems, it could be hard to pull this off once the company has committed not to.
- Even if there’s a lot of room for judgment in what it means to demonstrate an AI system is safe, having agreed in advance that certain evidence is not good enough could go a long way.
As more AI companies are started, they could feel soft pressure to do similar self-regulation, since refusing to do so could be off-putting to potential employees, investors, etc.
Eventually, similar principles could be incorporated into various government regulations and enforceable treaties.
Governments could monitor for dangerous projects using regulation and even overseas operations. E.g., today the US monitors (without permission) for various signs that other states might be developing nuclear weapons, and might try to stop such development with methods ranging from threats of sanctions to cyberwarfare or even military attacks. It could do something similar for any AI development projects that are using huge amounts of compute and haven’t volunteered information about whether they’re meeting standards.

Successful, careful AI projects. I think an AI company (or other project) can enormously improve the situation, if it can both (a) be one of the leaders in developing powerful AI; (b) prioritize doing (and using powerful AI for) things that reduce risks, such as doing alignment research. (But don’t read this as ignoring the fact that AI companies can do harm as well!)

How a careful AI project could be helpful

In addition to using advanced AI to do AI safety research (noted above), an AI project could:

Put huge effort into designing tests for signs of danger, and - if it sees danger signs in its own systems - warning the world as a whole.
Offer deals to other AI companies/projects. E.g., acquiring them or exchanging a share of its profits for enough visibility and control to ensure that they don’t deploy dangerous AI systems.
Use its credibility as the leading company to lobby the government for helpful measures (such as enforcement of a monitoring-and-standards regime), and to more generally highlight key issues and advocate for sensible actions.
Try to ensure (via design, marketing, customer choice, etc.) that its AI systems are not used for dangerous ends, and are used on applications that make the world safer and better off. This could include defensive deployment to reduce risks from other AIs; it could include using advanced AI systems to help it gain clarity on how to get a good outcome for humanity; etc.

An AI project with a dominant market position could likely make a huge difference via things like the above (and probably via many routes I haven’t thought of). And even an AI project that is merely one of several leaders could have enough resources and credibility to have a lot of similar impacts - especially if it’s able to “lead by example” and persuade other AI projects (or make deals with them) to similarly prioritize actions like the above.

A challenge here is that I’m envisioning a project with two arguably contradictory properties: being careful (e.g., prioritizing actions like the above over just trying to maintain its position as a profitable/cutting-edge project) and successful (being a profitable/cutting-edge project). In practice, it could be very hard for an AI project to walk the tightrope of being aggressive enough to be a “leading” project (in the sense of having lots of resources, credibility, etc.), while also prioritizing actions like the above (which mostly, with some exceptions, seem pretty different from what an AI project would do if it were simply focused on its technological lead and profitability).

Strong security. A key threat is that someone could steal major components of an AI system and deploy it incautiously. It could be extremely hard for an AI project to be robustly safe against having its AI “stolen.” But this could change, if there’s enough effort to work out the problem of how to secure a large-scale, powerful AI system.

The challenge of securing dangerous AI

In Racing Through a Minefield, I described a "race" between cautious actors (those who take misalignment risk seriously) and incautious actors (those who are focused on deploying AI for their own gain, and aren't thinking much about the dangers to the whole world). Ideally, cautious actors would collectively have more powerful AI systems than incautious actors, so they could take their time doing alignment research and other things to try to make the situation safer for everyone.

But if incautious actors can steal an AI from cautious actors and rush forward to deploy it for their own gain, then the situation looks a lot bleaker. And unfortunately, it could be hard to protect against this outcome.

It's generally extremely difficult to protect data and code against a well-resourced cyberwarfare/espionage effort. An AI’s “weights” (you can think of this sort of like its source code, though not exactly) are potentially very dangerous on their own, and hard to get extreme security for. Achieving enough cybersecurity could require measures, and preparations, well beyond what one would normally aim for in a commercial context.

Jobs that can help

In this long section, I’ll list a number of jobs I wish more people were pursuing.

Unfortunately, I can’t give individualized help exploring one or more of these career tracks. Starting points could include 80,000 Hours and various other resources.

Research and engineering careers. You can contribute to alignment research as a researcher and/or software engineer (the line between the two can be fuzzy in some contexts).

There are (not necessarily easy-to-get) jobs along these lines at major AI labs, in established academic labs, and at independent nonprofits (examples in footnote). ^[2]

Different institutions will have very different approaches to research, very different environments and philosophies, etc. so it’s hard to generalize about what might make someone a fit. A few high-level points:

It takes a lot of talent to get these jobs, but you shouldn’t assume that it takes years of experience in a particular field (or a particular degree).
- I’ve seen a number of people switch over from other fields (such as physics) and become successful extremely quickly.
- In addition to on-the-job training, there are independent programs specifically aimed at helping people skill up quickly. ^[3]
You also shouldn’t assume that these jobs are only for “scientist” types - there’s a substantial need for engineers, which I expect to grow.
I think most people working on alignment consider a lot of other people’s work to be useless at best. This seems important to know going in for a few reasons.
- You shouldn’t assume that all work is useless just because the first examples you see seem that way.
- It’s good to be aware that whatever you end up doing, someone will probably dunk on your work on the Internet.
- At the same time, you shouldn’t assume that your work is helpful because it’s “safety research.” It's worth investing a lot in understanding how any particular research you're doing could be helpful (and how it could fail).
  - I’d even suggest taking regular dedicated time (a day every few months?) to pause working on the day-to-day and think about how your work fits into the big picture.
- For a sense of what work I think is most likely to be useful, I’d suggest my piece on why AI safety seems hard to measure - I’m most excited about work that directly tackles the challenges outlined in that piece, and I’m pretty skeptical of work that only looks good with those challenges assumed away. (Also see my piece on broad categories of research I think have a chance to be highly useful, and some comments from a while ago that I still mostly endorse.)

I also want to call out a couple of categories of research that are getting some attention today, but seem at least a bit under-invested in, even relative to alignment research:

Threat assessment research. To me, there’s an important distinction between “Making AI systems safer” and “Finding out how dangerous they might end up being.” (Today, these tend to get lumped together under “alignment research.”)
- A key approach to medical research is using model organisms - for example, giving cancer to mice, so we can see whether we’re able to cure them.
- Analogously, one might deliberately (though carefully!^[4]) design an AI system to deceive and manipulate humans, so we can (a) get a more precise sense of what kinds of training dynamics lead to deception and manipulation; (b) see whether existing safety techniques are effective countermeasures.
- If we had concrete demonstrations of AI systems becoming deceptive/manipulative/power-seeking, we could potentially build more consensus for caution (e.g., standards and monitoring). Or we could imaginably produce evidence that the threat is low. ^[5]
- A couple of early examples of threat assessment research: here and here.
Anti-misuse research.
- I’ve written about how we could face catastrophe even from aligned AI. That is - even if AI does what its human operators want it to be doing, maybe some of its human operators want it to be helping them build bioweapons, spread propaganda, etc.
- But maybe it’s possible to train AIs so that they’re hard to use for purposes like this - a separate challenge from training them to avoid deceiving and manipulating their human operators.
- In practice, a lot of the work done on this today (example) tends to get called “safety” and lumped in with alignment (and sometimes the same research helps with both goals), but again, I think it’s a distinction worth making.
- I expect the earliest and easiest versions of this work to happen naturally as companies try to make their AI models fit for commercialization - but at some point it might be important to be making more intense, thorough attempts to prevent even very rare (but catastrophic) misuse.

Information security careers. There’s a big risk that a powerful AI system could be “stolen” via hacking/espionage, and this could make just about every kind of risk worse. I think it could be very challenging - but possible - for AI projects to be secure against this threat. (More above.)

I really think security is not getting enough attention from people concerned about AI risk, and I disagree with the idea that key security problems can be solved just by hiring from today’s security industry.

From what I’ve seen, AI companies have a lot of trouble finding good security hires. I think a lot of this is simply that security is challenging and valuable, and demand for good hires (especially people who can balance security needs against practical needs) tends to swamp supply.
- And yes, this means good security people are well-paid!
Additionally, AI could present unique security challenges in the future, because it requires protecting something that is simultaneously (a) fundamentally just software (not e.g. uranium), and hence very hard to protect; (b) potentially valuable enough that one could imagine very well-resourced state programs going all-out to steal it, with a breach having globally catastrophic consequences. I think trying to get out ahead of this challenge, by experimenting early on with approaches to it, could be very important.
It’s plausible to me that security is as important as alignment right now, in terms of how much one more good person working on it will help.
And security is an easier path, because one can get mentorship from a large community of security people working on things other than AI. ^[6]
I think there’s a lot of potential value both in security research (e.g., developing new security techniques) and in simply working at major AI companies to help with their existing security needs.
For more on this topic, see this recent 80,000 Hours report and this 2019 post by two of my coworkers.

Other jobs at AI companies. AI companies hire for a lot of roles, many of which don’t require any technical skills.

It’s a somewhat debatable/tricky path to take a role that isn’t focused specifically on safety or security. Some people believe ^[7] that you can do more harm than good this way, by helping companies push forward with building dangerous AI before the risks have gotten much attention or preparation - and I think this is a pretty reasonable take.

At the same time:

You could argue something like: “Company X has potential to be a successful, careful AI project. That is, it’s likely to deploy powerful AI systems more carefully and helpfully than others would, and use them to reduce risks by automating alignment research and other risk-reducing tasks. Furthermore, Company X is most likely to make a number of other decisions wisely as things develop. So, it’s worth accepting that Company X is speeding up AI progress, because of the hope that Company X can make things go better.” This obviously depends on how you feel about Company X compared to others!
Working at Company X could also present opportunities to influence Company X. If you’re a valuable contributor and you are paying attention to the choices the company is making (and speaking up about them), you could affect the incentives of leadership.
- I think this can be a useful thing to do in combination with the other things on this list, but I generally wouldn’t advise taking a job if this is one’s main goal.
Working at an AI company presents opportunities to become generally more knowledgeable about AI, possibly enabling a later job change to something else.

How a careful AI project could be helpful

In addition to using advanced AI to do AI safety research (noted above), an AI project could:

Put huge effort into designing tests for signs of danger, and - if it sees danger signs in its own systems - warning the world as a whole.
Offer deals to other AI companies/projects. E.g., acquiring them or exchanging a share of its profits for enough visibility and control to ensure that they don’t deploy dangerous AI systems.
Use its credibility as the leading company to lobby the government for helpful measures (such as enforcement of a monitoring-and-standards regime), and to more generally highlight key issues and advocate for sensible actions.
Try to ensure (via design, marketing, customer choice, etc.) that its AI systems are not used for dangerous ends, and are used on applications that make the world safer and better off. This could include defensive deployment to reduce risks from other AIs; it could include using advanced AI systems to help it gain clarity on how to get a good outcome for humanity; etc.

80,000 Hours has a collection of anonymous advice on how to think about the pros and cons of working at an AI company.

In a future piece, I’ll discuss what I think AI companies can be doing today to prepare for transformative AI risk. This could be helpful for getting a sense of what an unusually careful AI company looks like.

Jobs in government and at government-facing think tanks. I think there is a lot of value in providing quality advice to governments (especially the US government) on how to think about AI - both today’s systems and potential future ones.

I also think it could make sense to work on other technology issues in government, which could be a good path to working on AI later (I expect government attention to AI to grow over time).

People interested in careers like these can check out Open Philanthropy’s Technology Policy Fellowships.

One related activity that seems especially valuable: understanding the state of AI in countries other than the one you’re working for/in - particularly countries that (a) have a good chance of developing their own major AI projects down the line; (b) are difficult to understand much about by default.

Having good information on such countries could be crucial for making good decisions, e.g. about moving cautiously vs. racing forward vs. trying to enforce safety standards internationally.
I think good work on this front has been done by the Center for Security and Emerging Technology ^[8] among others.

A future piece will discuss other things I think governments can be doing today to prepare for transformative AI risk. I won’t have a ton of tangible recommendations quite yet, but I expect there to be more over time, especially if and when standards and monitoring frameworks become better-developed.

Jobs in politics. The previous category focused on advising governments; this one is about working on political campaigns, doing polling analysis, etc. to generally improve the extent to which sane and reasonable people are in power. Obviously, it’s a judgment call which politicians are the “good” ones and which are the “bad” ones, but I didn’t want to leave out this category of work.

Forecasting. I’m intrigued by organizations like Metaculus, HyperMind, Good Judgment, ^[9] Manifold Markets, and Samotsvety - all trying, in one way or another, to produce good probabilistic forecasts (using generalizable methods ^[10]) about world events.

If we could get good forecasts about questions like “When will AI systems be powerful enough to defeat all of humanity?” and “Will AI safety research in category X be successful?”, this could be useful for helping people make good decisions. (These questions seem very hard to get good predictions on using these organizations’ methods, but I think it’s an interesting goal.)

To explore this area, I’d suggest learning about forecasting generally (Superforecasting is a good starting point) and building up your own prediction track record on sites such as the above.

“Meta” careers. There are a number of jobs focused on helping other people learn about key issues, develop key skills and end up in helpful jobs (a bit more discussion here).

It can also make sense to take jobs that put one in a good position to donate to nonprofits doing important work, to spread helpful messages, and to build skills that could be useful later (including in unexpected ways, as things develop), as I’ll discuss below.

Low-guidance jobs

This sub-section lists some projects that either don’t exist (but seem like they ought to), or are in very embryonic stages. So it’s unlikely you can get any significant mentorship working on these things.

I think the potential impact of making one of these work is huge, but I think most people will have an easier time finding a fit with jobs from the previous section (which is why I listed those first).

This section is largely to illustrate that I expect there to be more and more ways to be helpful as time goes on - and in case any readers feel excited and qualified to tackle these projects themselves, despite a lack of guidance and a distinct possibility that a project will make less sense in reality than it does on paper.

A big one in my mind is developing safety standards that could be used in a standards and monitoring regime. By this I mean answering questions like:

What observations could tell us that AI systems are getting dangerous to humanity (whether by pursuing aims of their own or by helping humans do dangerous things)?
- A starting-point question: why do we believe today’s systems aren’t dangerous? What, specifically, are they unable to do that they’d have to do in order to be dangerous, and how will we know when that’s changed?
Once AI systems have potential for danger, how should they be restricted, and what conditions should AI companies meet (e.g., demonstrations of safety and security) in order to loosen restrictions?

There is some early work going on along these lines, at both AI companies and nonprofits. If it goes well, I expect that there could be many jobs in the future, doing things like:

Continuing to refine and improve safety standards as AI systems get more advanced.
Providing AI companies with “audits” - examinations of whether their systems meet standards, provided by parties outside the company to reduce conflicts of interest.
Advocating for the importance of adherence to standards. This could include advocating for AI companies to abide by standards, and potentially for government policies to enforce standards.

Other public goods for AI projects. I can see a number of other ways in which independent organizations could help AI projects exercise more caution / do more to reduce risks:

Facilitating safety research collaborations. I worry that at some point, doing good alignment research will only be possible with access to state-of-the-art AI models - but such models will be extraordinarily expensive and exclusively controlled by major AI companies.
- I hope AI companies will be able to partner with outside safety researchers (not just rely on their own employees) for alignment research, but this could get quite tricky due to concerns about intellectual property leaks.
- A third-party organization could do a lot of the legwork of vetting safety researchers, helping them with their security practices, working out agreements with respect to intellectual property, etc. to make partnerships - and selective information sharing, more broadly - more workable.
Education for key people at AI companies. An organization could help employees, investors, and board members of AI companies learn about the potential risks and challenges of advanced AI systems. I’m especially excited about this for board members, because:
- I’ve already seen a lot of interest from AI companies in forming strong ethics advisory boards, and/or putting well-qualified people on their governing boards (see footnote for the difference ^[11]). I expect demand to go up.
- Right now, I don’t think there are a lot of people who are both (a) prominent and “fancy” enough to be considered for such boards; (b) highly thoughtful about, and well-versed in, what I consider some of the most important risks of transformative AI (covered in this piece and the series it’s part of).
- An “education for potential board members” program could try to get people quickly up to speed on good board member practices generally, on risks of transformative AI, and on the basics of how modern AI works.
Helping share best practices across AI companies. A third-party organization might collect information about how different AI companies are handling information security, alignment research, processes for difficult decisions, governance, etc. and share it across companies, while taking care to preserve confidentiality. I’m particularly interested in the possibility of developing and sharing innovative governance setups for AI companies.

Thinking and stuff. There’s tons of potential work to do in the category of “coming up with more issues we ought to be thinking about, more things people (and companies and governments) can do to be helpful, etc.”

About a year ago, I published a list of research questions that could be valuable and important to gain clarity on. I still mostly endorse this list (though I wouldn’t write it just as is today).
A slightly different angle: it could be valuable to have more people thinking about the question, “What are some tangible policies governments could enact to be helpful?” E.g., early steps towards standards and monitoring. This is distinct from advising governments directly (it's earlier-stage).

Some AI companies have policy teams that do work along these lines. And a few Open Philanthropy employees work on topics along the lines of the first bullet point. However, I tend to think of this work as best done by people who need very little guidance (more at my discussion of wicked problems), so I’m hesitant to recommend it as a mainline career option.

Things you can do if you’re not ready for a full-time career change

Switching careers is a big step, so this section lists some ways you can be helpful regardless of your job - including preparing yourself for a later switch.

First and most importantly, you may have opportunities to spread key messages via social media, talking with friends and colleagues, etc. I think there’s a lot of potential to make a difference here, and I wrote a previous post on this specifically.

Second, you can explore potential careers like those I discuss above. I’d suggest generally checking out job postings, thinking about what sorts of jobs might be a fit for you down the line, meeting people who work in jobs like those and asking them about their day-to-day, etc.

Relatedly, you can try to keep your options open.

It’s hard to predict what skills will be useful as AI advances further and new issues come up.
Being ready to switch careers when a big opportunity comes up could be hugely valuable - and hard. (Most people would have a lot of trouble doing this late in their career, no matter how important!)
Building up the financial, psychological and social ability to change jobs later on would (IMO) be well worth a lot of effort.

Right now there aren’t a lot of obvious places to donate (though you can donate to the Long-Term Future Fund ^[12] if you feel so moved).

I’m guessing this will change in the future, for a number of reasons.^[13]
Something I’d consider doing is setting some pool of money aside, perhaps invested such that it’s particularly likely to grow a lot if and when AI systems become a lot more capable and impressive,^[14] in case giving opportunities come up in the future.
You can also, of course, donate to things today that others aren’t funding for whatever reason.

Learning more about key issues could broaden your options. I think the full series I’ve written on key risks is a good start. To do more, you could:

Actively engage with this series by writing your own takes, discussing with others, etc.
Consider various online courses ^[15] on relevant issues.
I think it’s also good to get as familiar with today’s AI systems (and the research that goes into them) as you can.
- If you’re happy to write code, you can check out coding-intensive guides and programs (examples in footnote). ^[16]
- If you don’t want to code but can read somewhat technical content, I’d suggest getting oriented with some basic explainers on deep learning ^[17] and then reading significant papers on AI and AI safety. ^[18]
- Whether you’re very technical or not at all, I think it’s worth playing with public state-of-the-art AI models, as well as seeing highlights of what they can do via Twitter and such.

Finally, if you happen to have opportunities to serve on governing boards or advisory boards for key organizations (e.g., AI companies), I think this is one of the best non-full-time ways to help.

I don’t expect this to apply to most people, but wanted to mention it in case any opportunities come up.
It’s particularly important, if you get a role like this, to invest in educating yourself on key issues.

Some general advice

I think full-time work has huge potential to help, but also big potential to do harm, or to burn yourself out. So here are some general suggestions.

Think about your own views on the key risks of AI, and what it might look like for the world to deal with the risks. Most of the jobs I’ve discussed aren’t jobs where you can just take instructions and apply narrow skills. The issues here are tricky, and it takes judgment to navigate them well.

Furthermore, no matter what you do, there will almost certainly be people who think your work is useless (if not harmful).^[19] This can be very demoralizing. I think it’s easier if you’ve thought things through and feel good about the choices you’re making.

I’d advise trying to learn as much as you can about the major risks of AI (see above for some guidance on this) - and/or trying to work for an organization whose leadership you have a good amount of confidence in.

Jog, don’t sprint. Skeptics of the “most important century” hypothesis will sometimes say things like “If you really believe this, why are you working normal amounts of hours instead of extreme amounts? Why do you have hobbies (or children, etc.) at all?” And I’ve seen a number of people with an attitude like: “THIS IS THE MOST IMPORTANT TIME IN HISTORY. I NEED TO WORK 24/7 AND FORGET ABOUT EVERYTHING ELSE. NO VACATIONS.”

I think that’s a very bad idea.

Trying to reduce risks from advanced AI is, as of today, a frustrating and disorienting thing to be doing. It’s very hard to tell whether you’re being helpful (and as I’ve mentioned, many will inevitably think you’re being harmful).

I think the difference between “not mattering,” “doing some good” and “doing enormous good” comes down to how you choose the job, how good at it you are, and how good your judgment is (including what risks you’re most focused on and how you model them). Going “all in” on a particular objective seems bad on these fronts: it poses risks to open-mindedness, to mental health and to good decision-making (I am speaking from observations here, not just theory).

That is, I think it’s a bad idea to try to be 100% emotionally bought into the full stakes of the most important century - I think the stakes are just too high for that to make sense for any human being.

Instead, I think the best way to handle “the fate of humanity is at stake” is probably to find a nice job and work about as hard as you’d work at another job, rather than trying to make heroic efforts to work extra hard. (I criticized heroic efforts in general here.)

I think this basic formula (working in some job that is a good fit, while having some amount of balance in your life) is what’s behind a lot of the most important positive events in history to date, and presents possibly historically large opportunities today.

Special thanks to Alexander Berger, Jacob Eliosoff, Alexey Guzey, Anton Korinek and Luke Muehlhauser for especially helpful comments on this post. A lot of other people commented helpfully as well.

Footnotes

I use “aligned” to specifically mean that AIs behave as intended, rather than pursuing dangerous goals of their own. I use “safe” more broadly to mean that an AI system poses little risk of catastrophe for any reason in the context it’s being used in. It’s OK to mostly think of them as interchangeable in this post. ↩
AI labs with alignment teams: Anthropic, DeepMind and OpenAI. Disclosure: my wife is co-founder and President of Anthropic, and used to work at OpenAI (and has shares in both companies); OpenAI is a former Open Philanthropy grantee.
Academic labs: there are many of these; I’ll highlight the Steinhardt lab at Berkeley (Open Philanthropy grantee), whose recent research I’ve found especially interesting.
Independent nonprofits: examples would be Alignment Research Center and Redwood Research (both Open Philanthropy grantees, and I sit on the board of both).
You can also ↩
Examples: AGI Safety Fundamentals, SERI MATS, MLAB (all of which have been supported by Open Philanthropy) ↩
On one hand, deceptive and manipulative AIs could be dangerous. On the other, it might be better to get AIs trying to deceive us before they can consistently succeed; the worst of all worlds might be getting this behavior by accident with very powerful AIs. ↩
Though I think it’s inherently harder to get evidence of low risk than evidence of high risk, since it’s hard to rule out risks arising as AI systems get more capable. ↩
Why do I simultaneously think “This is a mature field with mentorship opportunities” and “This is a badly neglected career track for helping with the most important century”?
In a nutshell, most good security people are not working on AI. It looks to me like there are plenty of people who are generally knowledgeable and effective at good security, but there’s also a huge amount of need for such people outside of AI specifically.
I expect this to change eventually if AI systems become extraordinarily capable. The issue is that it might be too late at that point - the security challenges in AI seem daunting (and somewhat AI-specific) to the point where it could be important for good people to start working on them many years before AI systems become extraordinarily powerful. ↩
Here’s Katja Grace arguing along these lines. ↩
An Open Philanthropy grantee. ↩
Open Philanthropy has funded Metaculus and contracted with Good Judgment and HyperMind. ↩
That is, these groups are mostly trying things like “Incentivize people to make good forecasts; track how good people are at making forecasts; aggregate forecasts” rather than “Study the specific topic of AI and make forecasts that way” (the latter is also useful, and I discuss it below). ↩
The governing board of an organization has the hard power to replace the CEO and/or make other decisions on behalf of the organization. An advisory board merely gives advice, but in practice I think this can be quite powerful, since I’d expect many organizations to have a tough time doing bad-for-the-world things without backlash (from employees and the public) once an advisory board has recommended against them. ↩
Open Philanthropy, which I’m co-CEO of, has supported this fund, and its current Chair is an Open Philanthropy employee. ↩
I generally expect there to be more and more clarity about what actions would be helpful, and more and more people willing to work on them if they can get funded. A bit more specifically and speculatively, I expect AI safety research to get more expensive as it requires access to increasingly large, expensive AI models. ↩
Not investment advice! I would only do this with money you’ve set aside for donating such that it wouldn’t be a personal problem if you lost it all. ↩
Some options here, here, here, here. I’ve made no attempt to be comprehensive - these are just some links that should make it easy to get rolling and see some of your options. ↩
Spinning Up in Deep RL, ML for Alignment Bootcamp, Deep Learning Curriculum. ↩
For the basics, I like Michael Nielsen’s guide to neural networks and deep learning; 3Blue1Brown has a video explainer series that I haven’t watched but that others have recommended highly. I’d also suggest The Illustrated Transformer (the transformer is the most important AI architecture as of today).
For a broader overview of different architectures, see Neural Network Zoo.
You can also check out various Coursera etc. courses on deep learning/neural networks. ↩
I feel like the easiest way to do this is to follow AI researchers and/or top labs on Twitter. You can also check out Alignment Newsletter or ML Safety Newsletter for alignment-specific content. ↩
Why?
One reason is the tension between the “caution” and “competition” frames: people who favor one frame tend to see the other as harmful.
Another reason: there are a number of people who think we’re more-or-less doomed without a radical conceptual breakthrough on how to build safe AI (they think the sorts of approaches I list here are hopeless, for reasons I confess I don’t understand very well). These folks will consider anything that isn’t aimed at a radical breakthrough ~useless, and consider some of the jobs I list in this piece to be harmful, if they are speeding up AI development and leaving us with less time for a breakthrough.
At the same time, working toward the sort of breakthrough these folks are hoping for means doing pretty esoteric, theoretical research that many other researchers think is clearly useless.
And trying to make AI development slower and/or more cautious is harmful according to some people who are dismissive of risks, and think the priority is to push forward as fast as we can with technology that has the potential to improve lives. ↩


Arjun Panickssery, Feb 13, 2023

About a year ago, I published a list of research questions that could be valuable and important to gain clarity on. I still mostly endorse this list (though I wouldn’t write it just as is today).

Could you expand on "though I wouldn’t write it just as is today"?

Holden Karnofsky, Mar 18, 2023

Not easily - I skimmed it before linking to it and thought "Eh, I would maybe reframe some of these if I were writing the post today," but found it easier to simply note that point than to do a rewrite or even a list of specific changes, given that I don't think the picture has radically changed.
