I’ve been writing about tangible things we can do today to help the most important century go well. Previously, I wrote about helpful messages to spread and how to help via full-time work.
This piece is about what major AI companies can do (and not do) to be helpful. By “major AI companies,” I mean the sorts of AI companies that are advancing the state of the art, and/or could play a major role in how very powerful AI systems end up getting used.1
This piece could be useful to people who work at those companies, or people who are just curious.
Generally, these are not pie-in-the-sky suggestions - I can name2 more than one AI company that has at least made a serious effort at each of the things I discuss below (beyond what it would do if everyone at the company were singularly focused on making a profit).3
- Prioritizing alignment research, strong security, and safety standards (all of which I’ve written about previously).
- Avoiding hype and acceleration, which I think could leave us with less time to prepare for key risks.
- Preparing for difficult decisions ahead: setting up governance, employee expectations, investor expectations, etc. so that the company is capable of doing non-profit-maximizing things to help avoid catastrophe in the future.
- Balancing these cautionary measures with conventional/financial success.
- I’ll also list a few things that some AI companies present as important, but which I’m less excited about: censorship of AI models, open-sourcing AI models, raising awareness of AI with governments and the public. I don’t think all these things are necessarily bad, but I think some are, and I’m skeptical that any are crucial for the risks I’ve focused on.
I previously laid out a summary of how I see the major risks of advanced AI, and four key things I think can help (alignment research; strong security; standards and monitoring; successful, careful AI projects). I won’t repeat that summary now, but it might be helpful for orienting you if you don’t remember the rest of this series too well.
Some basics: alignment research, strong security, safety standards
First off, AI companies can contribute to the “things that can help” I listed above:
- They can prioritize alignment research (and other technical research, e.g. threat assessment research and misuse research).
- For example, they can prioritize hiring for safety teams, empowering these teams, encouraging their best flexible researchers to work on safety, aiming for high-quality research that targets crucial challenges, etc.
- It could also be important for AI companies to find ways to partner with outside safety researchers rather than rely solely on their own teams. As discussed previously, this could be challenging. But I generally expect that AI companies that care a lot about safety research partnerships will find ways to make them work.
- They can help work toward a standards and monitoring regime. E.g., they can do their own work to come up with standards like "An AI system is dangerous if we observe that it's able to ___, and if we observe this we will take safety and security measures such as ____." They can also consult with others developing safety standards, voluntarily self-regulate beyond what’s required by law, etc.
- They can prioritize strong security, beyond what normal commercial incentives would call for.
- It could easily take years to build secure enough systems, processes and technologies for very high-stakes AI.
- It could be important to hire not only people to handle everyday security needs, but people to experiment with more exotic setups that could be needed later, as the incentives to steal AI get stronger.
The challenge of securing dangerous AI
Consider two kinds of actors: cautious actors (those who take misalignment risk seriously) and incautious actors (those who are focused on deploying AI for their own gain, and aren't thinking much about the dangers to the whole world). Ideally, cautious actors would collectively have more powerful AI systems than incautious actors, so they could take their time doing alignment research and other things to try to make the situation safer for everyone.
But if incautious actors can steal an AI from cautious actors and rush forward to deploy it for their own gain, then the situation looks a lot bleaker. And unfortunately, it could be hard to protect against this outcome.
It's generally extremely difficult to protect data and code against a well-resourced cyberwarfare/espionage effort. An AI’s “weights” (you can think of this sort of like its source code, though not exactly) are potentially very dangerous on their own, and hard to get extreme security for. Achieving enough cybersecurity could require measures, and preparations, well beyond what one would normally aim for in a commercial context.
How standards might be established and become national or international
I previously laid out a possible vision on this front, which I’ll give a slightly modified version of here:
- Today’s leading AI companies could self-regulate by committing not to build or deploy a system that they can’t convincingly demonstrate is safe (e.g., see Google’s 2018 statement, "We will not design or deploy AI in weapons or other technologies whose principal purpose or implementation is to cause or directly facilitate injury to people”).
- Even if some people at the companies would like to deploy unsafe systems, it could be hard to pull this off once the company has committed not to.
- Even if there’s a lot of room for judgment in what it means to demonstrate an AI system is safe, having agreed in advance that certain evidence is not good enough could go a long way.
- As more AI companies are started, they could feel soft pressure to do similar self-regulation, and refusing to do so could be off-putting to potential employees, investors, etc.
- Eventually, similar principles could be incorporated into various government regulations and enforceable treaties.
- Governments could monitor for dangerous projects using regulation and even overseas operations. E.g., today the US monitors (without permission) for various signs that other states might be developing nuclear weapons, and might try to stop such development with methods ranging from threats of sanctions to cyberwarfare or even military attacks. It could do something similar for any AI development projects that are using huge amounts of compute and haven’t volunteered information about whether they’re meeting standards.
Avoiding hype and acceleration
It seems good for AI companies to avoid unnecessary hype and acceleration of AI.
I’ve argued that we’re not ready for transformative AI, and I generally tend to think that we’d all be better off if the world took longer to develop transformative AI. That’s because:
- I’m hoping general awareness and understanding of the key risks will rise over time.
- A lot of key things that could improve the situation - e.g., alignment research, standards and monitoring, and strong security - seem to be in very early stages right now.
- If too much money pours into the AI world too fast, I’m worried there will be lots of incautious companies racing to build transformative AI as quickly as they can, with little regard for the key risks.
By default, I generally think: “The fewer flashy demos and breakthrough papers a lab is putting out, the better.” This can involve tricky tradeoffs in practice (since AI companies generally want to be successful at recruiting, fundraising, etc.).
A couple of potential counterarguments, and replies:
First, some people think it's now "too late" to avoid hype and acceleration, given the amount of hype and investment AI is getting at the moment. I disagree. It's easy to forget, in the middle of a media cycle, how quickly people can forget about things and move on to the next story once the bombs stop dropping. And there are plenty of bombs that still haven't dropped (many things AIs still can't do), and the level of investment in AI has tons of room to go up from here.
Second, I’ve sometimes seen arguments that hype is good because it helps society at large understand what’s coming. But unfortunately, as I wrote previously, I'm worried that hype gives people a skewed picture.
- Some key risks are hard to understand and take seriously.
- What's easy to understand is something like "AI is powerful and scary, I should make sure that people like me are the ones to build it!"
- Maybe recent developments will make people understand the risks better? One can hope, but I'm not counting on that just yet - I think AI misbehavior can be given illusory "fixes," and probably will be.
I also am generally skeptical that there's much hope of society adapting to risks as they happen, given the explosive pace of change that I expect once we get powerful enough AI systems.
I discuss some more arguments on this point in a footnote.4
I don’t think it’s clear-cut that hype and acceleration are bad, but it’s my best guess.
Preparing for difficult decisions ahead
I’ve argued that AI companies might need to do “out-of-the-ordinary” things that don’t go with normal commercial incentives.
Today, AI companies can be building a foundation for being able to do “out-of-the-ordinary” things in the future. A few examples of how they might do so:
Public-benefit-oriented governance. I think typical governance structures could be a problem in the future. For example, a standard corporation could be sued for not deploying AI that poses a risk of global catastrophe - if this means a sacrifice for its bottom line.
I’m excited about AI companies that are investing heavily in setting up governance structures - and investing in executives and board members - capable of making the hard calls well. For example:
- By default, if an AI company is a standard corporation, its leadership has legally recognized duties to serve the interests of shareholders - not society at large. But an AI company can incorporate as a Public Benefit Corporation or some other kind of entity (including a nonprofit!) that gives more flexibility here.
- By default, shareholders make the final call over what a company does. (Shareholders can replace members of the Board of Directors, who in turn can replace the CEO). But a company can set things up differently (e.g., a for-profit controlled by a nonprofit5).
It could pay off in lots of ways to make sure the final calls at a company are made by people focused on getting a good outcome for humanity (and legally free to focus this way).
Gaming out the future. I think it’s not too early for AI companies to be discussing how they would handle various high-stakes situations.
- Under what circumstances would the company simply decide to stop training increasingly powerful AI models?
- If the company came to believe it was building very powerful, dangerous models, whom would it notify and seek advice from? At what point would it approach the government, and how would it do so?
- At what point would it be worth using extremely costly security measures?
- If the company had AI systems available that could do most of what humans can do, what would it do with these systems? Use them to do AI safety research? Use them to design better algorithms and continue making increasingly powerful AI systems? (More possibilities here.)
- Who should be leading the way on decisions like these? Companies tend to employ experts to inform their decisions; who would the company look to for expertise on these kinds of decisions?
Establishing and getting practice with processes for particularly hard decisions. Should the company publish its latest research breakthrough? Should it put out a product that might lead to more hype and acceleration? What safety researchers should get access to its models, and how much access?
AI companies face questions like this pretty regularly today, and I think it’s worth putting processes in place to consider the implications for the world as a whole (not just for the company’s bottom line). This could include assembling advisory boards, internal task forces, etc.
Managing employee and investor expectations. At some point, an AI company might want to make “out of the ordinary” moves that are good for the world but bad for the bottom line. E.g., choosing not to deploy AIs that could be very dangerous but also very profitable.
I wouldn’t want to be trying to run a company in this situation with lots of angry employees and investors asking about the value of their equity shares! It’s also important to minimize the risk of employees and/or investors leaking sensitive and potentially dangerous information.
AI companies can prepare for this kind of situation by doing things like:
- Being selective about whom they hire and take investment from, and screening specifically for people they think are likely to be on board with these sorts of hard calls.
- Education and communications - making it clear to employees what kinds of dangerous-to-humanity situations might be coming up in the future, and what kinds of actions the company might want to take (and why).
Internal and external commitments. AI companies can make public and/or internal statements about how they would handle various tough situations, e.g. how they would determine when it’s too dangerous to keep building more powerful models.
I think these commitments should generally be non-binding (it’s hard to predict the future in enough detail to make binding ones). But in a future where maximizing profit conflicts with doing the right thing for humanity, a previously-made commitment could make it more likely that the company does the right thing.
I’ve emphasized how helpful successful, careful AI projects could be. So far, this piece has mostly talked about the “careful” side of things - how to do things that a “normal” AI company (focused only on commercial success) wouldn’t, in order to reduce risks. But it’s also important to succeed at fundraising, recruiting, and generally staying relevant (e.g., capable of building cutting-edge AI systems).
I don’t emphasize this or write about it as much because I think it’s the sort of thing AI companies are likely to be focused on by default, and because I don’t have special insight into how to succeed as an AI company. But it’s important, and it means that AI companies need to walk a sort of tightrope - constantly making tradeoffs between success and caution.
Some things I’m less excited about
I think it’s also worth listing a few things that some AI companies present as important societal-benefit measures, but which I’m a bit more skeptical are crucial for reducing the risks I’ve focused on.
- Some AI companies restrict access to their models so people won’t use the AIs to create pornography, misleading images and text, etc. I’m not necessarily against this and support versions of it (it depends on the details), but I mostly don’t think it is a key way to reduce the risks I’ve focused on. For those risks, the hype that comes from seeing a demonstration of a system’s capabilities could be even more dangerous than direct harms.
- I sometimes see people implying that open-sourcing AI models - and otherwise making them as broadly available as possible - is a key social-benefit measure. While there may be benefits in some cases, I mostly see this kind of thing as being negative (or at best neutral) in terms of the risks I’m most concerned about.
- I think it can contribute to hype and acceleration, and could make it generally harder to enforce safety standards.
- In the long run, I worry that AI systems could become extraordinarily powerful (more so than e.g. nuclear weapons), so I don’t think “Make sure everyone has access asap” is the right framework.
- In addition to increasing dangers from misaligned AI, this framework could increase other dangers I’ve written about previously.
- I generally don’t think AI companies should be trying to get governments to pay more attention to AI, for reasons I’ll get to in a future piece. (Forming relationships with policymakers could be good, though.)
When an AI company presents some decision as being for the benefit of humanity, I often ask myself, “Could this same decision be justified by just wanting to commercialize successfully?”
For example, making AI models “safe” in the sense that they usually behave as users intend (including things like refraining from toxic language, chaotic behavior, etc.) can be important for commercial viability, but isn’t necessarily good enough for the risks I worry about.
Disclosure: my wife works at one such company (Anthropic) and used to work at another (OpenAI), and has equity in both.
Though I won’t, because I decided I don’t want to get into a thing about whom I did and didn’t link to. Feel free to give real-world examples in the comments!
Now, AI companies could sometimes be doing “responsible” or “safety-oriented” things in order to get good PR, recruit employees, make existing employees happy, etc. In this sense, the actions could be ultimately profit-motivated. But that would still mean there are enough people who care about reducing AI risk that actions like these have PR benefits, recruiting benefits, etc. That’s a big deal! And it suggests that if concern about AI risks (and understanding of how to reduce them) were more widespread, AI companies might do more good things and fewer dangerous things.
You could argue that it would be better for the world to develop extremely powerful AI systems sooner, for reasons including:
- You might be pretty happy with the global balance of power between countries today, and be worried that it’ll get worse in the future. The latter could lead to a situation where the “wrong” government leads the way on transformative AI.
- You might think that the later we develop transformative AI, the more quickly everything will play out, because there will be more computing resources available in the world. E.g., if we develop extremely powerful systems tomorrow, there would only be so many copies we could run at once, whereas if we develop equally powerful systems in 50 years, it might be a lot easier for lots of people to run lots of copies. (More: Hardware Overhang)
A key reason I believe it’s best to avoid acceleration at this time is that it seems plausible (at least 10% likely) that transformative AI will be developed extremely soon - as in within 10 years of today. My impression is that many people at major AI companies tend to agree with this. I think this is a very scary possibility, and if this is the case, the arguments I give in the main text seem particularly important (e.g., many key interventions seem to be in a pretty embryonic state, and awareness of key risks seems low).
A related case one could make for acceleration is “It’s worth accelerating things on the whole to increase the probability that the particular company in question succeeds” (more here: the “competition” frame). I think this is a valid consideration, which is why I talk about tricky tradeoffs in the main text.
Note that my wife is a former employee of OpenAI, the company I link to there, and she owns equity in the company.
Holden - thanks for this thoughtful and constructive piece.
However, I think a crucial strategy is missing here.
If we're serious that AI imposes existential risks on humanity, then the best thing that AI companies can do to help us survive this pivotal century is simple: Shut down their AI research. Do something else. Act like they care about the fate of their kids and grandkids.
AI research doesn't need to be shut down forever. Maybe just for the next few centuries, until we better understand the risks and how to manage them.
I simply don't understand why so many EAs are encouraging AI development as if it's too cool to question, too inevitable to challenge, and too incentivized to deter. Almost all of us agree that AI will impose potentially catastrophic risks. We all agree that AI alignment is far from solved, and many of us believe it probably won't be solved in time to save us from recklessly fast AI development.
We probably can't shut down AI research through government regulation or gentle coaxing, given the coordination problems, governance problems, arms races, and corporate incentives. But we could probably do it through promoting new social & ethical norms that impose a heavy moral stigma against AI research, AI researchers, and AI companies. Historically, intense moral stigmatization has been successful at handicapping, delaying, pausing, defunding, marginalizing, and/or shutting down many research fields. And moral stigmatization in the modern social media world can operate even more quickly, powerfully, globally, and effectively. (I'm working on a longer piece about this moral stigmatization strategy for reducing AI X-risk.)
In short: maybe it's time for EA to stop playing nice with the AI industry -- given that the AI industry is not playing safely with humanity's future.
And maybe it's time to call a spade a spade: if AI companies are pursuing AI capabilities at a rate that could end our species, without any credible safeguards that could protect our species, then they're evil. Maybe we should say they're evil, treat them as evil, and encourage others to do the same, until they stop doing evil.
If I saw a path to slowing down or stopping AI development, reliably and worldwide, I think it’d be worth considering.
But I don’t think advising particular AI companies to essentially shut down (or radically change their mission) is a promising step toward that goal.
And I think partial progress toward that goal is worse than none, if it slows down relatively caution-oriented players without slowing down others.
A deeper problem here is market forces: investment is pouring into the industry, and it's just not going to stop, especially now that we've seen how fast ChatGPT was adopted (100M users in 2 months). This is a big reason why the AI industry will not stop: it has the support of economics to push the boundaries of AI. My hope is that AI safety guidelines are built into the first system that gets adopted by billions of people.
Miguel -- the market forces are strong, but they can be over-ridden by moral stigmatization and moral disgust.
If it becomes morally taboo to invest in AI companies, to work in AI research, to promote AI development, or to vote for pro-AI politicians, then AI research will be handicapped. Just as many other areas of research and development have been handicapped by moral taboos over the last century.
Greed is a strong emotion driving AI investment. But moral disgust can be an even stronger emotion that could reduce AI investment.
Greed is one thing. It is a universal human problem. I would say that a big chunk are greedy, but there are those who seek to adapt and are just trying to help build it properly. People in alignment research are probably in this category, but I'm not sure what the moral standards are for them.
Weighing in on moral disgust: my analysis is that it is possible to push this concept, but I believe the general public will not gravitate to it - most will choose the technology camp, as those who defend AI will explain it from the standpoint that it will "make things easier" - an easier idea to sell.
With ChatGPT and OpenAI, what is your assessment of your and OpenPhil's impact on AI safety in relation to your involvement with OpenAI?
He talks about it here: https://www.dwarkeshpatel.com/p/holden-karnofsky#details (Ctrl+F OpenAI)