This essay was submitted to Open Philanthropy's Cause Exploration Prizes contest (before the deadline). We are uploading some entries late, but all good-faith entries were considered for prizes.
Author's note: This piece assumes unaligned AGI is a serious threat to humanity, and does not spend time laying out this argument.
The case for working on this
Since it might be impossible to solve technical AGI alignment in time, AGI governance is an incredibly important area to work on. Moreover, there are other serious risks pertaining to AGI in addition to alignment, such as weaponization or the concentration of extreme power, that will most likely require governance solutions as opposed to technical ones.
While the field of AGI governance is still largely uncertain about what specific policies would be beneficial, we do have enough information to foresee some of the bottlenecks AGI safety policies are likely to face.
One fairly common bottleneck that AGI governance may be particularly susceptible to is lack of public support. The status quo of a total absence of public awareness about serious AGI risks could be far from optimal. If the only people who care about AGI risk are a niche group of a few thousand people across the globe, the views of AI/CS experts are mixed, and multi-billion-dollar companies actively oppose meaningful safety policies, it will be very difficult to implement large-scale reforms.
We could wait until we have a list of good AGI governance policies that we want people to rally behind before trying to build public support around them. The downside of this approach is that outreach will probably be harder to do effectively later. A version of AGI accelerationism may beat us to the punch and become the dominant narrative, especially as programs like DALL-E 2 become popularized, enter the lives of everyday people, and grow increasingly flashier, cooler, and more useful. If this happens, the task will go from persuading people to form an opinion about a subject they had not previously considered to persuading people to change their minds, which is much harder. It is probably preferable—or even essential—for the public, relevant experts, and policymakers to be on “our side”, and delaying efforts to communicate our concerns about AGI makes this harder to achieve.
A responsible version of bringing AGI risks to the public awareness probably involves lots of focus groups, pilots, feedback, and gradual scaling. There are many ways in which this project could go wrong, and it is very important that we get it right. Among other concerns, we want to make sure the arguments sound compelling to laypeople, the safety measures the public/key stakeholders support are in line with those that are actually helpful, and that AGI safety does not become politically polarized in a counterproductive way.
Relevant stakeholders & possible strategies
There are three main groups that we could engage in outreach to, outlined below.
Government

Working directly with government may be the best strategy in some cases, such as when a policy is very technical or the asks we are making are very specific, such that we want technically-versed people in direct contact with policymakers. This can include elected officials, ministers and department heads, policy advisors, and other civil servants.
Most of this work right now seems to be happening in the US and in Europe at an EU level. It might be worthwhile to do similar types of government outreach in Europe at a national level, as well as in other countries with major technology sectors, such as high-income Asian nations. While they are not currently the main AI hubs, successful outreach there could set a precedent for governments of current AI hubs to take AGI safety seriously.
However, communication with governments may be insufficient in some cases. Some national or cross-national policies may require broader public support in order to be ratified, especially given the likely corporate resistance they will face. If your constituents do not care about an issue, the media is not covering it, and multiple powerful companies are giving you a hard time or showcasing the short-term economic benefits that AI innovation could bring to your country, you have far less of an incentive to push that policy through than if it would earn you political praise.
Relevant experts and credible figures
One challenge for outreach through credible figures is that many of the researchers and engineers in the field of AI are doing work that shortens the timeline to AGI, and thus have an obvious conflict of interest that makes them far less inclined to advocate for measures restricting that work.
For this reason, persuading experts in other fields (e.g. neuroscience, psychology, social sciences, or philosophy) to take AGI risk seriously may be more tractable. Having them on board would add credibility to our concerns and help convince both policymakers and the public that this is a real risk. Public figures with a strong track record for outreach on a different issue or who have previously worked with governments seem especially promising.
It is also possible that outreach to people in CS/AI may still be worthwhile. There have been instances of Google employees raising ethical objections to work that other employees presumably thought was fine, such as working with the Pentagon or developing a censored search engine for China, and employee protests were able to hamper those projects. Stopping or slowing down AGI capabilities research in the near term via employee protests is unlikely to work, but employees advocating for more safety research or higher safety standards might nevertheless be constructive and lay the groundwork for making the case to slow or stop capabilities research in the medium term.
The general public

Public support also seems helpful, and in some cases is pivotal. There have been instances of arguments about technology risks that lacked expert consensus (e.g. GMOs or nuclear power) where public concern was enough to lead to bans or shutdowns, even when significant profit motives were at stake. An additional advantage of targeting the public is that it overlaps with the clientele of many of the companies that are working on AGI, and having them become aware of the risks of this technology would put pressure on these companies to increase the resources they put towards safety (if people e.g. boycott Facebook Messenger, then Facebook loses a lot of ad money). It also overlaps with the people AI developers are surrounded by in their personal lives; changing the public sentiment around AGI may make being an AGI developer lower-status and reputationally destructive, which may encourage some people currently working on this to pivot their work to something they would get more praise and social capital for.
The obvious concern is that AGI alignment may be too weird and sci-fi-sounding for the general public to become invested in. However, this may be more tractable than it initially appears: there are experts who specialize in message testing and in reframing issues to be more compelling, and some lessons from this field might apply to communicating about AGI risks. As a speculative proof of concept, one approach is to outline catastrophic examples of specification gaming that misaligned AGI could cause, beyond paperclip maximizers; these sound more plausible to a layperson and will probably be more effective at getting people to take the risk seriously. There are also different potential framings (e.g. “AGI will be to humans what humans are to monkeys”, “AGI is a genie that cannot be put back in the bottle”, “AGI could be the next nuclear bomb”, “specification gaming already happens and will get worse”, etc.) that may work best for different audiences.
A second concern is that solutions that sound helpful to laypeople may be different from the solutions that are actually helpful. It is possible that an AGI development lab would push for a “safety policy” that doesn’t actually reduce existential risk but gives people the illusion that the issue is solved over the short term, and the public interest generated by these outreach efforts would rally behind it.
In order to prevent this, outreach efforts would need to provide interested citizens with a reliable way of distinguishing probably-good from probably-bad AGI safety policies, such that they will support the ones most likely to be helpful. This could look something like setting up an official board of renowned AGI safety experts to independently and publicly rate different safety policies, while encouraging people to place huge value on these ratings.
Getting public outreach right may be very difficult, but given the potential value, it seems worth at least exploring. It is possible that, on short notice, we will realize we need to resort to a drastic AGI governance solution that requires public support. It would be helpful if we spent resources now figuring out how or whether we could deploy public support effectively and responsibly in case it turns out to be a useful tool. As of right now, it seems like no one has seriously looked into this or rigorously evaluated it as a strategy.
What funding this could look like
An AGI safety communications initiative should probably involve an iterative loop between research and implementation, i.e. research involving practical pilots to test out preliminary conclusions and gather feedback, and then scaling and implementation of the more promising approaches. An approach involving addressing all the questions and uncertainties first could take a decade, and not doing any outreach work for that long could be very damaging. Therefore, the two efforts should go hand in hand and improve each other.
Some areas that would be important to research further are:
- Find a model that helps concerned citizens identify the policies most likely to be actually helpful (probably involving some form of deferring to expert judgment), or figure out whether this is possible at all
- Generate more intuitive and compelling ways of framing and explaining the alignment problem to a layperson, and test them in focus groups or surveys; do similar work for other risks
- Study how we can frame AGI safety in a way that has bipartisan appeal; study properties bipartisan issues have in common, study how issues like climate change or vaccines became politically polarized, look into relevant implications of moral foundations theory
- Ethnographic work on groups we are most interested in persuading
- Study what happened with e.g. GMOs: how the public influenced policy despite a lack of expert consensus and significant profit motives at stake
- Study why the public became concerned about the risks of certain technologies and not others
- Study why existing media coverage of AGI safety hasn’t resulted in widespread citizen concern
- Evaluate and compare the effectiveness of different approaches to public communication
This research would hopefully provide some clarity about what the outreach should look like. There are a wide range of different models and theories of change for this, such as bottom-up grassroots campaigns versus top-down campaigns led by experts, or “shout it from the rooftops and tell everyone who will listen” models versus funnels with higher barriers of entry for citizens to take action. Some of the forms the public outreach could take are:
- Funding more longform media, such as a documentary version of Human Compatible/Superintelligence/The Alignment Problem, a detailed YouTube video series outlining the alignment problem and debunking common responses, etc.
- Funding more longform media specifically about non-alignment AGI risks, which are covered by comparatively few books and other written content
- Encouraging and funding AGI safety researchers to be more vocal and develop a platform online to talk about safety concerns publicly; make it really easy for credible people concerned about AGI to have a platform to voice their concerns
- Sharing longform media about AGI risks with influential opinion makers
- Organizing events to connect journalists and similarly relevant figures to the field of AGI safety
- Publicizing models like DALL-E 2 or GPT-3, or similar interventions, to show people (most of whom are unaware of how far AI has come) that AGI may not be as far away as it seemed until very recently
- Creating a pledge for politicians to say they commit to AGI safety in order to identify who the most likely allies are for future governance policies
- Publicizing examples of specification gaming that have already happened
- Getting existing YouTube channels or technology “influencers” to start talking about AGI risks (like Open Philanthropy sponsoring Kurzgesagt’s video about longtermism)
- Trying to get journalists to cover AGI risks in mainstream media
- Identifying possible partners who could help in influencing the general public, e.g. NGOs concerned with data protection or economic inequality
- Translating existing AGI safety work into Mandarin and actively relaying it to Chinese audiences
- Compiling a database of people willing to call or email their elected officials to support an AGI safety policy; creating a pipeline for the most concerned citizens to contact their representatives
- Citizen groups, town hall formats, and public fora (see e.g. Audrey Tang’s work as part of Taiwan's “g0v” project)
- Creating fiction about plausible AGI risks (not e.g. The Terminator), though this would require testing whether it makes people more or less likely to believe it could happen in real life
Who is already doing something like this?
- Existential Risk Observatory: ERO has previously organized conferences and events to connect AGI safety researchers to journalists, co-authored op-eds, and gotten tech columnists and podcasters to talk about existential risk from AI. Their stated objectives are to “[reduce] human existential risk by informing the public debate” and to “Spread existential risk information from academia to think tanks, policy makers, and media.” This organization is still fairly small and fairly new.
- Rob Miles: Miles has a YouTube channel about AI safety with 103K subscribers (at the time of writing). He describes his approach as “Explaining AI Alignment to anyone who'll stand still for long enough”.
- Podcasts such as the 80,000 Hours podcast or Hear This Idea, which sometimes feature episodes about AGI safety.
- Authors, such as Stuart Russell, Nick Bostrom, Brian Christian, Eliezer Yudkowsky, Toby Ord, and others.
- The Future of Life Institute does some advocacy work regarding its areas of focus, which include AI safety. For AI safety in particular, it has a YouTube channel, has made films and a podcast, worked with journalists, hosted events, and engaged in direct dialogue with governmental organizations including the United Nations, the OECD, US federal agencies, state governments, the military, and the European Commission. FLI tends to focus on near-term risks when communicating with policymakers, and outreach about AGI is by no means the primary focus of the organization.
- Efforts to encourage people to pursue AGI safety careers (such as those by 80,000 Hours or Vael Gates) could also be considered a form of AGI safety outreach, albeit more narrow in scope than what is being proposed here.
- The Center for AI Safety recently announced a bounty for AI Safety Public Materials
Possible risks

If we make arguments about AGI risk that sound ridiculous, we may convince people of the opposite of what we want: that AGI risk is a ridiculous issue to care about. This is why it is especially important to test in focus groups which framings sound compelling and resonate most with laypeople.

If we lack transparency, it is also possible that this work could be suspected of being a bad-faith disinformation campaign, run by a malicious actor and designed to slow down AI development in the West/the United States/Europe/etc.
Disadvantageous political polarization
If this becomes an issue that appeals to only one political orientation, the other side of the aisle may come to oppose it instinctively. If that side has more political power at a given time, this could be net negative. There may be ways to counter this, such as studying the properties of issues with bipartisan support and trying to adopt them. It may also be the case that this is less of a risk in countries without a two-party political system, e.g. the Netherlands or Germany, in which case the outreach could initially be piloted there.
Fueling an arms race

Talking about the dangers of AGI, or about how powerful and versatile this technology could be, could further fuel an arms race. It may also encourage some people to attempt to develop AGI for nefarious purposes.
Policies the public supports may not align with helpful policies
There are three scenarios in which this could be harmful:
- The public rallies behind a counterproductive policy, thinking it helps safety when it actually harms it, e.g. by shifting the forefront of AGI development to a more irresponsible actor.
- The public believes a policy that helps AGI safety actually hinders it and rallies to obstruct it, e.g. we propose limits on the computing power and complexity of neural networks, and AI labs argue that this hinders their safety work.
- The public rallies behind an unhelpful policy and diverts political resources from a helpful policy (though it is not especially obvious that this would make the helpful policy worse off than in the counterfactual where there was no outreach).
There may be case studies of this happening to advocacy about other issues that we can learn from, as well as tools we can implement to help citizens parse probably-good and probably-bad safety policies (such as the aforementioned board of AGI safety experts that independently rate them).
Producing counterproductive fear
We want citizens to be concerned—the main reason anything got done about climate change or denuclearization is that everyday citizens started caring. That said, citizen concern could escalate to levels where it becomes counterproductive.
Extinction-level events can be difficult to picture and emotionally process, leading to overwhelm and inaction (see climate paralysis). In other cases, concern can result in disproportionate and unwise policies—if we lack nuance, it can turn people against the field of AI as a whole, blocking the progress of genuinely valuable technology.
When evaluating the probability of this risk, it is worth highlighting that it is very difficult to internalize existential risk from unaligned AGI—even many current AI safety researchers took years of convincing before they believed this was a problem they should work on—so it is not clear that concern would easily escalate into everyone becoming convinced that the end of the world is around the corner. If this does turn out to be a risk, it can be mitigated by talking about examples of really bad things misaligned AGI could do that fall short of causing the extinction of humanity, so that we are not advertising that extinction is going to happen.
Over the past few months, I have had informal conversations about this topic with researchers working on alignment, as well as individuals from organizations like GovAI, FLI, FHI, and Google. While their levels of optimism varied, they agreed that if we can figure out how to do it safely and responsibly, this seems like a broadly good idea—from “worth looking into” to “essential”. It would give policymakers an incentive to care about this issue, which is currently lacking, as well as possibly put pressure on those working on AI capabilities to halt their work or increase standards for safety.
However, the resources the EA community is currently devoting to this are minimal, as it falls outside the scope of most existing AI safety organizations. There is currently an absence of rigorous research looking into the potential value, risks, and strategies for communicating with the public and key stakeholders about AGI risks. While there are important ways this could go wrong, if it goes right, work in this area could create a lot of value and meaningfully increase the probability that humanity survives the alignment problem.