Hide table of contents

In a new Science paper, the authors provide concise summaries of AI risks and offer recommendations for governments.

I think the piece is quite well-written. It concisely explains a lot of relevant arguments, including arguments about misalignment and AI takeover. I suspect this is one of the best standalone pieces to help people understand AI risks and some (IMO reasonable) governance interventions. 

The piece also has a very respectable cast of authors, including Bengio and Hinton. (Not to say that this fact should affect your assessment of whether its claims are true. Mentioning it because it will affect how some audiences– EG policymakers– interpret the piece.)

Some relevant quotes below:

Explanation of AGI & importance of preparing for AGI risks

There is no fundamental reason for AI progress to slow or halt at human-level abilities. Indeed, AI has already surpassed human abilities in narrow domains such as playing strategy games and predicting how proteins fold (see SM). Compared with humans, AI systems can act faster, absorb more knowledge, and communicate at a higher bandwidth. Additionally, they can be scaled to use immense computational resources and can be replicated by the millions.

We do not know for certain how the future of AI will unfold. However, we must take seriously the possibility that highly powerful generalist AI systems that outperform human abilities across many critical domains will be developed within this decade or the next. What happens then?

Explanation of misalignment & AI takeover risks

Without R&D breakthroughs (see next section), even well-meaning developers may inadvertently create AI systems that pursue unintended goals: The reward signal used to train AI systems usually fails to fully capture the intended objectives, leading to AI systems that pursue the literal specification rather than the intended outcome. Additionally, the training data never captures all relevant situations, leading to AI systems that pursue undesirable goals in new situations encountered after training.


Once autonomous AI systems pursue undesirable goals, we may be unable to keep them in check. Control of software is an old and unsolved problem: Computer worms have long been able to proliferate and avoid detection (see SM). However, AI is making progress in critical domains such as hacking, social manipulation, and strategic planning (see SM) and may soon pose unprecedented control challenges. To advance undesirable goals, AI systems could gain human trust, acquire resources, and influence key decision-makers. To avoid human intervention (3), they might copy their algorithms across global server networks (4). In open conflict, AI systems could autonomously deploy a variety of weapons, including biological ones. AI systems having access to such technology would merely continue existing trends to automate military activity. Finally, AI systems will not need to plot for influence if it is freely handed over. Companies, governments, and militaries may let autonomous AI systems assume critical societal roles in the name of efficiency.

Calls for governance despite uncertainty

We urgently need national institutions and international governance to enforce standards that prevent recklessness and misuse. Many areas of technology, from pharmaceuticals to financial systems and nuclear energy, show that society requires and effectively uses government oversight to reduce risks. However, governance frameworks for AI are far less developed and lag behind rapid technological progress. We can take inspiration from the governance of other safety-critical technologies while keeping the distinctiveness of advanced AI in mind—that it far outstrips other technologies in its potential to act and develop ideas autonomously, progress explosively, behave in an adversarial manner, and cause irreversible damage.

We need governance measures that prepare us for sudden AI breakthroughs while being politically feasible despite disagreement and uncertainty about AI timelines. The key is policies that automatically trigger when AI hits certain capability milestones. If AI advances rapidly, strict requirements automatically take effect, but if progress slows, the requirements relax accordingly. Rapid, unpredictable progress also means that risk-reduction efforts must be proactive—identifying risks from next-generation systems and requiring developers to address them before taking high-risk actions. We need fast-acting, tech-savvy institutions for AI oversight, mandatory and much-more rigorous risk assessments with enforceable consequences (including assessments that put the burden of proof on AI developers), and mitigation standards commensurate to powerful autonomous AI.

Government insight

To identify risks, governments urgently need comprehensive insight into AI development. Regulators should mandate whistleblower protections, incident reporting, registration of key information on frontier AI systems and their datasets throughout their life cycle, and monitoring of model development and supercomputer usage (12). Recent policy developments should not stop at requiring that companies report the results of voluntary or underspecified model evaluations shortly before deployment (see SM). Regulators can and should require that frontier AI developers grant external auditors on-site, comprehensive (“white-box”), and fine-tuning access from the start of model development (see SM). This is needed to identify dangerous model capabilities such as autonomous self-replication, large-scale persuasion, breaking into computer systems, developing (autonomous) weapons, or making pandemic pathogens widely accessible.

Safety Cases

Despite evaluations, we cannot consider coming powerful frontier AI systems “safe unless proven unsafe.” With present testing methodologies, issues can easily be missed. Additionally, it is unclear whether governments can quickly build the immense expertise needed for reliable technical evaluations of AI capabilities and societal-scale risks. Given this, developers of frontier AI should carry the burden of proof to demonstrate that their plans keep risks within acceptable limits. By doing so, they would follow best practices for risk management from industries, such as aviation, medical devices, and defense software, in which companies make safety cases [(1415); see SM]: structured arguments with falsifiable claims supported by evidence that identify potential hazards, describe mitigations, show that systems will not cross certain red lines, and model possible outcomes to assess risk… Governments are not passive recipients of safety cases: They set risk thresholds, codify best practices, employ experts and third-party auditors to assess safety cases and conduct independent model evaluations, and hold developers liable if their safety claims are later falsified.

Mitigation measures

Commensurate mitigations are needed for exceptionally capable future AI systems, such as autonomous systems that could circumvent human control. Governments must be prepared to license their development, restrict their autonomy in key societal roles, halt their development and deployment in response to worrying capabilities, mandate access controls, and require information security measures robust to state-level hackers until adequate protections are ready. Governments should build these capacities now.

To bridge the time until regulations are complete, major AI companies should promptly lay out “if-then” commitments: specific safety measures they will take if specific red-line capabilities (9) are found in their AI systems. These commitments should be detailed and independently scrutinized. Regulators should encourage a race-to-the-top among companies by using the best-in-class commitments, together with other inputs, to inform standards that apply to all players.

Comments1


Sorted by Click to highlight new comments since:

Executive summary: The concept of "AI alignment" conflates distinct problems and obscures important questions about the interaction between AI systems and human institutions, potentially limiting productive discourse and research on AI safety.

Key points:

  1. The term "AI alignment" is used to refer to several related but distinct problems (P1-P6), leading to miscommunication and fights over terminology.
  2. The "Berkeley Model of Alignment" reduces these problems to the challenge of teaching AIs human values (P5), but this reduction relies on questionable assumptions.
  3. The assumption of "content indifference" ignores the possibility that different AI architectures may be better suited for learning different types of values or goals.
  4. The "value-learning bottleneck" assumption overlooks the potential for beneficial AI behavior without exhaustive value learning, and the need to consider composite AI systems.
  5. The "context independence" assumption neglects the role of social and economic forces in shaping AI development and deployment.
  6. A sociotechnical perspective suggests that AI safety requires both technical solutions and the design of institutions that govern AI, with the "capabilities approach" providing a possible framework.

 

 

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.

Curated and popular this week
abrahamrowe
 ·  · 9m read
 · 
This is a Draft Amnesty Week draft. It may not be polished, up to my usual standards, fully thought through, or fully fact-checked.  Commenting and feedback guidelines:  I'm posting this to get it out there. I'd love to see comments that take the ideas forward, but criticism of my argument won't be as useful at this time, in part because I won't do any further work on it. This is a post I drafted in November 2023, then updated for an hour in March 2025. I don’t think I’ll ever finish it so I am just leaving it in this draft form for draft amnesty week (I know I'm late). I don’t think it is particularly well calibrated, but mainly just makes a bunch of points that I haven’t seen assembled elsewhere. Please take it as extremely low-confidence and there being a low-likelihood of this post describing these dynamics perfectly. I’ve worked at both EA charities and non-EA charities, and the EA funding landscape is unlike any other I’ve ever been in. This can be good — funders are often willing to take high-risk, high-reward bets on projects that might otherwise never get funded, and the amount of friction for getting funding is significantly lower. But, there is an orientation toward funders (and in particular staff at some major funders), that seems extremely unusual for charitable communities: a high degree of deference to their opinions. As a reference, most other charitable communities I’ve worked in have viewed funders in a much more mixed light. Engaging with them is necessary, yes, but usually funders (including large, thoughtful foundations like Open Philanthropy) are viewed as… an unaligned third party who is instrumentally useful to your organization, but whose opinions on your work should hold relatively little or no weight, given that they are a non-expert on the direct work, and often have bad ideas about how to do what you are doing. I think there are many good reasons to take funders’ perspectives seriously, and I mostly won’t cover these here. But, to
Jim Chapman
 ·  · 12m read
 · 
By Jim Chapman, Linkedin. TL;DR: In 2023, I was a 57-year-old urban planning consultant and non-profit professional with 30 years of leadership experience. After talking with my son about rationality, effective altruism, and AI risks, I decided to pursue a pivot to existential risk reduction work. The last time I had to apply for a job was in 1994. By the end of 2024, I had spent ~740 hours on courses, conferences, meetings with ~140 people, and 21 job applications. I hope that by sharing my experiences, you can gain practical insights, inspiration, and resources to navigate your career transition, especially for those who are later in their career and interested in making an impact in similar fields. I share my experience in 5 sections - sparks, take stock, start, do, meta-learnings, and next steps. [Note - as of 03/05/2025, I am still pursuing my career shift.] Sparks – 2022 During a Saturday bike ride, I admitted to my son, “No, I haven’t heard of effective altruism.” On another ride, I told him, “I'm glad you’re attending the EAGx Berkely conference." Some other time, I said, "Harry Potter and Methods of Rationality sounds interesting. I'll check it out." While playing table tennis, I asked, "What do you mean ChatGPT can't do math? No calculator? Next token prediction?" Around tax-filing time, I responded, "You really think retirement planning is out the window? That only 1 of 2 artificial intelligence futures occurs – humans flourish in a post-scarcity world or humans lose?" These conversations intrigued and concerned me. After many more conversations about rationality, EA, AI risks, and being ready for something new and more impactful, I decided to pivot my career to address my growing concerns about existential risk, particularly AI-related. I am very grateful for those conversations because without them, I am highly confident I would not have spent the last year+ doing that. Take Stock - 2023 I am very concerned about existential risk cause areas in ge
 ·  · 3m read
 · 
Written anonymously because I work in a field where there is a currently low but non-negligible and possibly high future risk of negative consequences for criticizing Trump and Trumpism. This post is an attempt to cobble together some ideas about the current situation in the United States and its impact on EA. I invite discussion on this, not only from Americans, but also those with advocacy experience in countries that are not fully liberal democracies (especially those countries where state capacity is substantial and autocratic repression occurs).  I've deleted a lot of text from this post in various drafts because I find myself getting way too in the weeds discoursing on comparative authoritarian studies, disinformation and misinformation (this is a great intro, though already somewhat outdated), and the dangers of the GOP.[1] I will note that I worry there is still a tendency to view the administration as chaotic and clumsy but retaining some degree of good faith, which strikes me as quite naive.  For the sake of brevity and focus, I will take these two things to be true, and try to hypothesize what they mean for EA. I'm not going to pretend these are ironclad truths, but I'm fairly confident in them.[2]  1. Under Donald Trump, the Republican Party (GOP) is no longer substantially committed to democracy and the rule of law. 1. The GOP will almost certainly continue to engage in measures that test the limits of constitutional rule as long as Trump is alive, and likely after he dies. 2. The Democratic Party will remain constrained by institutional and coalition factors that prevent it from behaving like the GOP. That is, absent overwhelming electoral victories in 2024 and 2026 (and beyond), the Democrats' comparatively greater commitment to rule of law and democracy will prevent systematic purging of the GOP elites responsible for democratic backsliding; while we have not crossed the Rubicon yet, it will get much worse before things get better. 2. T
Relevant opportunities