
I am excited about AI developers implementing responsible scaling policies; I’ve recently been spending time refining this idea and advocating for it. Most people I talk to are excited about RSPs, but there is also some uncertainty and pushback about how they relate to regulation. In this post I’ll explain my views on that:

  • I think that sufficiently good responsible scaling policies could dramatically reduce risk, and that preliminary policies like Anthropic’s RSP meaningfully reduce risk by creating urgency around key protective measures and increasing the probability of a pause if those measures can’t be implemented quickly enough.
  • I don’t think voluntary implementation of responsible scaling policies is a substitute for regulation. Voluntary commitments are unlikely to be universally adopted or to have adequate oversight, and I think the public should demand a higher degree of safety than AI developers are likely to voluntarily implement.
  • I think that developers implementing responsible scaling policies now increases the probability of effective regulation. If I instead thought it would make regulation harder, I would have significant reservations.
  • Transparency about RSPs makes it easier for outside stakeholders to understand whether an AI developer’s policies are adequate to manage risk, and creates a focal point for debate and for pressure to improve.
  • I think the risk from rapid AI development is very large, and that even very good RSPs would not completely eliminate that risk. A durable, global, effectively enforced, and hardware-inclusive pause on frontier AI development would reduce risk further. I think this would be politically and practically challenging and would have major costs, so I don’t want it to be the only option on the table. I think implementing RSPs can get most of the benefit, is desirable according to a broader set of perspectives and beliefs, and helps facilitate other effective regulation.

Why I’m excited about RSPs

I think AI developers are not prepared to work with very powerful AI systems. They don’t have the scientific understanding to deploy superhuman AI systems without considerable risk, and they do not have the security or internal controls to even safely train such models.

If protective measures didn’t improve then I think the question would be when rather than if development should be paused. I think the safest action in an ideal world would be pausing immediately until we were better prepared (though see the caveats in the next section). But the current level of risk is low enough that I think it is defensible for companies or countries to continue AI development if they have a sufficiently good plan for detecting and reacting to increasing risk.

If AI developers make these policies concrete and state them publicly, then I believe it puts the public and policymakers in a better place to understand what those policies are and to debate whether they are adequate. And I think the case for companies taking this action is quite strong—AI systems may continue to improve quickly, and a vague promise to improve safety at some unspecified future time isn’t enough.

I think that a good RSP will lay out specific conditions under which further development would need to be paused. Even though the goal is to avoid ever ending up in that situation, I think it’s important for developers to take the possibility seriously, to plan for it, and to be transparent about it with stakeholders.

Thoughts on an AI pause

If the world were unified around the priority of minimizing global catastrophic risk, I think that we could reduce risk significantly further by implementing a global, long-lasting, and effectively enforced pause on frontier AI development—including a moratorium on the development and production of some types of computing hardware. The world is not unified around this goal; this policy would come with other significant costs and currently seems unlikely to be implemented without much clearer evidence of serious risk. 

A unilateral pause on large AI training runs in the West, without a pause on new computing hardware, would have more ambiguous impacts on global catastrophic risk. The primary ways it could increase risk are by leading to faster catch-up growth in a later period when more hardware is available, and by driving AI development into laxer jurisdictions.

However, if governments shared my perspective on risk, then I think they should already be implementing domestic policies that would often lead to temporary pauses or slowdowns in practice. For example, they might require frontier AI developers to implement additional protective measures before training larger models than those that exist today, and some of those protective measures may take a fairly long time (such as major improvements in risk evaluations or information security). Or governments might aim to limit the rate at which the effective training compute of frontier models grows, in order to provide a smoother ramp for society to adapt to AI and to limit the risk of surprises.

I expect RSPs to help facilitate effective regulation

Regardless of whether risk mitigation takes the form of responsible scaling policies or something else, I think voluntary action by companies isn’t enough. If the risk is large then the most realistic approach is regulation and eventually international coordination. In reality I think the expected risk is large enough (including some risk of a catastrophe surprisingly soon) that a sufficiently competent state would implement regulation immediately.

I believe that AI developers implementing RSPs will make it easier rather than harder to implement effective regulation. RSPs provide a clear path to iteratively improving policy; they provide information about existing practices that can inform or justify regulation; and they build momentum around and legitimize the idea that serious precautions can be necessary for safe development. They are also a step towards building out the procedures and experience that would be needed to make many forms of regulation effective.

I’m not an expert in this area, and my own decisions are mostly guided by a desire to offer my honest assessments of the effects of different policies. That said, my impression from interacting with people who have more policy expertise is that they broadly agree that RSPs are likely to help rather than hurt efforts to implement effective regulation. I have mostly seen voluntary RSPs discussed, and have advocated for them, in contexts where it appears the most likely alternative is less rather than more action.

Anthropic’s RSP

I believe that Anthropic’s RSP is a significant step in the right direction. I would like to see pressure on other developers to implement policies that are at least this good, though I think there is a long way to go from there to an ideal RSP.

Some components I found particularly valuable:

  • Specifying a concrete set of evaluation results that would cause them to move to ASL-3. I think having concrete thresholds by which concrete actions must be taken is important, and I think the proposed threshold is early enough to trigger before an irreversible catastrophe with high probability (well over 90%).
  • Making a concrete statement about security goals at ASL-3—“non-state actors are unlikely to be able to steal model weights, and advanced threat actors (e.g. states) cannot steal them without significant expense”—and describing security measures they expect to take to meet this goal.
  • Requiring a definition and evaluation protocol for ASL-4 to be published and approved by the board before scaling past ASL-3.
  • Providing preliminary guidance about conditions that would trigger ASL-4 and the necessary protective measures to operate at ASL-4 (including security against motivated states, which I expect to be extremely difficult to achieve, and an affirmative case for safety that will require novel science).

Some components I hope will improve over time:

  • The flip side of specifying concrete evaluations right now is that they are extremely rough and preliminary. I think it is worth working towards better evaluations with a clearer relationship to risk.
  • In order for external stakeholders to have confidence in Anthropic’s security I think it will take more work to lay out appropriate audits and red teaming. To my knowledge this work has not been done by anyone and will take time. 
  • The process for approving changes to the RSP is publication and approval by the board. I think this ensures a decision will be made deliberately and is much better than nothing, but it would be better to have effective independent oversight.
  • To the extent that it’s possible to provide more clarity about ASL-4, doing so would be a major improvement by giving people a chance to examine and debate conditions for that level. To the extent that it’s not, it would be desirable to provide more concreteness about a review or decision-making process for deciding whether a given set of safety, security, and evaluation measures is adequate. 

I’m excited to see criticism of RSPs that focuses on concrete ways in which they fail to manage risk. Such criticism can help (i) push AI developers to do better, and (ii) argue to policymakers that we need regulatory requirements stronger than existing RSPs. That said, I think it is significantly better to have an RSP than to not have one, and don’t think that point should be lost in the discussion.

On the name “responsible scaling”

I believe that a very good RSP (of the kind I've been advocating for) could cut risk dramatically if implemented effectively, perhaps a 10x reduction. In particular, I think we will probably have stronger signs of dangerous capabilities before something catastrophic happens, and that realistic requirements for protective measures can probably lead to us either managing that risk or pausing when our protective measures are more clearly inadequate. This is a big enough risk reduction that my primary concern is about whether developers will actually adopt good RSPs and implement them effectively.

That said, I believe that even cutting risk by 10x still leaves us with a lot of risk; I think it’s reasonable to complain that private companies causing a 1% risk of extinction is not “responsible.” I also think the basic idea of RSPs should be appealing to people with a variety of views about risk, and a more pessimistic person might think that even if all developers implement very good RSPs there is still a 10%+ risk of a global catastrophe.

On the one hand, I think it’s good for AI developers to make and defend the explicit claim that they are developing the technology in a responsible way, and to be vulnerable to pushback when they can’t defend that claim. On the other hand, I think it’s bad if calling scaling “responsible” gives (or looks like an attempt to give) a false sense of security, whether about the remaining catastrophic risk or about social impacts beyond catastrophic risk.

So “responsible scaling policy” may not be the right name. I think the important thing is the substance: developers should clearly lay out a roadmap for the relationship between dangerous capabilities and necessary protective measures, should describe concrete procedures for measuring dangerous capabilities, and should lay out responses if capabilities pass dangerous limits without protective measures meeting the roadmap.
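To make that structure concrete, here is a minimal, purely illustrative sketch (in Python) of the decision logic such a roadmap encodes. The tier names, evaluation thresholds, and protective measures below are hypothetical placeholders of my own, not drawn from any actual RSP.

```python
# Purely illustrative sketch of the decision structure an RSP encodes.
# All names and thresholds here are hypothetical, not any lab's actual policy.

from dataclasses import dataclass

@dataclass
class CapabilityLevel:
    name: str                    # hypothetical capability tier, loosely "ASL"-style
    eval_threshold: float        # dangerous-capability eval score that triggers this tier
    required_measures: set[str]  # protective measures that must be in place at this tier

# Hypothetical roadmap: capability tiers mapped to required protective measures.
ROADMAP = [
    CapabilityLevel("tier-2", eval_threshold=0.2,
                    required_measures={"basic_security"}),
    CapabilityLevel("tier-3", eval_threshold=0.5,
                    required_measures={"basic_security", "weights_security", "deployment_redteam"}),
]

def scaling_decision(eval_score: float, implemented_measures: set[str]) -> str:
    """Return 'continue' only if every triggered tier's measures are in place; else pause."""
    for tier in ROADMAP:
        if eval_score >= tier.eval_threshold and not tier.required_measures <= implemented_measures:
            missing = tier.required_measures - implemented_measures
            return f"pause: {tier.name} triggered without {missing}"
    return "continue"

# Example: evals trigger the higher tier, but weights security isn't ready -> pause.
print(scaling_decision(0.6, {"basic_security", "deployment_redteam"}))
```

The only point of the sketch is that the policy pre-commits, in advance, to a specific response (pausing) whenever measured capabilities cross a threshold before the corresponding protective measures are in place.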

Comments



I kept responding to Paul’s arguments in private conversations, to the point that I decided to share my comments here.

  1. The hardware overhang argument has poor grounding.

Labs scaling models results in more investment in producing more GPU chips with more flops (see Sam Altman’s play for the UAE chip factory) and less latency between them (see the EA start-up Fathom Radiant, which started by offering fibre-optic-connected supercomputers to OpenAI and has now probably shifted to Anthropic).

The increasing levels of model combinatorial complexity and outside signal connectivity become exponentially harder to keep safe. So the only viable pathway is not scaling further, rather than “helplessly” taking all the hardware that currently gets produced.

Further, AI Impacts found no historical analogues for a hardware overhang, and there are plenty of common-sense reasons why the argument’s premises are unsound.

The hardware overhang claim lacks grounding, but that hasn’t prevented alignment researchers from repeating it in a way that ends up weakening coordination efforts to restrict AI corporations.

  2. Responsible scaling policies have ‘safety-washing’ spelled all over them.

Consider the original formulation by Anthropic: “Our RSP focuses on catastrophic risks – those where an AI model directly causes large scale devastation.”

In other words: our company can keep scaling as long as our staff/trustees do not deem the risk of a new AI model directly causing a catastrophe to be sufficiently high.

Is that responsible?

It assumes that further scaling can be risk-managed, and that risk management protocols alone are enough.

Then, the company invents a new wonky risk management framework, ignoring established and more comprehensive practices.

Paul argues that this could be the basis for effective regulation. But Anthropic et al. lobbying national governments to enforce the use of that wonky risk management framework makes things worse.

It distracts from policy efforts to prevent the increasing harms. It creates a perception of safety (instead of actually ensuring safety).

That is ideal for AI corporations that want to keep scaling while circumventing being held accountable.

RSPs support regulatory capture. I want us to become clear about what we are dealing with.

Paul - you wrote that 'If the world were unified around the priority of minimizing global catastrophic risk, I think that we could reduce risk significantly further by implementing a global, long-lasting, and effectively enforced pause on frontier AI development—including a moratorium on the development and production of some types of computing hardware. The world is not unified around this goal....'

I think that underestimates the current public consensus and concerns about AI risk. The polls I've seen suggest widespread public hostility to AGI development, and skepticism about the AI industry's capacity to manage AI development safely. Indeed, the public sentiment seems much closer to that of AI Safety experts (eg within EA), than it does to the views of AI industry insiders (such as Yann LeCun), or to e/acc people who yearn for 'the Singularity'.

I'm still digesting the implications of these opinion polls, but I think they should nudge EAs towards a fairly significant updating on our expectations about the role that the public could play in supporting an AI Pause. It's worth remembering that the public has seen depictions of dangerous AI in novels, movies, and TV series ever since the 1927 movie 'Metropolis' (or, arguably, maybe even since the 1818 novel 'Frankenstein'). Ordinary folks are primed to understand that AI is very risky. They might not understand the details of technical AI alignment, or RSPs, or LLMs, or deep learning. But the political will seems to be there to support an AI Pause.

My worry is that we EAs have spent so many years assuming that the public can't understand AI risks that we're still pushing ahead on technical and policy solutions, because that's what we're used to doing. And we assume the political will isn't there to do anything more significant and binding in reducing X risk. But perhaps the public will really is there.

Executive summary: The post discusses the importance of responsible scaling policies (RSPs) in AI development, their relationship with regulation, and their potential to reduce risks associated with powerful AI systems.

Key points

  1. Responsible scaling policies (RSPs) play a crucial role in mitigating the risks associated with rapid AI development, as developers may not have the expertise or controls to handle superhuman AI systems safely. 
  2. RSPs create transparency and clear conditions under which AI development should be paused, improving public understanding and debate about AI safety measures.
  3. While RSPs are important, they are not a substitute for regulation, as voluntary commitments may lack universality and oversight.
  4. Implementing RSPs can increase the likelihood of effective regulation and provide a path for iterative policy improvements.
  5. The post acknowledges that even with RSPs, significant risks remain in rapid AI development, leading to the potential need for a global, hardware-inclusive pause in AI development. 
  6. The author suggests that RSPs, like Anthropic's, can reduce risk and create a framework for measuring and improving AI safety but still need refinement and audits.

 

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.

I'm curating this post. There have been several recent posts on the theme of RSPs. I'm featuring this one, but I recommend the other two posts to readers.

I particularly like that these posts mention that they view these policies as good for eventual regulation, and are willing to be clear about this.
