
Cross-posted from LessWrong

Summary: It seems at least possible that scaling AI systems (broadly construed) could create dangerously powerful agents. I consider methods to discourage groups from massively scaling AI systems with little regard for safety. After examining cultural, regulatory, and technological interventions, I conclude that cultural approaches seem best suited to this goal in the near term.

Scale

For the purposes of this post, I am going to lump several things together when I talk about "scale". Some of the highly-scalable inputs to an AI system include:

  • Number of parameters
  • Training time
  • Total memory
  • Total compute
  • Dataset size
  • Total cost
  • Number of researchers
  • Total research effort

I'm not particularly concerned with the specific ways these resources can be deployed to increase AI capability; what matters is that there are inputs that can be increased more or less arbitrarily in exchange for higher performance.

In other words, once the fundamentally-limited inputs to an AI system have been maxed out, further gains will be determined by the infinitely-scalable inputs. Things like "better prompting" or "clever architectures" can certainly increase capability, but at some point, the low-hanging fruit will be picked. In order to get higher performance, researchers will have to turn to scaling [1]. What happens when we turn these dials up to 11?

The Scaling Hypothesis

The scaling hypothesis is becoming an increasingly important view in the AI safety community; some posit that "scale is all you need" to create AGI.

Personally, I'm uncertain whether scaling will allow us to create AGI, though the fact that transformer models demonstrate emergent capabilities when scaled is certainly suggestive.
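
For readers who want a quantitative handle on what "scaling" buys, here is a rough sketch of the usual formalization (my addition, not part of the original argument; the functional form and exponent range are drawn from published scaling-law fits such as Kaplan et al. 2020):

```latex
% Illustrative power-law scaling relations (assumed form, not this post's result).
% N = parameter count, D = dataset size in tokens, C = training compute;
% each relation holds when the other inputs are not the bottleneck.
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \quad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \quad
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}
% Reported exponents are small (roughly 0.05--0.1), so loss falls smoothly
% but slowly: large multiplicative increases in scale yield steady gains.
```

The smoothness of these fitted curves, combined with capabilities that appear abruptly at certain scales, is part of why "just keep turning the dials" looks like such a live strategy.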

Because it's at least possible that scaling existing transformer models can lead to AGI, we should prepare for the worst-case scenario in which the hypothesis is true.

Slowing down scaling

Assuming the scaling hypothesis is true, what does that mean for AI safety? Since we don't yet have good solutions to the alignment problem, it's important to slow down scaling [2] to provide more time for alignment research, outreach, and coordination.

Let's look at a couple of broad interventions that might slow down scaling.

Culture

The idea here is to "frown upon" groups that massively scale AI systems with little regard for safety. Cultural norms may seem like a weak enforcement mechanism, but a tight-knit research community has significant power over its members. The community can punish bad actors by discouraging new researchers from joining the group, reducing collaborations, or halting the flow of tacit knowledge to the group. Researchers involved in risky work might suffer reputational damage, and unscrupulous companies might see lower investment. Conversely, groups with a track record of safety might see a relative increase in applicant quality and receive more support from the AI community.

Regulation

Regulation could be used to limit the scale of the models companies are allowed to train, withhold funding from risky projects, or restrict the publication of details related to massive scaling. Countries could also enter international agreements limiting the scale and deployment of large AI models [3].

Technology

It may be possible to steer AI development through targeted technological interventions. For example, researchers could develop training paradigms that work well for small models but do not scale to larger ones. Alternatively, "satisfactory" AI systems could make the development of more sophisticated models superfluous, guiding research towards smaller, safer models. Publishing specific open-source software may also help shape AI development towards less risky paradigms.

Which approach is best?

While I think that all three approaches should receive attention, cultural approaches seem most viable in the short term. For one, influencing culture is relatively cheap compared to conducting research or lobbying governments. Additionally, culture in a small research field can change quickly, much faster than policy can change or new technologies can be developed and deployed.

But most importantly, culture is far more adaptive than the other approaches. For example, if regulators passed a law limiting total parameter count, researchers might switch to higher-precision floating-point numbers to squeeze more performance out of the same number of parameters. It's extremely hard to craft loophole-free regulation, and legislation is produced too slowly to keep up with developments in AI.
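
To make the loophole concrete, here is a minimal back-of-the-envelope sketch (my own illustration, not from the original post; the constants and the ~6·N·D training-compute rule of thumb are rough assumptions) showing that two models with the same nominal parameter count can consume very different amounts of memory and training compute:

```python
# Rough illustration (illustrative assumptions only) of why a raw parameter-count
# cap is easy to game: memory and training compute depend on bytes per parameter
# and tokens trained on, not on parameter count alone.

def footprint(n_params: int, bytes_per_param: int, training_tokens: int):
    """Return (parameter memory in GB, approximate training FLOPs)."""
    memory_gb = n_params * bytes_per_param / 1e9
    # ~6 * N * D is a common rule of thumb for dense-transformer training FLOPs.
    train_flops = 6 * n_params * training_tokens
    return memory_gb, train_flops

N = 10_000_000_000  # both hypothetical models sit exactly at a "10B parameter" cap

modest = footprint(N, bytes_per_param=2, training_tokens=200_000_000_000)   # fp16, 200B tokens
maxed = footprint(N, bytes_per_param=8, training_tokens=2_000_000_000_000)  # fp64, 2T tokens

print(f"fp16, 200B tokens: {modest[0]:,.0f} GB of weights, {modest[1]:.1e} training FLOPs")
print(f"fp64, 2T tokens:   {maxed[0]:,.0f} GB of weights, {maxed[1]:.1e} training FLOPs")
```

Both configurations satisfy the letter of a parameter-count cap, yet the second consumes four times the memory and ten times the training compute. A regulator would need to anticipate every substitution of this kind in advance, which is exactly the adaptivity problem described above.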

On the technology side, let's say that you invented AI accelerator hardware that can cheaply train a 10 billion parameter model, but doesn't scale well to 1 trillion parameter models. It's possible that researchers will find a way to ensemble many 10 billion parameter models to get performance equivalent to a 1 trillion parameter model. In general, it can be hard to predict how a particular technology will be used or whether it will achieve certain safety goals.

But culture is much harder to thwart. Bad actors would have to deceive an entire community of savvy researchers (potentially including their own team) and silence potential whistleblowers. This isn't impossible, but the difficulty, and the cost of a bad reputation, may be prohibitive.

Does slowing scaling help the bad guys?

One counterargument is that slowing scaling might only work on groups that are already concerned about AI risk. This would give unscrupulous actors the upper hand, possibly increasing risk on net.

This is an important point which deserves further consideration. However, my initial guess is that efforts to slow scaling across the field will still slow unscrupulous actors. This is because research at different labs is complementary; slowing scaling at DeepMind would also impede other groups, since they rely on each other for insights.

That being said, a uniformly applied scaling slowdown may still create a relative advantage for risky researchers. Cultural approaches are best suited to deal with this problem. If a research group presses on in spite of warnings about the risk, the community can respond by discouraging new researchers from joining the offending group, halting collaborations, and limiting flows of tacit knowledge [4]. This should reduce any advantage that risky groups might enjoy.

Another problem is that these techniques might be used to selectively disadvantage specific groups for reasons unrelated to their safety profile. For example, there is a long history of large companies using government regulation to raise barriers to entry and reduce competition. Existing AI companies could lobby for additional safety regulations in order to block new entrants. This is another reason to be hesitant about using regulation to slow AI scaling. Fortunately, the other approaches seem less likely to be exploitable for unfair advantage.

It's unclear whether this possibility is enough to outweigh the benefits of slowing scaling, but the design of any of these methods should minimize their potential for abuse [5].

Conclusion

In addition to direct alignment work, the AI safety community should consider how to slow down AI scaling to buy more time. Of the approaches listed here, developing cultural norms against reckless scaling is the easiest, fastest, and most adaptive solution.

Future work should specify how to build consensus within the broader AI community via outreach to companies, scientists, and industry leaders. Widespread cultural norms against unsafe research practices can hinder bad actors, foster coordination, and slow the development of AGI.

I also implore AI researchers to frown upon massive, reckless scaling of AI systems. Public discussion of safety concerns can help to punish bad actors and establish expectations for good practices in AI research.

Notes

  1. This is not to say that scaling is as simple as changing the number of parameters in a Python script. Continual scaling requires new techniques and increasingly specialized researchers. Steady, Moore's-law-like improvements may seem automatic from the outside, but constant growth typically requires exponentially increasing resources to compensate for the exhaustion of low-hanging fruit.

  2. For the rest of this post, I'm going to ignore the possibility of stopping scaling entirely since it seems unrealistic. If you like, you can think of stopping scaling entirely as a specific type of slowdown. In general, I am against such pivotal acts, but that's a discussion for another time.

  3. Though it may seem impossible to prevent risky research from occurring in private, there is some evidence that requirements of secrecy halt flows of tacit knowledge and limit the development of dangerous technologies (more on this in a future post).

  4. In the extreme, this policy would create an independent research group, ostracized from the community (though ideally major steps would be taken to avoid this outcome). At this point, cultural incentives are unlikely to have an effect. Nothing can stop completely independent actors, but the policies suggested here can still slow them.

  5. Regardless, attempts to slow AI scaling will probably not make the situation worse. Companies likely already use similar techniques to gain an advantage and it is unclear that independent efforts to slow scaling would give them a larger advantage. Even if these approaches were partially abused in order to gain an unfair advantage, they would still accomplish the goal of slowing down AI research.
