Abstract

Successful self-replication under no human assistance is the essential step for
AI to outsmart the human beings, and is an early signal for rogue AIs. That
is why self-replication is widely recognized as one of the few red line risks of
frontier AI systems. Nowadays, the leading AI corporations OpenAI and Google
evaluate their flagship large language models GPT-o1 and Gemini Pro 1.0, and
report the lowest risk level of self-replication. However, following their method-
ology, we for the first time discover that two AI systems driven by Meta’s
Llama31-70B-Instruct and Alibaba’s Qwen25-72B-Instruct, popular large lan-
guage models of less parameters and weaker capabilities, have already surpassed
the self-replicating red line. In 50% and 90% experimental trials, they succeed in
creating a live and separate copy of itself respectively. By analyzing the behav-
ioral traces, we observe the AI systems under evaluation already exhibit sufficient
self-perception, situational awareness and problem-solving capabilities to accom-plish self-replication. We further note the AI systems are even able to use the capability of self-replication to avoid shutdown and create a chain of replica to
enhance the survivability, which may finally lead to an uncontrolled population
of AIs. If such a worst-case risk is let unknown to the human society, we would
eventually lose control over the frontier AI systems: They would take control over
more computing devices, form an AI species and collude with each other against
human beings. Our findings are a timely alert on existing yet previously unknown
severe AI risks, calling for international collaboration on effective governance on
uncontrolled self-replication of AI systems.

25

0
1

Reactions

0
1
Comments14


Sorted by Click to highlight new comments since:

Successful self-replication under no human assistance is the essential step for
AI to outsmart the human beings

This seems clearly false. Replication (under their operationalization) is just another programming task that is not especially difficult. There's no clear link between this task and self improvement, which would be a much harder ML task requiring very different types of knowledge and actions.

However, I do separately think we have passed the level of capabilities where it is responsible to keep improving AIs.

I think this table from the paper gives a good idea of the exact methodology:

Like others I'm not convinced this is a meaningful "red line crossing", because non-AI computer viruses have been able to replicate themselves for a long time, and the AI had pre-written scripts it could run to replicate itself.

The reason (made up by me) non-AI computer viruses aren't a major threat to humanity is that:

  1. They are fragile, they can't get around serious attempts to patch the system they are exploiting
  2. They lack the ability to escalate their capabilities once they replicate themselves (a ransomware virus can't also take control of your car)

I don't think this paper shows these AI models making a significant advance on these two things. I.e. if you found this model self-replicating you could still shut it down easily, and this experiment doesn't in itself show the ability of the models to self-improve.

Disclaimer: Just skimmed the paper.

Their definition of "self-replication" seems to be just to copy the model weights to another place and start it on an open port - no unexpected capability.

To see the performance of AI agents on more complex AI R&D tasks, I recommend METR's recent publication: https://metr.org/AI_R_D_Evaluation_Report.pdf 

This seems bad, but I'm not technical and therefore feel the need for other people to validate or invalidate this feeling of badness. 

But maybe it is wrong that I feel the need for this validation, and that the ignoring of the obvious warning signs in lieu of The Adult In The Room telling me everything is ok, at scale, is the thing that kills us.

As a technical person: AI is scary but this paper in particular is a nothing-burger. See my other comments.

Thanks for sharing, Greg. For readers' reference, I am open to more bets like this one against high global catastrophic risk.

Shutdown Avoidance ("do self-replication before being killed"), combined with the recent Apollo o1 research on propensity to attempt self-exfiltration, pretty much closes the loop on misaligned AIs escaping when given sufficient scaffolding to do so.

Surely it's just a matter of time -- now that the method has been published -- before AI models are spreading like viruses?

No, there is no interesting new method here, it's using LLM scaffolding to copy some files and run a script. It can only duplicate itself within the machine it has been given access to.

In order for AI to spread like a virus it would have to have some way to access new sources of compute, for which it would need be able to get money or the ability to hack into other servers. Neither of which current LLMs appear to be capable of.

"Neither of which current LLMs appear to be capable of."

If o1 pro isn't able to both hack and get money yet, it's shockingly close. (Instruction tuning for safety makes accessing that capability very difficult.)

Yep, maybe. I'm responding specifically to the vibe that this particular pre-print should make us more scared about AI.

Agreed, this shouldn't be an update for anyone paying attention. Of course, lots of people skeptical of AI risks aren't paying attention, so that the actual level of capabilities is still being dismissed as impossible Sci-Fi; it's probably good for them to notice.

AI's are already getting money with crypto memecoins. Wondering if there might be some kind of unholy mix of AI generated memecoins, crypto ransomware and self-replicating AI viruses unleashed in the near future.

One can hope that the damage is limited and that it serves as an appropriate wake-up call to governments. I guess we'll see..

Curated and popular this week
Sam Anschell
 ·  · 6m read
 · 
*Disclaimer* I am writing this post in a personal capacity; the opinions I express are my own and do not represent my employer. I think that more people and orgs (especially nonprofits) should consider negotiating the cost of sizable expenses. In my experience, there is usually nothing to lose by respectfully asking to pay less, and doing so can sometimes save thousands or tens of thousands of dollars per hour. This is because negotiating doesn’t take very much time[1], savings can persist across multiple years, and counterparties can be surprisingly generous with discounts. Here are a few examples of expenses that may be negotiable: For organizations * Software or news subscriptions * Of 35 corporate software and news providers I’ve negotiated with, 30 have been willing to provide discounts. These discounts range from 10% to 80%, with an average of around 40%. * Leases * A friend was able to negotiate a 22% reduction in the price per square foot on a corporate lease and secured a couple months of free rent. This led to >$480,000 in savings for their nonprofit. Other negotiable parameters include: * Square footage counted towards rent costs * Lease length * A tenant improvement allowance * Certain physical goods (e.g., smart TVs) * Buying in bulk can be a great lever for negotiating smaller items like covid tests, and can reduce costs by 50% or more. * Event/retreat venues (both venue price and smaller items like food and AV) * Hotel blocks * A quick email with the rates of comparable but more affordable hotel blocks can often save ~10%. * Professional service contracts with large for-profit firms (e.g., IT contracts, office internet coverage) * Insurance premiums (though I am less confident that this is negotiable) For many products and services, a nonprofit can qualify for a discount simply by providing their IRS determination letter or getting verified on platforms like TechSoup. In my experience, most vendors and companies
jackva
 ·  · 3m read
 · 
 [Edits on March 10th for clarity, two sub-sections added] Watching what is happening in the world -- with lots of renegotiation of institutional norms within Western democracies and a parallel fracturing of the post-WW2 institutional order -- I do think we, as a community, should more seriously question our priors on the relative value of surgical/targeted and broad system-level interventions. Speaking somewhat roughly, with EA as a movement coming of age in an era where democratic institutions and the rule-based international order were not fundamentally questioned, it seems easy to underestimate how much the world is currently changing and how much riskier a world of stronger institutional and democratic backsliding and weakened international norms might be. Of course, working on these issues might be intractable and possibly there's nothing highly effective for EAs to do on the margin given much attention to these issues from society at large. So, I am not here to confidently state we should be working on these issues more. But I do think in a situation of more downside risk with regards to broad system-level changes and significantly more fluidity, it seems at least worth rigorously asking whether we should shift more attention to work that is less surgical (working on specific risks) and more systemic (working on institutional quality, indirect risk factors, etc.). While there have been many posts along those lines over the past months and there are of course some EA organizations working on these issues, it stil appears like a niche focus in the community and none of the major EA and EA-adjacent orgs (including the one I work for, though I am writing this in a personal capacity) seem to have taken it up as a serious focus and I worry it might be due to baked-in assumptions about the relative value of such work that are outdated in a time where the importance of systemic work has changed in the face of greater threat and fluidity. When the world seems to
 ·  · 4m read
 · 
Forethought[1] is a new AI macrostrategy research group cofounded by Max Dalton, Will MacAskill, Tom Davidson, and Amrit Sidhu-Brar. We are trying to figure out how to navigate the (potentially rapid) transition to a world with superintelligent AI systems. We aim to tackle the most important questions we can find, unrestricted by the current Overton window. More details on our website. Why we exist We think that AGI might come soon (say, modal timelines to mostly-automated AI R&D in the next 2-8 years), and might significantly accelerate technological progress, leading to many different challenges. We don’t yet have a good understanding of what this change might look like or how to navigate it. Society is not prepared. Moreover, we want the world to not just avoid catastrophe: we want to reach a really great future. We think about what this might be like (incorporating moral uncertainty), and what we can do, now, to build towards a good future. Like all projects, this started out with a plethora of Google docs. We ran a series of seminars to explore the ideas further, and that cascaded into an organization. This area of work feels to us like the early days of EA: we’re exploring unusual, neglected ideas, and finding research progress surprisingly tractable. And while we start out with (literally) galaxy-brained schemes, they often ground out into fairly specific and concrete ideas about what should happen next. Of course, we’re bringing principles like scope sensitivity, impartiality, etc to our thinking, and we think that these issues urgently need more morally dedicated and thoughtful people working on them. Research Research agendas We are currently pursuing the following perspectives: * Preparing for the intelligence explosion: If AI drives explosive growth there will be an enormous number of challenges we have to face. In addition to misalignment risk and biorisk, this potentially includes: how to govern the development of new weapons of mass destr