Proposing the Conditional AI Safety Treaty (linkpost TIME)

Otto

This is a linkpost for https://time.com/7171432/conditional-ai-safety-treaty-trump/

Technological progress can excite us, politics can infuriate us, and wars can mobilize us. But faced with the risk of human extinction that the rise of artificial intelligence is causing, we have remained surprisingly passive. In part, perhaps this was because there did not seem to be a solution. This is an idea I would like to challenge.

AI’s capabilities are ever-improving. Since the release of ChatGPT two years ago, hundreds of billions of dollars have poured into AI. These combined efforts will likely lead to Artificial General Intelligence (AGI), where machines have human-like cognition, perhaps within just a few years.

Hundreds of AI scientists think we might lose control over AI once it gets too capable, which could result in human extinction. So what can we do?

The existential risk of AI has often been presented as extremely complex. A 2018 paper, for example, called the development of safe human-level AI a “super wicked problem.” This perceived difficulty had much to do with the proposed solution of AI alignment, which entails making superhuman AI act according to humanity’s values. AI alignment, however, was a problematic solution from the start.

First, scientific progress in alignment has been much slower than progress in AI itself. Second, the philosophical question of which values to align a superintelligence to is incredibly fraught. Third, it is not at all obvious that alignment, even if successful, would be a solution to AI’s existential risk. Having one friendly AI does not necessarily stop other unfriendly ones.

Because of these issues, many have urged technology companies not to build any AI that humanity could lose control over. Some have gone further; activist groups such as PauseAI have indeed proposed an international treaty that would pause development globally.

That is not seen as politically palatable by many, since it may still take a long time before the missing pieces to AGI are filled in. And do we have to pause already, when this technology can also do a lot of good? Yann LeCun, AI chief at Meta and prominent existential risk skeptic, says that the existential risk debate is like “worrying about turbojet safety in 1920.”

On the other hand, technology can leapfrog. If we get another breakthrough such as the transformer, a 2017 innovation which helped launch modern Large Language Models, perhaps we could reach AGI in a few months’ training time. That’s why a regulatory framework needs to be in place before then.

Fortunately, Nobel Laureate Geoffrey Hinton, Turing Award winner Yoshua Bengio, and many others have provided a piece of the solution. In a policy paper published in Science earlier this year, they recommended “if-then commitments”: commitments to be activated if and when red-line capabilities are found in frontier AI systems.

Building upon their work, we at the nonprofit Existential Risk Observatory propose a Conditional AI Safety Treaty. Signatory countries of this treaty, which should include at least the U.S. and China, would agree that once we get too close to loss of control they will halt any potentially unsafe training within their borders. Once the most powerful nations have signed this treaty, it is in their interest to verify each other’s compliance, and to make sure uncontrollable AI is not built elsewhere, either.

One outstanding question is at what point AI capabilities are too close to loss of control. We propose to delegate this question to the AI Safety Institutes set up in the U.K., U.S., China, and other countries. They have specialized model evaluation know-how, which can be developed further to answer this crucial question. Also, these institutes are public, making them independent from the mostly private AI development labs. The question of how close is too close to losing control will remain difficult, but someone will need to answer it, and the AI Safety Institutes are best positioned to do so.

We can mostly still get the benefits of AI under the Conditional AI Safety Treaty. All current AI is far below loss of control level, and will therefore be unaffected. Narrow AIs in the future that are suitable for a single task—such as climate modeling or finding new medicines—will be unaffected as well. Even more general AIs can still be developed, if labs can demonstrate to a regulator that their model has loss of control risk less than, say, 0.002% per year (the safety threshold we accept for nuclear reactors). Other AI thinkers, such as MIT professor Max Tegmark, Conjecture CEO Connor Leahy, and ControlAI director Andrea Miotti, are thinking in similar directions.

Fortunately, the existential risks posed by AI are recognized by many close to President-elect Donald Trump. His daughter Ivanka seems to see the urgency of the problem. Elon Musk, a critical Trump backer, has been outspoken about the civilizational risks for many years, and recently supported California’s legislative push to safety-test AI. Even the right-wing Tucker Carlson provided common-sense commentary when he said: “So I don’t know why we’re sitting back and allowing this to happen, if we really believe it will extinguish the human race or enslave the human race. Like, how can that be good?” For his part, Trump has expressed concern about the risks posed by AI, too.

The Conditional AI Safety Treaty could provide a solution to AI’s existential risk, while not unnecessarily obstructing AI development right now. Getting China and other countries to accept and enforce the treaty will no doubt be a major geopolitical challenge, but perhaps a Trump government is exactly what is needed to overcome it.

A solution to one of the toughest problems we face—the existential risk of AI—does exist. It is up to us whether we make it happen, or continue to go down the path toward possible human extinction.

The title of this piece has been adapted to increase clarity for a different audience.

Comments (6)

harfe · Nov 15 2024 · 2

One outstanding question is at what point AI capabilities are too close to loss of control. We propose to delegate this question to the AI Safety Institutes set up in the U.K., U.S., China, and other countries.

I consider it clickbait if you write "There Is a Solution", but then say that there are these AI safety institutes that will figure out the crucial details of the solution some time in the future.

Toby Tremlett🔹 · Nov 15 2024 · 5

To be fair, this is a linkpost, and the norm is often to use the heading from the piece wherever else it is published. Time magazine's norms are probably a bit more pro-clickbait. I'll DM Otto though with some more Forum-friendly titles.

Otto · Nov 15 2024 · 4

Thanks for your comment.

I changed the title; the original one came from TIME. Still, we do believe there is a solution to existential risk. What we want to do is outline the contours of such a solution. A lot has to be filled in by others, including the crucial question of when to pause. We acknowledge this in the piece.

Matrice Jacobine🔸🏳️‍⚧️ · Nov 15 2024 · 1

Fortunately, the existential risks posed by AI are recognized by many close to President-elect Donald Trump. His daughter Ivanka seems to see the urgency of the problem. Elon Musk, a critical Trump backer, has been outspoken about the civilizational risks for many years, and recently supported California’s legislative push to safety-test AI. Even the right-wing Tucker Carlson provided common-sense commentary when he said: “So I don’t know why we’re sitting back and allowing this to happen, if we really believe it will extinguish the human race or enslave the human race. Like, how can that be good?” For his part, Trump has expressed concern about the risks posed by AI, too.

This is a strange contrast from the rest of the article, considering both Donald and Ivanka Trump's positions are largely informed by the "situational awareness" position arguing that the US should develop AGI before China to ensure US victory over China – which is explicitly the position Tegmark and Leahy argue against (and consider existentially harmful) when they call to stop work on AGI and work on international co-operation to restrict it and develop tool AI instead.

I still see this kind of confusion between the two positions a fair bit and it is extremely strange. It's as if, back in the original Cold War, people couldn't tell the difference between anti-communist hawks and the Bulletin of the Atomic Scientists (let alone anti-war hippies) because technically they both considered the nuclear arms race to be very important for the future of humanity.

Otto · Nov 15 2024 · 2

I'm aware and I don't disagree. However, in xrisk, many (not all) of those who are most worried are also most bullish about capabilities. Conversely, many (not all) who are not worried are unimpressed with capabilities. Being aware of the concept of AGI, that it may be coming soon, and of how impactful it could be, is in practice often a first step towards becoming concerned about the risks, too. Unfortunately, this is not true for everyone. Still, I would say that at least for our chances to get an international treaty passed, it is perhaps hopeful that the power of AGI is on the radar of leading politicians (although this may also increase risk through other paths).

Matrice Jacobine🔸🏳️‍⚧️ · Nov 16 2024 · 2

I don't think that's true at all. The effective accelerationists and the (to coin a term) AI hawks are major factions in the conflict over AI. I think you could argue they aren't bullish enough about the full extent of the capabilities of AGI (except for the minority of extinctionist Landians, this is partly true) – in which case the Trumps aren't bullish enough either. As @Garrison noted here, prominent Republicans like Ted Cruz and JD Vance himself are already explicitly hostile to AI safety.