
Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

Subscribe here to receive future versions.


Both China and the US take action to regulate AI

Last week, regulators in both China and the US took aim at generative AI services. These actions show that China and the US are both concerned with AI safety. Hopefully, this is a sign they can eventually coordinate.

China’s new generative AI rules

On Thursday, China’s government released new rules governing generative AI. The rules, which are set to take effect on August 15th, regulate publicly available generative AI services. Providers of such services will be criminally liable for the content their services generate.

The rules specify illegal content as:

“inciting subversion of national sovereignty or the overturn of the socialist system, endangering national security and interests or harming the nation’s image, inciting separatism or undermining national unity and social stability, advocating terrorism or extremism, promoting ethnic hatred and ethnic discrimination, violence and obscenity, as well as fake and harmful information [...]”

Providers are also expected to respect intellectual property rights, protect personal information, and avoid using discriminatory training data.

The rules are some of the strictest regulations of generative AI to date. In order to publicly release generative AI services, Chinese companies will need to be able to tightly control what their models generate. According to Helen Toner, the Director of Georgetown’s Center for Security and Emerging Technology (CSET), the rules represent “a pretty significant set of responsibilities, and will make it hard for smaller companies without an existing compliance and censorship apparatus to offer services.”

However, even large companies will face regulatory challenges. Baidu is still waiting on government approval to release its chatbot, Ernie. Baidu isn’t alone — currently no LLM is publicly available in China.

China is balancing regulation with promoting technological innovation. Critically, the rules apply only to publicly available generative AI. They do not regulate research done by AI labs. This distinction represents a change from an earlier draft of the rules, which also regulated the “research and development” of generative AI. The earlier draft would also have required companies to verify the accuracy of the data used to train their models, which, given the huge amount of data used to train LLMs, would have been an impossible task.

In the face of a stalling recovery after the pandemic, China might have to rely on technological innovation to drive economic growth. China aims to become a world leader in AI by 2030. Even as regulators fear the dangers of AI, they want to keep Chinese AI development competitive with the US. The final draft of the rules on generative AI represents China’s attempt to balance those interests.

The FTC investigates OpenAI

The FTC is investigating OpenAI over possible breaches of consumer protection law. On Thursday — the same day as China released its new rules — the Washington Post reported that the FTC had opened an investigation into OpenAI. The FTC is investigating whether OpenAI has engaged in unfair or deceptive practices that have caused "reputational harm" to consumers. It is also investigating a March security incident disclosed by OpenAI, where a system bug allowed some users to access payment-related information and other users' chat history.

The FTC sent a civil investigative demand to OpenAI, which requested documents pertaining to possible breaches of consumer protection law. If the FTC finds that OpenAI did breach consumer protection laws, it can compel OpenAI to reach an agreement known as a consent decree, in which OpenAI would change its practices with FTC oversight. If OpenAI does not comply with the agreement, the FTC could levy fines.

The FTC’s investigation represents the most serious challenge to generative AI services in the US to date. The US has lagged behind the rest of the world (notably the EU, and now China) in developing AI regulations. In the meantime, federal agencies have had to rely on existing legislation to regulate AI. Despite facing questions over its authority, the FTC has remained committed to regulating tech companies. It has warned that there is no "AI exemption" to consumer protection laws. 

The US and China are independently concerned with AI safety, and are both moving quickly to regulate private actors. Hopefully, they can eventually coordinate to avoid an AI arms race.

Results from a tournament forecasting AI risk

A new report presents the views of both AI experts and ‘superforecasters’ on AI risk. In this story, we summarize the key results and discuss their implications. 

The report pits AI experts against ‘superforecasters’. The Forecasting Research Institute (FRI) has released results from its 2022 Existential Risk Persuasion Tournament. In this tournament, both domain experts and ‘superforecasters’ predicted and debated the likelihoods of various catastrophic and existential risks.

‘Superforecasting’ is a term coined by Philip Tetlock, who is one of the authors of FRI’s report. Tetlock’s prior research has found that certain individuals, dubbed 'superforecasters,' consistently outperform experts in forecasting many near-term geopolitical events.

Superforecasters and AI experts disagreed about AI risk. One key result from the report is that superforecasters predicted lower probabilities of catastrophic and existential events than domain experts did, particularly for risks from AI. The median superforecaster gave a 2.31% risk of catastrophe and a 0.38% risk of extinction due to AI by 2100. In contrast, the median AI domain expert gave a 12% risk of catastrophe and a 3.9% risk of extinction.

In the tournament, neither superforecasters nor AI experts were able to persuade each other to change their probabilities. However, it might be argued that we should give more weight to the views of superforecasters. After all, they have made more accurate predictions than experts in Tetlock’s previous research. 

AI risk might not be “superforecastable.” Some of the methods that superforecasters use aren’t applicable to AI risk. One such method is the “outside view.” Rather than rely on their internal assessments, superforecasters identify a reference class of similar past events. For example, an expert might predict the outcome of a basketball game based on their in-depth knowledge of the players and coaches. In contrast, a superforecaster might predict the outcome by looking at the results of games between teams with similar records. 
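As a rough illustration of the method (with invented numbers, not real data), the outside view amounts to computing a base rate over a reference class of similar past events:

```python
# A minimal sketch of reference-class ("outside view") forecasting.
# The game records below are invented purely for illustration.

# Outcomes of past games between teams with similar win-loss records:
# 1 = the favored team won, 0 = it lost.
reference_class = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]

# Outside-view estimate: the base rate of wins within the reference class.
base_rate = sum(reference_class) / len(reference_class)
print(f"Outside-view win probability: {base_rate:.0%}")  # -> 70%
```

For an unprecedented risk like an AI catastrophe, the list of comparable past events is empty, so there is no base rate to compute, which is exactly the difficulty described below.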

The “outside view” approach is difficult or impossible to apply to AI risk because there is no clear reference class. Perhaps the outside view is to continue extrapolating economic trends. Another outside view is to zoom out and recognize that we are in a biologically significant period: prior transitions include the emergence of single-celled life, then multicellular life, and now we are approaching the transition from biological to digital life. There is no backlog of AI catastrophes to learn from. In statistics, an AI catastrophe would be a “tail event,” a rare but extremely impactful event. Nassim Nicholas Taleb has described many tail events as “black swans.” Black swans are nearly impossible to predict with confidence, but because of their impact, we must prepare for them nonetheless.

In sum, we do not find this report reassuring. Despite their disagreement about probabilities, AI experts and superforecasters agreed that AI posed the greatest threat of human extinction of any of the risks considered.


Updates on xAI’s plan

On Friday, xAI held a Twitter Space in which it gave details about its plan.

xAI’s plan is to build “maximally curious” AGI. During the call, Musk argued that the safest way to build an AGI is to make it maximally curious, rather than to align it directly to human values. He believes that a curious AGI would leave humanity alone because humanity is interesting. At the same time, a curious AGI would help satisfy xAI’s goal of “understanding the universe.”

However, it is not at all clear that a superintelligent AGI curious about humans would help them flourish. By analogy, “many scientists are curious about fruit flies, but this rarely ends well for the fruit flies.”

Meta releases Llama 2, open-source and commercially available

Meta has released Llama 2, its newest language model, for research and commercial use.

Llama 2 is now the most capable open-source language model. It was trained on 40% more data than Llama 1, and its benchmark results put it ahead of every other open-source language model.

Meta affirms its commitment to open-source models. Together with the model, Meta released a statement and a list of signatories in support of its approach to AI safety:

“We support an open innovation approach to AI. Responsible and open innovation gives us all a stake in the AI development process, bringing visibility, scrutiny and trust to these technologies. Opening today’s Llama models will let everyone benefit from this technology.”

In addition to being open-source, Llama 2 is available for commercial use. Businesses will be able to fine-tune and deploy the model for their own purposes. 
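As a rough sketch of what that access looks like in practice (the model id and hosting on Hugging Face are assumptions, and downloading the weights requires accepting Meta’s license terms), running Llama 2 for inference might look like this:

```python
# A minimal, hypothetical sketch of running Llama 2 for inference with the
# Hugging Face transformers library. The model id and access flow are
# assumptions; the weights are gated behind Meta's license agreement.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # assumed 7B chat variant

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Write a one-sentence product description for a reusable water bottle."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Fine-tuning on a company’s own data follows the same pattern, with an additional training step before deployment.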

Open-source models increase the probability of misuse. Llama 2 will likely prove a valuable tool for AI safety researchers. However, it is just as likely to be used maliciously. After the model parameters for Llama 1 were leaked online, Senators Hawley and Blumenthal released a letter criticizing Meta for the leak. In the letter, they write:

“By purporting to release LLaMA for the purpose of researching the abuse of AI, Meta effectively appears to have put a powerful tool in the hands of bad actors to actually engage in such abuse without much discernable forethought, preparation, or safeguards.”

Language model misuse isn’t hypothetical. Cybercriminals are already using generative AI to launch email phishing attacks. By open-sourcing Llama 2, Meta has either ignored this potential for misuse or wagered that allowing misuse in the short term will contribute to AI safety in the long term.

Links:

See also: CAIS website, CAIS Twitter, a technical safety research newsletter, and An Overview of Catastrophic AI Risks.

Subscribe here to receive future versions.
