
Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

Subscribe here to receive future versions.

Listen to the AI Safety Newsletter for free on Spotify.

This week’s key stories include: 

  • The UK, US, and Singapore have announced national AI safety institutions.
  • The UK AI Safety Summit concluded with a consensus statement, the creation of an expert panel to study AI risks, and a commitment to meet again in six months.
  • xAI, OpenAI, and a new Chinese startup released new models this week. 

UK, US, and Singapore Establish National AI Safety Institutions

Before regulating a new technology, governments often need time to gather information and consider their policy options. But during that time, the technology may diffuse through society, making it more difficult for governments to intervene. This process, termed the Collingridge Dilemma, is a fundamental challenge in technology policy.

But recently, several governments concerned about AI have enacted straightforward plans to meet this challenge. In the hopes of quickly gathering new information about AI risks, the United Kingdom, United States, and Singapore have all established new national bodies to empirically evaluate threats from AI systems and promote research and regulations on AI safety. 

The UK’s Foundation Model Taskforce becomes the UK AI Safety Institute. The UK’s AI safety organization has been through a bevy of names in its short life, from the Foundation Model Taskforce to the Frontier AI Taskforce and now the AI Safety Institute. But its purpose has always been the same: to evaluate, discuss, and mitigate AI risks. 

The UK AI Safety Institute is not a regulator and will not make government policy. Instead, it will focus on evaluating four key kinds of risks from AI systems: misuse, societal impacts, systems safety and security, and loss of control. Sharing information about AI safety will also be a priority, as done in their recent paper on risk management for frontier AI labs.

The US creates an AI Safety Institute within NIST. Following the recent executive order on AI, the White House has announced a new AI Safety Institute. It will be housed under the Department of Commerce in the National Institute of Standards and Technology (NIST).

The Institute aims to “facilitate the development of standards for safety, security, and testing of AI models, develop standards for authenticating AI-generated content, and provide testing environments for researchers to evaluate emerging AI risks and address known impacts.”

Funding has not been appropriated for this institute, so many have called for Congress to raise NIST’s budget. Currently, the agency has only about 20 employees working on emerging technologies and responsible AI.

Applications to join the new NIST Consortium to inform the AI Safety Institute are now being accepted. Organizations may apply here.

Singapore’s Generative AI Evaluation Sandbox. Mitigating AI risks will require the collaborative efforts of many different nations. So it’s encouraging to see Singapore, an Asian nation with a strong relationship with China, establish its own body for AI evaluations.

Singapore’s IMDA has previously worked with Western nations on AI governance, such as by providing a crosswalk between its domestic AI testing framework and the American NIST AI RMF.

Singapore’s new Generative AI Evaluation Sandbox will bring together industry, academic, and non-profit actors to evaluate AI capabilities and risks. Their recent paper explicitly highlights the need for evaluations of extreme AI risks including weapons acquisition, cyber attacks, autonomous replication, and deception. 

UK Summit Ends with Consensus Statement and Future Commitments

The UK’s AI Summit wrapped up on Thursday with several key announcements. 

International expert panel on AI. Just as the UN IPCC summarizes scientific research on climate change to help guide policymakers, the UK has announced an international expert panel on AI to help establish consensus and guide policy on AI. Its work will be published in a “State of the Science” report before the next summit, which will be held in South Korea in six months.

Separately, eight leading AI labs agreed to give several governments early access to their models. OpenAI, Anthropic, Google DeepMind, and Meta are among the companies agreeing to share models for private testing ahead of public release.

US Secretary of Commerce Gina Raimondo and Chinese Vice Minister of Science and Technology Wu Zhaohui spoke at the UK AI Safety Summit.

The Bletchley Declaration. Twenty-eight governments, including China, signed the Bletchley Declaration, a document recognizing both short- and long-term risks of AI, as well as a need for international cooperation. It notes, “We are especially concerned by such risks in domains such as cybersecurity and biotechnology, as well as where frontier AI systems may amplify risks such as disinformation. There is potential for serious, even catastrophic, harm, either deliberate or unintentional, stemming from the most significant capabilities of these AI models.”

The declaration establishes an agenda for addressing risk but doesn’t set concrete policy goals. Further work is necessary to ensure continued collaboration, both among governments and between governments and AI labs.

New Models From xAI, OpenAI, and a New Chinese Startup

Elon Musk’s xAI released its first language model, Grok. Elon Musk launched xAI in July. Given his potential access to compute, we speculated that xAI might be able to compete with leading AI labs like OpenAI and DeepMind. Four months later, Grok-1 represents the company’s first attempt to do so.

Grok-1 outperforms GPT-3.5 on several standard capabilities benchmarks. While it can’t match leading labs’ latest models, such as GPT-4, PaLM-2, or Claude-2, Grok-1 was also trained with significantly less data and compute. Grok-1’s efficiency and rapid development indicate that xAI’s bid to become a leading AI lab might soon be successful.

In the announcement, xAI committed to “work towards developing reliable safeguards against catastrophic forms of malicious use.” xAI has not released information about the model’s potential for misuse or hazardous capabilities.

Note: CAIS Director Dan Hendrycks is an advisor to xAI. 

 

OpenAI announces a flurry of new products. Nearly a year after the release of ChatGPT, OpenAI hosted its first in-person DevDay event to announce new products. None of this year’s products are as significant as GPT-3.5 or GPT-4, but there are still a few notable updates.

Agentic AI systems, which take actions to accomplish goals, have been a focus for OpenAI this year. In March, the release of plugins allowed ChatGPT to use external tools such as search engines, calculators, and coding environments. Now, OpenAI has released the Assistants API, which makes it easier for people to build AI agents that pursue goals by using plugin tools. The consumer version of this product is called GPTs and will allow anyone to create a chatbot with custom instructions and access to plugins.
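
For readers who want a concrete picture, here is a minimal sketch of building a tool-using agent with the Assistants API, using OpenAI’s Python SDK as documented at DevDay. The assistant’s name, instructions, and example question are illustrative placeholders, and the interface launched in beta, so details may have changed.

```python
# Minimal sketch of the Assistants API (beta at DevDay, Nov 2023).
# The assistant's name, instructions, and question are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Define an agent with access to a built-in tool (the code interpreter).
assistant = client.beta.assistants.create(
    name="Research Helper",
    instructions="Answer questions, using the code interpreter for any calculations.",
    tools=[{"type": "code_interpreter"}],
    model="gpt-4-1106-preview",
)

# Conversations live in threads; a "run" executes the assistant against a thread,
# calling tools as needed until it produces a reply.
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What is 17.3% annual growth compounded over 6 years?",
)
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)
print(run.id, run.status)
```

GPTs expose roughly the same building blocks, custom instructions plus tools, through a no-code interface in ChatGPT.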

Some users will also be allowed to fine-tune GPT-4. This decision was made despite research showing that GPT-3.5’s safety guardrails can be removed via fine-tuning. OpenAI has not released details about their plan to mitigate this risk, but it’s possible that the closed-source nature of their model will allow them to monitor customer accounts for suspicious behavior and block attempts at malicious use.
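
OpenAI has not said how it will police fine-tuning, so the following is only a plausible sketch of one ingredient: screening uploaded training data with the public Moderation endpoint before a job is accepted. The screening policy shown is an assumption for illustration, not OpenAI’s actual process.

```python
# Hypothetical sketch: screen fine-tuning data with the Moderation API before
# submitting a job. This is NOT OpenAI's actual safeguard, just an illustration.
import json
from openai import OpenAI

client = OpenAI()

def flagged_examples(path: str) -> list[int]:
    """Return indices of JSONL chat examples that the moderation model flags."""
    flagged = []
    with open(path) as f:
        for i, line in enumerate(f):
            record = json.loads(line)
            text = " ".join(m["content"] for m in record["messages"])
            if client.moderations.create(input=text).results[0].flagged:
                flagged.append(i)
    return flagged

bad = flagged_examples("train.jsonl")
if bad:
    print(f"Refusing to submit: {len(bad)} flagged examples, e.g. lines {bad[:5]}")
else:
    upload = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
    job = client.fine_tuning.jobs.create(training_file=upload.id, model="gpt-3.5-turbo")
    print("Submitted fine-tuning job:", job.id)
```

The point of the sketch is only that data screening is cheap to implement; whether it would catch the kinds of fine-tuning attacks described in the research above is an open question.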

Enterprise customers will also have the opportunity to work with OpenAI to train domain-specific versions of GPT-4, with prices starting at several million dollars. Additional products include GPT-4 Turbo, which is cheaper and faster than the original model and has a longer context window, as well as new APIs for GPT-4V, text-to-speech models, and DALL·E 3.
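
For reference, here is a short sketch of calling the newly announced endpoints through the Python SDK; the model identifiers are the ones announced at DevDay and may have been renamed since.

```python
# Quick sketch of the DevDay endpoints; model names are those announced at launch.
from openai import OpenAI

client = OpenAI()

# GPT-4 Turbo: cheaper and faster than GPT-4, with a 128k-token context window.
chat = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[{"role": "user", "content": "Summarize the Bletchley Declaration in one sentence."}],
)
print(chat.choices[0].message.content)

# DALL-E 3 image generation.
image = client.images.generate(model="dall-e-3", prompt="a lighthouse at dawn", size="1024x1024")
print(image.data[0].url)

# Text-to-speech.
speech = client.audio.speech.create(model="tts-1", voice="alloy", input="Hello from DevDay.")
speech.stream_to_file("hello.mp3")
```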

Additionally, OpenAI has promised to cover its customers’ legal fees if they are sued for using a product that was trained on copyrighted data.

New Chinese startup releases an open-source LLM. Kai-Fu Lee, previously the president of Google China, has founded a new AI startup called 01.AI. Seven months after its founding, the company has open-sourced its first two models, Yi-6B and its larger companion Yi-34B.

Yi-34B outperforms all other open-source models on a popular set of benchmarks hosted by Hugging Face. It’s possible that these scores are artificially inflated, given that the benchmarks are public and the model could have been trained on their specific questions and answers. Some have pointed out that the model does not perform as well on other straightforward tests.
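
To illustrate why public benchmarks are easy to game, here is a toy contamination check of my own (not anything 01.AI or Hugging Face has published): it flags benchmark questions that share a long n-gram with the training corpus, a common first-pass heuristic for detecting memorization.

```python
# Toy contamination check: flag benchmark questions whose long n-grams also
# appear in the training corpus. A crude heuristic, for illustration only.

def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def contaminated(questions: list[str], training_docs: list[str], n: int = 8) -> list[int]:
    """Return indices of benchmark questions sharing an n-gram with the training data."""
    train_grams: set[tuple[str, ...]] = set()
    for doc in training_docs:
        train_grams |= ngrams(doc, n)
    return [i for i, q in enumerate(questions) if ngrams(q, n) & train_grams]

# Made-up example: the question text appears verbatim in a training document.
questions = ["Which planet in the solar system has the most confirmed moons"]
docs = ["Quiz answer key: which planet in the solar system has the most confirmed moons -- Saturn"]
print(contaminated(questions, docs))  # -> [0]
```

Real contamination analyses are more involved, but the basic idea of checking overlap between training data and test questions is the same.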

See also: CAIS website, CAIS Twitter, A technical safety research newsletter, An Overview of Catastrophic AI Risks, and our feedback form

Listen to the AI Safety Newsletter for free on Spotify.

Subscribe here to receive future versions.
