
Cross-posted on LessWrong. This article is part of a series of ~10 posts comprising a 2024 State of the AI Regulatory Landscape Review, conducted by the Governance Recommendations Research Program at Convergence Analysis. Each post will cover a specific domain of AI governance, such as incident reporting, safety evals, model registries, and more. We’ll provide an overview of existing regulations, focusing on the US, EU, and China as the leading governmental bodies currently developing AI legislation. Additionally, we’ll discuss the relevant context behind each domain and conduct a short analysis.

This series is intended to be a primer for policymakers, researchers, and individuals seeking to develop a high-level overview of the current AI governance space. We’ll publish individual posts on our website and release a comprehensive report at the end of this series.

What are open-source models, and what are their effects on AI safety?

Some software developers choose to open-source their software; they freely share the underlying source code and allow anyone to use, modify, and deploy their work. This can encourage friendly collaboration and community-building, and has produced many popular pieces of software, including operating systems like Linux, programming languages and platforms like Python and Git, and many more.

Similarly, AI developers are open-sourcing their models and algorithms, though the details can vary. Generally, open-sourcing of AI models involves some combination of:

  • Sharing the model weights. These are the specific parameters that make the model function, and are set during training. If these are shared, others can reconstruct the model without doing their own training, which is the most expensive part of developing such AI.
  • Sharing the training data used to train the model. 
  • Sharing the underlying source code. 
  • Licensing for free commercial usage.

For example, Meta released the model weights of their LLM, Llama 2, but not their training code, methodology, original datasets, or model architecture details. In their excellent article on Openness In Language Models, Prompt Engineering labels this an example of an “open weight” model. Such an approach allows external parties to use the model for inference and fine-tuning, but doesn’t allow them to meaningfully improve or analyze the underlying model. Prompt Engineering points out a drawback of this approach:

So, open weights allows model use but not full transparency, while open source enables model understanding and customization but requires substantially more work to release [...] If only open weights are available, developers may utilize state-of-the-art models but lack the ability to meaningfully evaluate biases, limitations, and societal impacts. Misalignment between a model and real-world needs can be difficult to identify.

Further, while we were writing this article in April 2024, Meta released Llama 3 with the same open-weights policy, claiming that it is “the most capable openly available LLM to date”. This has brought fresh attention to the trade-offs of open-sourcing, as the potential harms of freely sharing a model grow with the model's capabilities. Even those who are fond of sharing wouldn’t want everyone in the world to have easy access to the instructions for a 3D-printable rocket launcher, and freely sharing powerful AI could present similar risks; such AI could be used to generate instructions for assembling homemade bombs or even designing deadly pathogens. Widely distributed information of this nature is termed an information hazard.

To prevent these types of hazards, AI models like ChatGPT have safeguards built in during the fine-tuning phase towards the end of their development (using techniques such as Reinforcement Learning from Human Feedback, or RLHF). This fine-tuning can restrict AI models from producing harmful or undesired content.

Some people find ways to get around this fine-tuning, but experts have pointed out that malicious actors could circumvent the problem entirely. ChatGPT and Claude, the two most prominent LLMs, are closed-source (and their model weights are closely guarded secrets), but open-source models can be used and deployed without fine-tuning safeguards. This was demonstrated practically with Llama 2, a partly open-source LLM developed by Meta, in Palisade Research’s paper BadLlama: cheaply removing safety fine-tuning from Llama 2-Chat 13B. To quote an interview with one of its authors, Jeffrey Ladish:

You can train away the harmlessness. You don’t even need that many examples. You can use a few hundred, and you get a model that continues to maintain its helpfulness capabilities but is willing to do harmful things. It cost us around $200 to train even the biggest model for this. Which is to say, with currently known techniques, if you release the model weights there is no way to keep people from accessing the full dangerous capabilities of your model with a little fine tuning.

Therefore, these models and their underlying software may themselves be information hazards, and many argue that open-sourcing advanced AI should be legally prohibited, or at least prohibited until developers can guarantee the safety of their software. In “Will releasing the weights of future large language models grant widespread access to pandemic agents?”, the authors conclude:

Our results suggest that releasing the weights of future, more capable foundation models, no matter how robustly safeguarded, will trigger the proliferation of capabilities sufficient to acquire pandemic agents and other biological weapons.

Others counter that openness is necessary to stop the power and wealth generated by powerful AI falling into the hands of a few, and that prohibitions won’t be effective safeguards, as argued in GitHub’s Supporting Open Source and Open Science in the EU AI Act and Mozilla’s Joint Statement on AI Safety and Openness, which was signed by over 1,800 people and states: 

Yes, openly available models come with risks and vulnerabilities — AI models can be abused by malicious actors or deployed by ill-equipped developers. However, we have seen time and time again that the same holds true for proprietary technologies — and that increasing public access and scrutiny makes technology safer, not more dangerous. The idea that tight and proprietary control of foundational AI models is the only path to protecting us from society-scale harm is naive at best, dangerous at worst.

Finally, some argue that open-sourcing or not is a false dichotomy, putting forward intermediate policies such as structured access:

Instead of openly disseminating AI systems, developers facilitate controlled, arm's length interactions with their AI systems. The aim is to prevent dangerous AI capabilities from being widely accessible, whilst preserving access to AI capabilities that can be used safely.

There are more perspectives and arguments than we can concisely include here, and you might be interested in the following discussions:

Current Regulatory Policies

The US

The US AI Bill of Rights doesn’t discuss open-source models, but the Executive Order on AI does initiate an investigation into the risk-reward tradeoff of open-sourcing. Section 4.6 calls for soliciting input on foundation models with “widely available model weights”, specifically targeting open-source models. It summarizes the risk-reward tradeoff of publicly sharing model weights, which offers “substantial benefits to innovation, but also substantial security risks, such as the removal of safeguards within the model”. In particular, Section 4.6 calls for the Secretary of Commerce to:

  • Section 4.6(a): Set up a public consultation with the private sector, academia, civil society, and other stakeholders on the impacts and appropriate policy related to dual-use foundation models with widely available weights (“such models” below), including:
    • 4.6(a)(i): Risks associated with fine-tuning or removing the safeguards from such models; 
    • 4.6(a)(ii): Benefits to innovation, including research into AI safety and risk management, of such models;
    • 4.6(a)(iii): Potential voluntary, regulatory, and international mechanisms to manage risk and maximize the benefits of such models;
  • 4.6(b): Submit a report to the President based on the results of 4.6(a), on the impacts of such models, including policy and regulatory recommendations.

The EU

The EU AI Act states that open-sourcing can increase innovation and economic growth. The act therefore exempts open-source models and developers from some restrictions and responsibilities placed on other models and developers. Note, though, that these exemptions do not apply to foundation models (meaning generative AI like ChatGPT), nor do they apply if the open-source software is monetized or is a component in high-risk software.

  • Section 57: Places responsibilities on providers throughout the “AI value chain”, i.e. anyone developing components or software that’s used in AI. Third parties should be exempt if their products are open-source, though it encourages open-source developers to implement documentation practices, such as model cards and data sheets.
  • Section 60i & i+1: Clarifies that GPAI models released under free and open-source licenses count as satisfying “high levels of transparency and openness” if their parameters are made publicly available, and a license should be considered free and open-source when users can run, copy, distribute, study, change, and improve the software and data. This exception does not apply if the component is monetized in any way. 
  • Section 60f: Exempts providers of open-source GPAI models from the transparency requirements unless they present a systemic risk. This does not exempt GPAI developers from the obligation to produce a summary about training data or to enact a copyright policy. 
  • Section 60o: Specifies that developers of GPAI models should notify the AI Office if they’re developing a GPAI model that exceeds certain thresholds (therefore conferring systemic risk), and that this is especially important for open-source models.
  • Article 2(5g): States that obligations shall not apply to AI systems released under free and open-source licenses unless they are placed on the market or put into service as high-risk AI systems. 
  • Article 28(2b): States that providers of high-risk AI systems and third parties providing components for such systems must have a written agreement on what information the provider will need to comply with the act. However, third parties publishing “AI components other than GPAI models under a free and open licence” are exempt from this.
  • Article 52c(-2) & 52ca(5): Exempt providers of AI models under a free and open licence that publicly release the weights and information on their model from (1) the obligation to draw up technical documentation and (2) the requirement to appoint an authorized representative in the EU. Neither of these exemptions applies if the GPAI model has systemic risks.

Notably, the treatment of open-source models was contentious during the development of the EU AI Act.

China

There is no mention of open-source models in China’s regulations between 2019 and 2023; open-source models are neither exempt from any aspects of the legislation, nor under any additional restrictions or responsibilities. 

Convergence’s Analysis

The boundaries and terminology around open-sourcing are often underspecified. 

  • Open-sourcing vs closed-sourcing AI models is not binary, but a spectrum. Developers must choose whether to publicly release multiple aspects of each model: the weights and parameters of the model; the data used to train the model; the source code and algorithms underlying the model and its training; licenses for free use; and so on. 
  • Existing legislation does not clearly delineate how partially open-sourced models should be categorized and legislated. It’s unclear, for example, whether Meta’s open-weight Llama 2 model would be considered open-source under EU legislation, as its source code is not public.

Open-sourcing models improves transparency and accountability, but also gives the public broader access to dangerous information and reduces the efficacy of legislation. There is significant disagreement on the right balance.

  • Through their training on vast swathes of data, LLMs contain hazardous information. Although RLHF is not sufficient to stop users accessing underlying hazardous information, it is a barrier, and one that can be much more easily bypassed in open-sourced models. 
  • The more powerful a model is, the greater the harm its misuse could cause, and the more open-source a model is, the more easily it can be misused. This means the potential harms of open-source models will increase as models become more capable.
  • Open-source models can be easily used and altered by potentially any motivated party, making it harder to implement and enforce safety legislation.
  • However, many experts are still staunch advocates for open-sourcing (as listed in the Context section), and believe it is essential for an accountable and transparent AI ecosystem. There is profound disagreement on the right balance between open and closed-source models, and such disagreement is likely to persist. 

Developers of open-source models are not currently under any additional legal obligations compared to developers of private or commercial models.

  • In particular, the US Executive Order and Chinese regulations currently have no particular rules unique to open-source models or developers, though the US does recognize the risk-reward tradeoff presented by open-source AI, and has commissioned a report into its safety and appropriate policy. 

The EU legislation treats open-source models favorably. 

  • Unlike the US Executive Order, the EU AI Act only describes the potential benefits of open-sourcing powerful models, without mentioning potential risks. 
  • The EU AI Act exempts open-source developers from many obligations faced by commercial competitors, unless the open-sourced software is part of a general-purpose or high-risk system.
  • Despite these exemptions, proponents of open-sourcing have criticized the EU regulations for what they perceive as over-regulation of open-source models.

Comments (1)



Executive summary: Open-sourcing AI models can foster collaboration and innovation, but also pose serious risks including the potential for misuse and the distribution of harmful information, leading to debates over whether open-sourcing advanced AI should be legally prohibited or regulated.

Key points:

  1. Open-sourcing AI models involves sharing model weights, training data, the underlying source code, and licensing for free commercial usage, which can lead to increased collaboration and innovation.
  2. However, open-sourcing also presents risks, as it may allow for misuse of the model, especially with powerful models that could be used for harmful purposes.
  3. AI models often have safeguards built in to prevent the production of harmful content but these can be circumvented, particularly in the case of open-source models.
  4. While some argue that open-sourcing advanced AI should be prohibited until safety can be guaranteed, others believe that openness is necessary to prevent the concentration of power and wealth, and that prohibitions won't effectively safeguard against misuse.
  5. There is also an argument for intermediate policies such as structured access which allows controlled interactions with AI systems to prevent dangerous capabilities from being widely accessible.
  6. Current regulatory policies vary by region: the US AI Bill of Rights does not specifically address open-source models, the EU AI Act exempts open-source models and developers from some restrictions, and China's regulations do not mention open-source models at all.

 

 

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
