So far I have only been posting on LessWrong, but from now on I will publish my posts both here and on LW. I have four LW posts, all connected to the same topic: risks from scenarios where someone successfully hits a bad alignment target. This post gives a short summary of that topic and describes how my four LW posts fit together.


A post describing a problem with Parliamentarian CEV (PCEV).

My first LW post showed that PCEV gives a large amount of extra influence to people who intrinsically value hurting other people. A powerful AI controlled by such people would be very dangerous. The fact that this feature of PCEV went undetected for more than a decade shows that analysing alignment targets is difficult. The fact that a successfully implemented PCEV would have led to an outcome massively worse than extinction shows that failure to properly analyse an alignment target can be very dangerous. The fact that the issue was eventually noticed shows that it is possible to reduce these dangers. In other words: Alignment Target Analysis (ATA) is a tractable way of reducing a serious risk. Yet there does not seem to be a single research project dedicated to ATA. This is why I do ATA.

There is a wide variety of reasons for thinking that doing ATA now is not needed. In other words: there is a wide variety of arguments for why it is acceptable to stay at our current level of ATA progress without making any real effort to improve things. These arguments are usually both unpublished and hand-wavy. My three other LW posts each counter one such argument.


A post discussing the idea of building a limited AI that is only used to shut down competing AI projects.

The post assumes that some limited AI will prevent all unauthorised AI projects forever. It then shows why this assumption does not actually remove the urgency of doing ATA now. Decisions regarding Sovereign AI will still be in human hands (by assumption, the limited AI can be safely launched without any further ATA progress, so no decisions regarding Sovereign AI can be deferred to it). There are many reasons why someone might decide to quickly launch a Sovereign AI. The risk from a competing AI project is only one such reason, and the post discusses others. If some alignment target has a hidden flaw, then finding that flaw would thus remain urgent, even with competing AI projects taken out of the equation.


A post explaining why the Last Judge idea does not remove the need to do ATA now.

This post points out that a Last Judge off-switch add-on can fail, which means that the idea cannot remove the need for doing ATA now. The post also outlines a specific scenario in which such an add-on fails. Finally, it points out that such an add-on can be attached to many different AI projects, aiming at many different alignment targets. This means that the Last Judge idea is not very helpful when deciding which alignment target to aim at.


A post explaining why the Corrigibility idea does not remove the need to do ATA now.

This post shows that a partially successful Corrigibility method can actually make things worse. It outlines a scenario where a Corrigibility method works for a limited AI that is used to buy time, but fails for an AI Sovereign. This can make things worse, because a bad alignment target might end up getting successfully implemented. The Corrigible limited AI makes the Sovereign AI project possible, and the Sovereign AI project moves forward because the designers think that the Corrigibility method will also work for this project. The designers know that their alignment target might be bad, but they consider this a manageable risk, because they expect the Sovereign AI to also be Corrigible.
