
We’re starting a new reading group for people interested in applying mechanism design tools to technical AI alignment. If you’re interested in joining, you can apply here by the end of the day Berkeley time on August 22nd (applying takes less than five minutes). If you have recommendations for papers to discuss, please mention them in the comments.

What We’re Doing

Mechanism design is the study of how to reach desirable outcomes or equilibria in the face of differing incentives and incomplete information. Many AI safety researchers have expressed enthusiasm about the potential of using these tools for work in alignment, but relatively little work has been done at the intersection. We believe this is partially due to a lack of potential researchers with expertise in both technical AI safety and mechanism design, and partially due to a lack of shovel-ready problems. The goal of this reading group is to make progress on both fronts.

There are three main areas that this reading group will cover:

  1. Work at the intersection of technical AI safety and mechanism design, and where it can be expanded
  2. Current work in technical AI safety, and how mechanism design tools can help
  3. Current work on mechanism design, and how it can be applied to technical AI safety

The plan is to start with papers at the intersection, then alternate between papers on technical AI safety and papers on mechanism design while keeping the broader perspective in mind. Note that although we believe AI governance work is important and contains many applications for mechanism design, it will not be the focus of this reading group.

Who We Are and Who We Want

I (Rubi) am entering the 2nd year of a PhD in Economics this fall, and am currently working on technical AI safety in Berkeley through the SERI MATS program. Other likely participants include three Economics PhD students at top schools and a Math undergraduate student currently taking part in the SERI Summer Research Fellowship. Our hope for this reading group is to connect with people who have similar interests and create the potential for future collaborations.

Based on current expressions of interest, we expect the modal participant in the reading group to be a PhD student in economics, focusing on economic theory, who has read through the AGI Safety Fundamentals curriculum (or an equivalent, such as Eleuther's). If that sounds like you, definitely apply! However, these should not be considered necessary qualifications. Talented undergraduates with an interest in both areas or experts in one area who would like to learn more about the other should also apply. 

If you’re unsure whether you have the background necessary to keep up with this reading group, a good test is to try skimming The Off-Switch Game. It’s a short paper, and on the more accessible end of papers we will be discussing. If you understand it or predict you would be able to understand it within an hour, then you are likely to be able to process the papers that we will discuss without too much additional work. If you find yourself struggling to understand the mathematical notation and proofs, then that is likely a bottleneck and you should consider prioritizing work to advance your comfort level there.

Participants will be expected to commit approximately eight hours a week for this reading group, which consists of five to seven hours reading the week’s paper and an hour and a half to discuss it. If it becomes apparent that a participant is repeatedly not reading or only skimming the papers, they will be removed from the reading group. Please ensure that you can dedicate the required time before applying.

Logistics

The application form can be found here. The only mandatory fields are a link/upload of your CV and confirmation that you are willing to make the necessary time commitment, although there are also optional fields if you would like to elaborate on your background in either mechanism design or technical AI safety. 

Applications will close on Monday August 22nd at 11:59pm PDT, and acceptances will be sent out by August 28th. Discussions will begin in the first week of September and continue weekly for twelve weeks. Meetings will be held online, at a time chosen based on the schedules of participants.

We currently expect one discussion group of between five and eight people. However, if there is sufficient interest, we will run as many groups as are required to include all qualified applicants.

Exceptional candidates who cannot commit to attending all meetings can contact me directly about sitting in on the subset of meetings that are relevant to their work.

What We’ll be Reading

A number of people have asked for the reading list that we will be using. The list will be made public once it has been finalized, but because the reading group is small, we plan to tailor the papers discussed to the interests of the participants.

To give a taste of the curriculum and to give potential participants a head start on readings, here is what we have planned for the first two weeks:

Week 1 

(Double session, 2.5 hours) 

Incomplete Contracting and AI Alignment by Dylan Hadfield-Menell and Gillian Hadfield

The Principal-Agent Alignment Problem in Artificial Intelligence by Dylan Hadfield-Menell

This week will begin with introductions and a short icebreaker. The first paper discusses applying mechanism design to AI safety in broad terms, while the second delves more into specifics. In addition to the two papers, this week’s discussion will cover the areas of AI safety where mechanism design could be useful, the limitations of the approach, and the potential upside from success.

Week 2

(Normal session, 1.5 hours)

Risks from Learned Optimization in Advanced Machine Learning Systems by Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, and Scott Garrabrant

We expect most participants will have already read this paper, which covers the differences between outer and inner alignment. This week’s discussion will involve a brief review of the paper, followed by consideration of which mechanism design tools can help with each form of alignment.

For the following weeks, relevant topics that may be covered (subject to participant interest) include: the principal-agent problem, cheap talk, multi-agent systems, dynamic mechanisms, robust mechanism design, corrigibility, multi-agent reinforcement learning, cooperative AI and communication, adversarial training and zero-sum mechanisms, causal incentives, and algorithmic mechanism design. Other topics may also be covered if requested.

Future Plans

Our intention for this reading group is to transition to a working group upon completion. With a shared background, we will be in a good position to provide feedback on each other’s work or collaborate on projects. In addition to a working group, we would also like to have the group produce an agenda in which we lay out what we feel are the most promising research directions, the potential challenges, and the next steps to work on. Ideally (i.e. conditional on funding) this agenda would be hammered out over multiple days at a retreat that includes subject matter experts in both mechanism design and technical AI safety.

Between a reading list, a research agenda, and an active community of researchers, we would be in a position where new members could quickly get up to speed. The long-term goal is to increase the number of people working on technical AI safety by making it easy for mechanism design researchers to contribute, and to improve the quality of technical AI safety research by expanding the set of available tools.

The application form is here, and applications are due by Monday August 22nd at 11:59pm PDT. Please pass along this post to anyone who you think would be interested in this reading group.

Thanks to Cecilia Wood and Amelia Michael for reviewing earlier drafts of this post.

Comments (1)

oof, super bummed to have missed this and just now find out about it. 