Seeking Mechanism Designer for Research into Internalizing Catastrophic Externalities

c.trout

TL;DR

I need help finding/developing a mech that can reliably elicit honest and effortful risk-estimates from frontier AI labs regarding their models. These risk-estimates would be used to set risk-priced premiums that labs then pay the government (i.e. a "Pigouvian tax for x-risk"). Current best guess: Bayesian Truth Serum.

Stretch-goal: find/develop a mech for incentivizing optimal production of public safety research. Current best guess: Quadratic Financing.

DM me if you're interested in collaborating!

The Project

X-risk poses an extreme judgment-proof problem: threats of ex post punishment for causing an existential catastrophe (or a nationally existential one, or even just a disaster that renders your company insolvent) have little to no deterrent effect. Liability on its own will completely fail to internalize these negative externalities.

Traditionally, risk-priced insurance premiums are used to solve judgment-proofness (turn large ex post costs into small regular ex ante costs). However, insurers are also judgment-proof in the face of x-risk.

I'm developing a regime for insuring these "uninsurable" risks from frontier AI. It's modeled after the arguably very successful liability and insurance regime for nuclear power operators. In two recent workshop papers, I argue we should make foundation model developers liable for a certain class of catastrophic harms/near miss events and:

Mandate private insurance for commercially insurable losses (e.g. up to ~$500B in damages)
Have the government charge risk-priced premiums for insurance against uninsurable losses (i.e. a "Pigouvian tax for x-risk")
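
To make the two-layer split concrete, here is a minimal Python sketch. It assumes an actuarially fair premium (expected annual loss, with no loading or risk aversion), and every name and number in it (`split_expected_loss`, the scenario list, the cap) is hypothetical and only for illustration; how the premiums should actually be priced is exactly the open question.

```python
def split_expected_loss(scenarios, private_cap):
    """Split expected annual loss into a private layer and a government layer.

    scenarios: list of (annual_probability, damages_in_dollars) pairs (hypothetical).
    private_cap: damages covered by mandated private insurance.
    """
    # Private insurers cover each loss up to the cap.
    private_layer = sum(p * min(d, private_cap) for p, d in scenarios)
    # The government layer covers whatever exceeds the cap.
    government_layer = sum(p * max(d - private_cap, 0.0) for p, d in scenarios)
    return private_layer, government_layer


# Made-up numbers, for illustration only.
scenarios = [(0.01, 100e9), (0.001, 2000e9)]
private, government = split_expected_loss(scenarios, private_cap=500e9)
print(f"private layer: ${private / 1e9:.2f}B/yr")
print(f"government layer: ${government / 1e9:.2f}B/yr")
```

The point of the sketch is only that a large, unpayable ex post cost above the cap becomes a regular ex ante charge, as long as someone can supply credible probabilities.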

A government agency (through audits, its own forecasts and so on) could, and should, try to make these risk-estimates. However, this will be costly, and the agency will struggle to close the information asymmetry between it and the developers it insures. Relying mostly on mechanism design to incentivize labs to report honest and effortful risk-estimates has a number of advantages:

It should be cheaper for the government (more politically viable)
It should better leverage all available information and prompt more information gathering; in short, it should produce better risk-estimates
It's more secure: developers can divulge the risk implications of their private info without sharing sensitive private info (e.g. model weights).
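
For a sense of what such a mechanism looks like, here is a rough Python sketch of scoring in the style of Bayesian Truth Serum (my current best guess, not a settled choice). Each respondent gives a categorical answer plus a prediction of how others will answer, and answers that turn out to be more common than collectively predicted score well. This is a simplified finite-sample version and rests on several assumptions: risk-estimates are binned into discrete options, all predicted frequencies are strictly positive, and a respondent's own answer is counted in the averages. The function name `bts_scores` and the example data are hypothetical.

```python
import math


def bts_scores(answers, predictions, alpha=1.0):
    """Simplified Bayesian-Truth-Serum-style scores.

    answers: chosen option index for each respondent.
    predictions: for each respondent, predicted frequency of every option
                 (strictly positive, summing to 1).
    alpha: weight on the prediction-accuracy term.
    """
    n = len(answers)
    m = len(predictions[0])
    # Empirical frequency of each option among the answers.
    xbar = [sum(1 for a in answers if a == k) / n for k in range(m)]
    # Log of the geometric mean of the predicted frequencies.
    log_ybar = [sum(math.log(p[k]) for p in predictions) / n for k in range(m)]

    scores = []
    for a, p in zip(answers, predictions):
        # Information score: is my answer more common than predicted?
        info = math.log(xbar[a]) - log_ybar[a]
        # Prediction score: penalty for mispredicting the answer frequencies.
        pred = alpha * sum(
            xbar[k] * math.log(p[k] / xbar[k]) for k in range(m) if xbar[k] > 0
        )
        scores.append(info + pred)
    return scores


# Hypothetical example: three respondents, two risk bins (0 = "low", 1 = "high").
answers = [0, 0, 1]
predictions = [[0.5, 0.5], [0.6, 0.4], [0.3, 0.7]]
for respondent, score in enumerate(bts_scores(answers, predictions)):
    print(respondent, round(score, 3))
```

With `alpha=1.0` the scores in this version sum to zero across respondents, so payments could in principle be budget-balanced. Whether anything like this survives a setting with only a handful of strategic labs is part of what I need help assessing.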

The regime in schematic form:

The Work

I lack the mech design expertise to confidently assess the quality/relevance/usability of mechs I read about; I'm looking for someone with at least a graduate level understanding of mechanism design to collaborate with me.

The work will most likely involve:

you sifting through some papers/mechs
you floating the best contenders to the top
debating the pros and cons of each with me
(if necessary: you make modifications to our favorite pick)
(if necessary: you prove some nice things about the mech)
you explain things well enough to me so that I can make the write-up (assuming you don't want to make the write-up)

It's possible this only takes you ~40 hrs to accomplish, if an appropriately plug-n-play mechanism is already out there. I doubt this however (I have done some preliminary searching).

You can find more details of the questions I think need working out here (a much older, longer draft of the workshop papers linked above).

Theory of Impact

The goal is to write a large policy paper and then spin off some policy memos. I've applied to some policy fellowships. I plan to work on this proposal regardless, but obviously if I get in, that will be my platform for sharing this work.

Policy folks I've talked to, including a few people that work in DC, have expressed interest in seeing this developed further – e.g. Makenzie Arnold tells me this is "in the category of sensible" proposals. But obviously it needs more fleshing out.

If the research collaboration I'm proposing here goes great and policy folks love it, we may want to do a follow-up running an experiment to empirically verify our claims in as analogous a setting as we can muster.

Compensation/Timeline

If money is an issue, I'm willing to pay $20-40/hr, possibly more for exceptional collaborators. I'm also happy to do all the writing/paperwork if you just want to provide the thought input. Happy to help write a grant proposal too (but I'm unlikely to be able to secure one on my own: see below).
Ideally, I'd like to get something written up by EoY. Within that time frame though, I'm very flexible.

About me

FWIW, my two workshop papers linked above were accepted into the GenLaw workshop at ICML 2024.

I recently did this deep dive into insurance, liability and especially the nuclear power precedent, but I'm only a casual appreciator of mechanism design and economics. My MA was in philosophy.

NB: I'm a very early career professional and do not have an institutional affiliation.

I'm based in Boston (out of the AISST office).

Contact

If all this sounds like you, or someone you know, feel free to DM me or email me at ctroutcsi@gmail.com! If you're just interested in the proposed regime or have questions, feel free to ask them in the comments.

Effective Altruism Forum