3584 karma · Joined Oct 2020


AI safety governance/strategy research & field-building.

Formerly a PhD student in clinical psychology @ UPenn, college student at Harvard, and summer research fellow at the Happier Lives Institute.





Congratulations on launching!

On the governance side, one question I'd be excited to see Apollo (and ARC evals & any other similar groups) think/write about is: what happens after a dangerous capability eval goes off? 

Of course, the actual answer will be shaped by the particular climate/culture/zeitgeist/policy window/lab factors that are impossible to fully predict in advance.

But my impression is that this question is relatively neglected, and I wouldn't be surprised if sharp newcomers were able to meaningfully improve the community's thinking on this. 


Excited to see this! I'd be most excited about case studies of standards in fields where people didn't already have clear ideas about how to verify safety.

In some areas, it's pretty clear what you're supposed to do to verify safety. Everyone (more-or-less) agrees on what counts as safe.

One of the biggest challenges with AI safety standards will be the fact that no one really knows how to verify that a (sufficiently-powerful) system is safe. And a lot of experts disagree on the type of evidence that would be sufficient.

Are there examples of standards in other industries where people were quite confused about what "safety" would require? Are there examples of standards that are specific enough to be useful but flexible enough to deal with unexpected failure modes or threats? Are there examples where the standards-setters acknowledged that they wouldn't be able to make a simple checklist, so they requested that companies provide proactive evidence of safety?


Glad to see this write-up & excited for more posts.

I think these are three areas that MATS has handled well. I'd be especially excited to hear more about areas where MATS thinks it's struggling, MATS is uncertain, or where MATS feels like it has a lot of room to grow. Potential candidates include:

  • How is MATS going about talent selection and advertising for the next cohort, especially given the recent wave of interest in AI/AI safety?
  • How does MATS intend to foster (or recruit) the kinds of qualities that strong researchers often possess?
  • How does MATS define "good" alignment research? 

Other things I'd be curious about:

  • Which work from previous MATS scholars is the MATS team most excited about? What are MATS's biggest wins? Which individuals or research outputs is MATS most proud of?
  • Most people's timelines have shortened a lot since MATS was established. Does this substantially reduce the value of MATS (relative to worlds with longer timelines)?
  • Does MATS plan to try to attract senior researchers who are becoming interested in AI Safety (e.g., professors, people with 10+ years of experience in industry)? Or will MATS continue to recruit primarily from the (largely younger and less experienced) EA/LW communities?

Clarification: I think we're bottlenecked by both, and I'd love to see the proposals become more concrete. 

Nonetheless, I think proposals like "Get a federal agency to regulate frontier AI labs like the FDA/FAA" or even "push for an international treaty that regulates AI in the way the IAEA regulates atomic energy" are "concrete enough" to start building political will behind them. Other (more specific) examples include export controls, compute monitoring, licensing for frontier AI models, and some others on Luke's list.

I don't think any of these are concrete enough for me to say "here's exactly how the regulatory process should be operationalized", and I'm glad we're trying to get more people to concretize these. 

At the same time, I expect that a lot of the concretization happens after you've developed political will. If the USG really wanted to figure out how to implement compute monitoring, I'm confident they'd be able to figure it out. 

More broadly, my guess is that we might disagree on how concrete a proposal needs to be before you can actually muster political will behind it. Here's a rough attempt at sketching out four possible "levels of concreteness." (First attempt; feel free to point out flaws.)

Level 1, No concreteness: You have a goal but no particular ideas for how to get there. (e.g., "we need to make sure we don't build unaligned AGI")

Level 2, Low concreteness: You have a goal with some vague-ish ideas for how to get there. (e.g., "we need to make sure we don't build unaligned AGI, and this should involve evals/compute monitoring, or maybe a domestic ban on AGI projects and a single international project")

Level 3, Medium concreteness: You have a goal with high-level ideas for how to get there. (e.g., "We would like to see licensing requirements for models trained above a certain threshold. Still ironing out whether or not that threshold should be X FLOP, Y FLOP, or $Z, but we've got some initial research and some models for how this would work.")

Level 4, High concreteness: You have concrete proposals that can be debated. (e.g., "We should require licenses for anything above X FLOP, and we have some drafts of the forms that labs would need to fill out.")

I get the sense that some people feel like we need to be at "medium concreteness" or "high concreteness" before we can start having conversations about implementation. I don't think this is true.

Many laws, executive orders, and regulatory procedures have vague language (often at Level 2 or in-between Level 2 and Level 3). My (loosely-held, mostly based on talking to experts and reading things) sense is that it's quite common for regulators to say "we're going to establish regulations for X, and we're not yet exactly sure what they look like. Part of this regulatory agency's job is going to be to figure out exactly how to operationalize XYZ."

I also think that recent events have been strong evidence in favor of my position: we got a huge amount of political will "for free" from AI capabilities advances, and the best we could do with it was to push a deeply flawed "let's all just pause for 6 months" proposal.

I don't think this is clear evidence in favor of the "we are more bottlenecked by concrete proposals" position. My current sense is that we were bottlenecked both by "not having concrete proposals" and by "not having relationships with relevant stakeholders."

I also expect that the process of concretizing these proposals will likely involve a lot of back-and-forth with people (outside the EA/LW/AIS community) who have lots of experience crafting policy proposals. Part of the benefit of "building political will" is "finding people who have more experience turning ideas into concrete proposals."


I don't actually think the implementation of governance ideas is mainly bottlenecked by public support; I think it's bottlenecked by good concrete proposals. And to the extent that it is bottlenecked by public support, that will change by default as more powerful AI systems are released.

I appreciate Richard stating this explicitly. I think this is (and has been) a pretty big crux in the AI governance space right now.

Some folks (like Richard) believe that we're mainly bottlenecked by good concrete proposals. Other folks believe that we have concrete proposals, but we need to raise awareness and political support in order to implement them.

I'd like to see more work going into both of these areas. On the margin, though, I'm currently more excited about efforts to raise awareness [well], acquire political support, and channel that support into achieving useful policies. 

I think this is largely due to (a) my perception that this work is largely neglected, (b) the fact that a few AI governance professionals I trust have also stated that they see this as the higher priority thing at the moment, and (c) worldview beliefs around what kind of regulation is warranted (e.g., being more sympathetic to proposals that require a lot of political will).


Lots of awesome stuff requires AGI or superintelligence. People think LLMs (or stuff LLMs invent) will lead to AGI or superintelligence.

So wouldn’t slowing down LLM progress slow down the awesome stuff?


I think more powerful (aligned) LLMs would lead to more awesome stuff, so caution on LLMs does delay other awesome stuff.

I agree with the point that "there's value that can be gained from figuring out how to apply systems at current capabilities levels" (AI summer harvest), but I wouldn't go as far as "you can almost have the best of both worlds." It seems more like "we can probably do a lot of good with existing AI, so even though there are costs of caution, those costs are worth paying, and at least we can make some progress applying AI to pressing world problems while we figure out alignment/governance." (My version isn't catchy though, oops).


I appreciate that this post acknowledges that there are costs to caution. I think it could've gone a bit further in emphasizing how these costs, while large in an absolute sense, are small relative to the risks.

The formal way to do this would be a cost-benefit analysis on longtermist grounds (perhaps with various discount rates for future lives). But I think there's also a way to do this in less formal/wonky language, without requiring any longtermist assumptions.

If you have a technology where half of experts believe there's a ~10% chance of extinction, the benefits need to be enormous for them to outweigh the costs of caution. I like Tristan Harris's airplane analogy:

Imagine: would you board an airplane if 50% of airplane engineers who built it said there was a 10% chance that everybody on board dies?

Here's another frame (that I've been finding useful with folks who don't follow the technical AI risk scene much): History is full of examples of people saying that they are going to solve everyone's problems. There are many failed messiah stories. In the case of AGI, it's true that aligned and responsibly developed AI could do a lot of good. But when you have people saying "the risks are overblown-- we're smart and responsible enough to solve everything", I think it's pretty reasonable to be skeptical (on priors alone).

Finally, one thing that sometimes gets missed in this discussion is that most advocates of pause still want to get to AGI eventually. Slowing down for a few years or decades is costly, and advocates of slowdown should recognize this. But the costs are substantially lower than the risks. I think both of these messages get missed in discussions about slowdown.

Answer by Akash · Apr 27, 2023

The impression I get is that lots of people are like “yeah, I’d like to see more work on this & this could be very important” but there aren’t that many people who want to work on this & have ideas.

Is there evidence that funding isn’t available for this work? My loose impression is that mainstream funders would be interested in this. I suppose it’s an area where it’s especially hard to evaluate the promisingness of a proposal, though.

Reasons people might not be interested in doing this work:

  • Tractability
  • Poor feedback loops
  • Not many others in the community to get feedback from
  • Has to deal with thorny and hard-to-concretize theoretical questions

Reasons people might want to work on this:

  • Importance and neglectedness
  • Seems plausible that one could become one of the most knowledgeable EAs on this topic in not much time
  • Interdisciplinary; might involve interacting a lot with the non-EA world, academia, etc.
  • Intellectually stimulating

See also: https://80000hours.org/podcast/episodes/robert-long-artificial-sentience/



On the letter itself, though, I have a bunch of uncertainties around whether a six month pause right now would actually help

I share many of your concerns, though I think on balance I feel more enthusiastic about the six-month pause. (Note that I'm thinking about a six-month pause on frontier AI development that is enforced across the board, at least in the US, and I'm more confused about a six-month pause that a few specific labs opt-in to). 

I wonder if this relates more to an epistemic difference (e.g., the actual credence we put on the six-month pause being net positive [or to be more nuanced, the EV we expect once we account for the entire distribution of outcomes]) or a communication difference (e.g., differences in our willingness to express support for things under conditions of high uncertainty). 

Regarding the worries you list, #4 is the one I'm most concerned/uncertain about. The others seem to boil down to "if we get a pause, we need to make sure we use it well." If we get a pause, we should use it to (a) strengthen AI governance ideas and evals, (b) develop and push for more ambitious asks, and (c) build a larger coalition of people who are concerned about risks from advanced AI.

All of these things are hard. But, all else equal, they seem more likely to happen in a world with a six-month pause than a world without one. 

Whereas I think the fourth worry argues why the pause might be net negative. I'm particularly concerned about scenarios where there are many more actors at the frontier of AI development, and race dynamics are even more concerning. (On the other hand, a six-month pause is also a signal that the world is more likely to regulate frontier AI labs. If people expect that the six-month pause will be followed by additional regulation, this might make it less appealing for new actors to enter the race.)

Anyways, I'm still left wondering why I (a) agree with lots of your points yet (b) feel more enthusiastic about a six-month pause. 

I'm curious about your all-things-considered perspective on the six-month pause idea: Do you currently think it's net positive, net negative, or near-zero-value in expectation?
