3919 karma, joined Oct 2020


AI safety governance/strategy research & field-building.

Formerly a PhD student in clinical psychology @ UPenn, college student at Harvard, and summer research fellow at the Happier Lives Institute.



Congrats to Zach! This seems mostly meant to be a "quick update/celebratory post", but I feel like there's a missing mood that I want to convey in this comment. Note that my thoughts mostly come from an AI Safety perspective, so they may be less relevant for folks who focus on other cause areas.

My impression is that EA is currently facing an unprecedented amount of PR backlash, as well as some solid internal criticisms among core EAs who are now distancing from EA. I suspect this will likely continue into 2024. Some examples:

  • EA has acquired several external enemies as a result of the OpenAI coup. I suspect that investors/accelerationists will be looking for ways to (further) damage EA's reputation.
  • EA is acquiring external enemies as a result of its political engagements. There have been a few news articles recently criticizing EA-affiliated or EA-influenced fellowship programs and think-tanks.
  • EA is acquiring an increasing number of internal critics. Informally, I feel like many people I know (myself included) have become increasingly dissatisfied with the "modern EA movement" and "mainstream EA institutions". Examples of common criticisms include "low integrity/low openness", "low willingness to critique powerful EA institutions", "low willingness to take actions in the world that advocate directly/openly for beliefs", "coziness with AI labs", "general slowness/inaction bias", and "lack of willingness to support groups pushing for concrete policies to curb the AI race." (I'll acknowledge that some of these are more controversial than others and could reflect genuine worldview differences, though even so, my impression is that they're meaningfully contributing to a schism in ways that go beyond typical worldview differences).

I'd be curious to know how CEA is reacting to this. The answer might be "well, we don't really focus much on AI safety, so we don't really see this as our thing to respond to." The answer might be "we think these criticisms are unfair/low-quality, so we're going to ignore them." Or the answer might be "we take X criticism super seriously and are planning to do Y about it."

Regardless, I suspect that this is an especially important and challenging time to be the CEO of CEA. I hope Zach (and others at CEA) are able to navigate the increasing public scrutiny & internal scrutiny of EA that I suspect will continue into 2024.

Do you know anything about the strategic vision that Zach has for CEA? Or is this just meant to be a positive endorsement of Zach's character/judgment? 

(Both are useful; just want to make sure that the distinction between them is clear). 

I appreciate the comment, though I think there's a lack of specificity that makes it hard to figure out where we agree/disagree (or more generally what you believe).

If you want to engage further, here are some things I'd be excited to hear from you:

  • What are a few specific comms/advocacy opportunities you're excited about/have funded?
  • What are a few specific comms/advocacy opportunities you view as net negative/have actively decided not to fund?
  • What are a few examples of hypothetical comms/advocacy opportunities you've been excited about?
  • What do you think about, e.g., Max Tegmark/FLI, Andrea Miotti/Control AI, The Future Society, the Center for AI Policy, Holly Elmore, PauseAI, and other specific individuals or groups that are engaging in AI comms or advocacy? 

I think if you (and others at OP) are interested in receiving more critiques or overall feedback on your approach, one thing that would be helpful is writing up your current models/reasoning on comms/advocacy topics.

In the absence of this, people simply notice that OP doesn't seem to be funding some of the main existing examples of comms/advocacy efforts, but they don't really know why, and they don't really know what kinds of comms/advocacy efforts you'd be excited about.

Answer by Akash, Dec 16, 2023

I expect that your search for a "unified resource" will be unsatisfying. I think people disagree enough on their threat models/expectations that there is no real "EA perspective".

Some things you could consider doing:

  • Have a dialogue with 1-2 key people you disagree with.
  • Pick one perspective (e.g., Paul's worldview, Eliezer's worldview) and write about areas you disagree with it.
  • Write up a "Matthew's worldview" doc that focuses more on explaining what you expect to happen and isn't necessarily meant as a "counterargument" piece. 

Among the questions you list, I'm most interested in these:

  • How bad human disempowerment would likely be from a utilitarian perspective
  • Whether there will be a treacherous turn event, during which AIs violently take over the world after previously having been behaviorally aligned with humans
  • How likely AIs are to kill every single human if they are unaligned with humans
  • How society is likely to respond to AI risks, and whether they'll sleepwalk into a catastrophe

Thanks for this overview, Trevor. I expect it'll be helpful, and I also agree with your recommendations for people to consider working at standard-setting organizations and other relevant EU offices.

One perspective that I see missing from this post is what I'll call the advocacy/comms/politics perspective. Some examples of this with the EU AI Act:

  • Foundation models were going to be included in the EU AI Act, until France and Germany (with lobbying pressure from Mistral and Aleph Alpha) changed their position.
  • This initiated a political/comms battle between those who wanted to exclude foundation models (led by France and Germany) and those who wanted to keep them in (led by Spain).
  • This political fight rallied lots of notable figures, including folks like Gary Marcus and Max Tegmark, to publicly and privately fight to keep foundation models in the act.
  • There were open letters, op-eds, and certainly many private attempts at advocacy.
  • There were attempts to influence public opinion, pieces that accused key lobbyists of lying, and a lot of discourse on Twitter.

It's difficult to know the impact of any given public comms campaign, but it seems quite plausible to me that many readers would have more marginal impact by focusing on advocacy/comms than focusing on research/policy development.

More broadly, I worry that many segments of the AI governance/policy community might be neglecting to think seriously about what ambitious comms/advocacy could look like in the space.

I'll note that I might be particularly primed to bring this up now that you work for Open Philanthropy. I think many folks (rightfully) critique Open Phil for being too wary of advocacy, campaigns, lobbying, and other policymaker-focused activities. I'm guessing that Open Phil has played an important role in shaping both the financial and cultural incentives that (in my view) lead to an overinvestment in research and an underinvestment in policy/advocacy/comms. 

(I'll acknowledge these critiques are pretty high-level and I don't claim that this comment provides compelling evidence for them. Also, you only recently joined Open Phil, so I'm of course not trying to suggest that you created this culture, though I guess now that you work there you might have some opportunities to change it).

I'll now briefly try to do a Very Hard thing which is like "put myself in Trevor's shoes and ask what I actually want him to do." One concrete recommendation I have is something like "try to spend at least 5 minutes thinking about ways in which you or others around you might be embedded in a culture that has blind spots to some of the comms/advocacy stuff." Another is "make a list of people you read actively or talked to when writing this post. Then ask if there were any other people/orgs you could've reached out to, particularly those that might focus more on comms+advocacy". (Also, to be clear, you might do both of these things and conclude "yea, actually I think my approach was very solid and I just had Good Reasons for writing the post the way I did.")

I'll stop here since this comment is getting long, but I'd be happy to chat further about this stuff. Thanks again for writing the post and kudos to OP for any of the work they supported/will support that ends up increasing P(good EU AI Act goes through & gets implemented). 

I'm excited to see the EAIF share more about their reasoning and priorities. Thank you for doing this!

I'm going to give a few quick takes– happy to chat further about any of these. TLDR: I recommend (1) getting rid of the "principles-first" phrase & (2) issuing more calls for proposals focused on the specific projects you want to see (regardless of whether or not they fit neatly into an umbrella term like "principles-first")

  • After skimming the post for 5 minutes, I couldn't find a clear/succinct definition of what "principles-first" actually means. I think it means something like "focus more on epistemics and core reasoning" and "focus less on specific cause areas". But then some of the examples of the projects that Caleb is excited about are basically just like "get people together to think about a specific cause area– but not one of the mainstream ones, like one of the more neglected ones."
  • I find the "principles-first" frame a bit icky at first glance. Something about it feels... idk... just weird and preachy or something. Ok, what's actually going on there?
    • Maybe part of it is that it seems to imply that people who end up focusing on specific cause areas are not "principles-first" people, or like in the extreme case they're not "good EAs". And then it paints a picture for me where a "good EA" is one who spends a bunch of time doing "deep reasoning", instead of doing cause-area-specific work. Logically, it's pretty clear to me that this isn't what the posters are trying to say, but I feel like that's part of where the system 1 "ick" feeling is coming from.
  • I worry that the term "principles-first EA" might lead to a bunch of weird status things and a bunch of unhelpful debates. For me, the frame naturally invokes questions like "what principles?" and "who gets to decide what those principles are?" and all sorts of "what does it truly mean to be an EA?" kinds of questions. Maybe the posters think that, on the margin, more people should be asking these questions. But I think that should be argued for separately– if EAIF adopts this phrase as their guiding phrase, I suspect a lot of people will end up thinking "I need to understand what EAIF thinks the principles of EA are and then do those things".
  • Personally, I don't think the EAIF needs to have some sort of "overarching term" that summarizes what it is prioritizing. I think it's quite common for grantmaking organizations to just say "hey, here's a call for proposals with some examples of things we're excited about."
  • Personally, I'm very excited about the projects that Caleb listed in the appendix. Some of these don't really seem to me to fall neatly under the "principles-first" label (a bunch of them just seem like "let's do deconfusion work or make progress in specific areas that are important and highly neglected").
  • Historically, my impression is that EAIF hasn't really done many calls for proposals relating to specific topics. It has been more like "hey anyone with any sort of meta idea can apply." I'm getting the sense from this post that Caleb wants EAIF to have a clearer focus. Personally, I would encourage EAIF to do more "calls for proposals" focused on specific projects that they want to see happen in the world. As an example, EAIF could say something like "we are interested in seeing proposals about acausal trade and ethics of digital minds. Here are some examples of things you could do."
    • I think there are a lot of "generally smart and agentic people" around who don't really know what to do, and some guidance from grantmakers along the lines of "here are some projects that we want to see people apply to" could considerably lower the amount of agency/activation energy/confidence/inside-viewness that such people need.
    • On the flip side, we'd want to avoid a world in which people basically just blindly defer to grantmakers. I don't expect calls for proposals to contribute to that too much, and I also suspect there's a longer conversation that could be had about how to avoid these negative cultural externalities.

Personally, I still think there is a lot of uncertainty around how governments will act. There are at least some promising signs (e.g., UK AI Safety Summit) that governments could intervene to end or substantially limit the race toward AGI. Relatedly, I think there's a lot to be done in terms of communicating AI risks to the public & policymakers, drafting concrete policy proposals, and forming coalitions to get meaningful regulation through. 

Some folks also have hope that internal governance (lab governance) could still be useful. I am not as optimistic here, but I don't want to rule it out entirely. 

There's also some chance that we end up getting more concrete demonstrations of risks. I do not think we should wait for these, and I think there's a sizable chance we do not get them in time, but I think "have good plans ready to go in case we get a sudden uptick in political will & global understanding of AI risks" is still important. 

I would be interested in seeing your takes about why building runway might be more cost-effective than donating.

Separately, if you decide not to go with 10% because you want to think about what is actually best for you, I suggest you give yourself a deadline. Like, suppose you currently think that donating 10% would be better than status quo. I suggest doing something like “if I have not figured out a better solution by Jan 1 2024, I will just do the community-endorsed default of 10%.”

I think this protects against some sort of indefinite procrastination. (Obviously less relevant if you never indefinitely procrastinate on things like this, but my sense is that most people do at least sometimes).

I think it’s good for proponents of RSPs to be open about the sorts of topics I’ve written about above, so they don’t get confused with e.g. proposing RSPs as a superior alternative to regulation. This post attempts to do that on my part. And to be explicit: I think regulation will be necessary to contain AI risks (RSPs alone are not enough), and should almost certainly end up stricter than what companies impose on themselves.

Strong agree. I wish ARC and Anthropic had been more clear about this, and I would be less critical of their RSP posts if they were upfront and clear about this stance. I think your post is strong and clear (you state multiple times, unambiguously, that you think regulation is necessary and that you wish the world had more political will to regulate). I appreciate this, and I'm glad you wrote this post.

I think it’d be unfortunate to try to manage the above risk by resisting attempts to build consensus around conditional pauses, if one does in fact think conditional pauses are better than the status quo. Actively fighting improvements on the status quo because they might be confused for sufficient progress feels icky to me in a way that’s hard to articulate.

A few thoughts:

  1. One reason I'm critical of the Anthropic RSP is that it does not make it clear under what conditions it would actually pause, or for how long, or under what safeguards it would determine it's OK to keep going. It is nice that they said they would run some evals at least once every 4X in effective compute and that they don't want to train catastrophe-capable models until their infosec makes it more expensive for actors to steal their models. It is nice that they said that once they get systems that are capable of producing biological weapons, they will at least write something up about what to do with AGI before they decide to just go ahead and scale to AGI. But I mostly look at the RSP and say "wow, these are some of the most bare minimum commitments I could've expected, and they don't even really tell me what a pause would look like and how they would end it."
  2. Meanwhile, we have OpenAI (that plans to release an RSP at some point), DeepMind (rumor has it they're working on one but also that it might be very hard to get Google to endorse one), and Meta (oof). So I guess I'm sort of left thinking something like "If Anthropic's RSP is the best RSP we're going to get, then yikes, this RSP plan is not doing so well." Of course, this is just a first version, but the substance of the RSP and the way it was communicated don't inspire much hope in me that future versions will be better.
  3. I think the RSP frame is wrong, and I don't want regulators to use it as a building block. My understanding is that labs are refusing to adopt an evals regime in which the burden of proof is on labs to show that scaling is safe. Given this lack of buy-in, the RSP folks concluded that the only thing left to do was to say "OK, fine, but at least please check to see if the system will imminently kill you. And if we find proof that the system is pretty clearly dangerous or about to be dangerous, then will you at least consider stopping?" It seems plausible to me that governments would be willing to start with something stricter and more sensible than this "just keep going until we can prove that the model has highly dangerous capabilities" regime. 
  4. I think some improvements on the status quo can be net negative because they either (a) cement in an incorrect frame or (b) take a limited window of political will/attention and steer it toward something weaker than what would've happened if people had pushed for something stronger. For example, I think the UK government is currently looking around for substantive stuff to show their constituents (and themselves) that they are doing something serious about AI. If companies give them a milquetoast solution that allows them to say "look, we did the responsible thing!", it seems quite plausible to me that we actually end up in a worse world than if the AIS community had rallied behind something stronger. 
  5. If everyone communicating about RSPs was clear that they don't want it to be seen as sufficient, that would be great. In practice, that's not what I see happening. Anthropic's RSP largely seems devoted to signaling that Anthropic is great, safe, credible, and trustworthy. Paul's recent post is nuanced, but I don't think the "RSPs are not sufficient" frame was sufficiently emphasized (perhaps partly because he thinks RSPs could lead to a 10x reduction in risk, which seems crazy to me, and if he goes around saying that to policymakers, I expect them to hear something like "this is a good plan that would sufficiently reduce risks"). ARC's post tries to sell RSPs as a pragmatic middle ground and IMO pretty clearly does not emphasize (or even mention?) some sort of "these are not sufficient" message. Finally, the name itself sounds like it came out of a propaganda department– "hey, governments, look, we can scale responsibly". 
  6. At minimum, I hope that RSPs get renamed, and that those communicating about RSPs are more careful to avoid giving off the impression that RSPs are sufficient.
  7. More ambitiously, I hope that folks working on RSPs seriously consider whether or not this is the best thing to be working on or advocating for. My impression is that this plan made more sense when it was less clear that the Overton Window was going to blow open, Bengio/Hinton would enter the fray, journalists and the public would be fairly sympathetic, Rishi Sunak would host an xrisk summit, Blumenthal would run hearings about xrisk, etc. I think everyone working on RSPs should spend at least a few hours taking seriously the possibility that the AIS community could be advocating for stronger policy proposals and getting out of the "we can't do anything until we literally have proof that the model is imminently dangerous" frame. To be clear, I think some people who do this reflection will conclude that they ought to keep making marginal progress on RSPs. I would be surprised if the current allocation of community talent/resources was correct, though, and I think on the margin more people should be doing things like CAIP & Conjecture, and fewer people should be doing things like RSPs. (Note that CAIP & Conjecture both have important flaws/limitations, and I think this partly has to do with the fact that so much top community talent has been funneled into RSPs/labs relative to advocacy/outreach/outside game).