
Thanks to Frances Lorenz and Shiri for their feedback on a draft of this post, which is a crosspost from LessWrong with the section on other EA causes added.

tl;dr: Please fill out this short form if you might be willing to take a few small, simple actions in the next ~5 weeks that have the chance to dramatically increase funding for AI Safety through the NSF.

What is this?

The National Science Foundation (NSF) has put out a Request for Information (RfI) on the topics it will fund in 2023 as part of its Convergence Accelerator program. A group of us are coordinating responses to maximize the chance that AI Safety is chosen as one of those topics. This has the potential to add millions of dollars to the grant pool available for AI Safety in 2023.

Shiri Dori-Hacohen originally posted about this on the 80,000 Hours AI Safety email list. Here's an excerpt from her email which explains the situation well (some emphasis hers, some mine):

To make a long story short(ish), the responses they get to this RfI now (by Feb 28) will influence the call for proposals they put out in this program in 2023.

This RfI is really quite easy to respond to, and it could be a potentially very influential thing to propose AI Safety as a topic. It's the kind of thing that could have a disproportionate impact on the field by influencing downstream funding, which would then have a ripple effect on additional researchers learning more about AI safety and possibly shifting their work to it. This impact would last over and above any kind of research results funded by this specific call, and I sincerely believe this is one of the highest-impact actions we can take right now.

In my experience, it would be incredibly powerful to mount an orchestrated / coordinated response to this call, i.e. having multiple folks replying with distinct but related proposals. For example, I know that a large group of [redacted] grantees had mounted such a coordinated response a couple of years ago in response to this exact RfI, and that was what led the NSF to pick disinformation as one of the two topics for the 2021 Convergence call, leading to $9M in federal funding (including my own research!) -- and many many additional funding opportunities downstream for the NSF grantees. 

Even if there were a relatively small probability of success for this particular call, the outsized impact of success would make the expected value of our actions quite sizable. Furthermore, the program managers reading the responses to these calls have incredible influence on the field, so even if we "fail" in setting the topic for 2023 but nonetheless manage to slightly shift the opinion of the PMs and incline their perspective towards viewing AI safety as important, that could still have a downstream positive impact on the acceptance of this subfield.

Could this backfire?

Some people in the AI alignment community have raised concerns about how talking to governments about AI existential risk could do more harm than good. For example, in the "Discussion with Eliezer Yudkowsky on AGI interventions" Alignment Forum post on Nov 21, 2021, Eliezer said:

Maybe some of the natsec people can be grownups in the room and explain why "stealing AGI code and running it" is as bad as "full nuclear launch" to their foreign counterparts in a realistic way. Maybe more current AGI groups can be persuaded to go closed; or, if more than one has an AGI, to coordinate with each other and not rush into an arms race. I'm not sure I believe these things can be done in real life, but it seems understandable to me how I'd go about trying - though, please do talk with me a lot more before trying anything like this, because it's easy for me to see how attempts could backfire

This is a valid concern in general, but it doesn't pertain to the present NSF RfI. The actions we're taking here are targeted at expanding grant opportunities for AI Safety through the NSF; they are unlikely to have any direct impact on US policies or regulations. Also, the NSF has a reputation for being quite nuanced and thoughtful in its treatment of research challenges.

What about other EA causes?

This effort is limited in scope to coordinating a response to the NSF's RfI to promote AI Safety. But someone could essentially copy this post and the form below, adapt the wording to another prominent effective altruist cause such as Biosecurity, and then mirror the actions we take for AI Safety over the next few weeks for that cause/topic.

If we could get the NSF to prioritize both AI Safety and Biosecurity in their grants for 2023, that would be all the better. If they chose either one as a priority topic, that would be much better than neither.

There are probably other causes besides AI Safety and Biosecurity that could be worth doing this for; maybe people can mention them in the comments below. It's hard for me to imagine certain prominent EA causes such as animal welfare being a fit for the NSF, though, because that's generally treated as more of a moral issue than a scientific one (even though certain areas of scientific research could make a large positive impact for animal welfare).

What actions do I take?

If you're interested in helping out with this, all you have to do right now is fill out this short form so that we can follow up with you:

https://airtable.com/shrk0bAxm0EeJbyPC

Then, over the next several weeks before the NSF's RfI deadline (Feb 28), we'll ask you to take a few quick, coordinated actions to help us make the best case we can to the NSF for why AI Safety should be prioritized as a funding area for its 2023 Convergence Accelerator.

Comments

How high-quality do you think the NSF's grants would be?

Right now there is a very large amount of EA money available for AI safety research, spread across at least four different major groups. Each makes use of well-connected domain experts to solicit and evaluate grants, has an application process designed to be easy for applicants to fill in, and distributes funds rapidly. Awards can be predicted confidently enough that it is possible to build a career around a single one of these funders. However, the pool of good applicants is extremely small - all of these organisations have struggled to spend this money effectively, and they actively look for new ways of spending money faster.

In contrast, I would be pessimistic about the quality of potential NSF grants. My concern is that, while we might be able to influence them to fund something called 'AI safety', it would not actually be that closely related to the thing we care about. The chosen grant reviewers could have many prestigious qualifications but lack a deep understanding of the problem. NSF grants can, I understand, take a long time to apply for, and the evaluation is also very slow - taking over six months - and even then success is not assured. So it's possible a lot of high-quality safety researchers would prefer dedicated EA funding anyway. Who does this leave getting NSF grants? I worry it would be largely existing academics in related fields who are able to reframe their research as 'safety', thereby diluting the field without significantly contributing to the project - similar to 'AI Ethics'.

I do agree that NSF funding could help significantly raise the prestige of the field, but before promoting the idea I would want to be confident that the ideas would reach grantmakers with high fidelity, and I'm not sure how we can ensure that degree of alignment.

I agree that we need to take care with high-fidelity idea transmission, and that there is some risk of diluting the field. But I think the reasonable chance of this spurring more good research in AI safety is worth it, even if some money also gets wasted.

One interesting thing in the RfI is that it links to The National Artificial Intelligence Research and Development Strategic Plan: 2019 Update, a PDF outlining a federal committee's strategic plan for dealing with AI. Strategy #4 is "Ensure the Safety and Security of AI Systems", and it says a lot of the "right things": for example, it discusses emergent behavior, goal misspecification, explainability/transparency, and long-term AI safety and value alignment. Whether this will translate into useful actions isn't certain, but it's heartening to see some acknowledgment of AI concerns from the US government beyond just "develop it before China does".

As for the current funding situation in the EA/AI risk community, I have also heard that right now there is too much funding for the number of good researchers/applicants. I don't think we should get too used to this dynamic, though. The situation could easily reverse in a short time if awareness of AI risk causes a wave of new research interest, or if 80,000 Hours, the AGI Safety Fundamentals curriculum, AI Safety Camp, and related programs are able to bring more people into the field. So just because we have a funding glut now doesn't mean we should assume it will continue through 2023, which is the period this NSF RfI pertains to.

I sadly don't have time to explain all of my models here, but I do want to publicly state that I think the NSF getting involved here is likely net-negative for the world, given the abundance of funding in the space and people's bad experiences interfacing with government-funded research. I don't currently assign any substantial probability to our funding abundance changing over the next few years, especially since funding seems to be growing substantially faster than interested and deployable talent.

I can hopefully get back to this thread in a few days when I have more time to explain my models, but for now it seemed good to at least share their outputs.

Habryka, I appreciate you sharing your outputs. Have you had a few minutes to follow up with a little explanation of your models yet? It's ok if it's a rough/incomplete explanation, but it would help to know a bit more about what you've seen with government-funded research etc. that makes you think this would be net-negative for the world.

Alas, sorry - I do think it would take me a while to write things up in any comprehensive way, and sadly I've been sick the last few days and so ended up falling behind on a number of other commitments.

Here is a very very rough outline: 

  • There really is already a lot of money in the space - enough that even substantial contributions from the NSF are unlikely to increase available funding by any meaningful proportion.
  • I've talked to multiple people at various organizations in EA and AI Alignment who accepted NSF and other government funding over the years, and I think they regretted it in every instance and found the experience strongly distorted the quality and alignment of their research.
  • There are indeed multiple fields that ended up derailed by actors like the NSF entering them and then strongly distorting the incentives of the field. Nanotechnology, for example, I think ended up derailed in this kind of way, and a number of other subfields I've studied look similar.
  • I also expect the NSF getting involved will attract a number of adversarial actors who will be quite distracting and potentially disruptive.
  • I have some more complicated feelings about whether high-prestige research on the potential negative consequences of AGI is itself net-negative, by increasing the probability of arms races towards AGI. E.g. it's pretty plausible to me that publishing Superintelligence was quite bad for the world. I don't have settled thoughts here, and am still quite confused, but I think it's an important dimension to think about, and I think it adds some downside risk to this NSF situation, with relatively limited upside.

The situation could easily reverse in a short time if awareness of AI risk causes a wave of new research interest, or if 80,000 Hours, the AGI Safety Fundamentals curriculum, AI Safety Camp, and related programs are able to bring more people into the field. So just because we have a funding glut now doesn't mean we should assume it will continue through 2023, which is the period this NSF RfI pertains to.

Could you put some numbers around this, please - e.g. how much do you think we might be able to get the NSF to spend on this? I think we have a big difference in our models here; I can't think of a scenario where this seems plausible.

For context, it looks like the NSF currently spends around $8.5bn a year, and this particular program was only $12.5m. It seems unlikely to me that we could get them to spend 2% of the budget ($170m) on AI safety in 2023. In contrast, if there were somehow $170m of high-quality grant proposals, I'm pretty confident the existing EA funding system would be able to fund them all.

This might make sense if all the existing big donors suddenly decided that AI safety was not very important, so we were very short on money. But if that happens it's probably because they have become aware of compelling new arguments not to fund AI safety, in which case the decision is probably reasonable!

Judging by the voting and comments so far (both here and on the LessWrong crosspost), my sense is that many here support this effort, but some definitely have concerns. A few of the concerns are rooted in a hardcore skepticism of academic research that I'm not sure is compatible with responding to the RfI at all. Most of the concerns, though, seem to be about this generating vague NSF grants made in the name of AI safety that don't actually contribute to the field.

For these latter concerns, I wonder whether we could resolve them by limiting the scope of topics in our NSF responses, or by making them specific enough. For example, what if we convinced the NSF to make grants only for mechanistic interpretability projects like the Circuits Thread? This is an area that most researchers in the alignment community seem to agree is useful; we just need a lot more people doing it to make substantial progress. And maybe there is less room to go adrift or mess up this kind of concrete, empirical research compared to some of the more theoretical research directions.

It doesn't have to be just mechanistic interpretability, but my point is: are there ways we could shape or constrain our responses to the NSF, along these lines, that would help address your concerns?
