Hide table of contents

This is a quickly written post listing opportunities for people to apply for funding from funders that are part of the EA community. 

Update: As of 2022, Effective Thesis is maintaining a more informative and up-to-date Airtable version of this list of EA funding opportunities. You can view that here. If you want to comment on the Airtable please click here (please note other commenters will be able to see your email address). You can also suggest new funding opportunities here.

Update #2: See also my slides on The what, why, and how of applying for EA funding.


I strongly encourage people to consider applying for one or more of these things. Given how quick applying often is and how impactful funded projects often are, applying is often worthwhile in expectation even if your odds of getting funding aren’t very high. (I think the same basic logic applies to job applications.)

I'm probably forgetting some opportunities relevant to longtermist and EA movement building work, and many opportunities relevant to other cause areas. Please comment if you know of things I’m missing! 

This post doesn't include non-EA funding opportunities that would be well-suited to EA-aligned projects, though it'd probably be useful for someone to make a separate collection of such things.

I follow the name of each funding opportunity with some text from the linked page.

I wrote this post in a personal capacity, not as a representative of any of the orgs mentioned.

See also Things I often tell people about applying to EA Funds.

Currently open Open Phil funding opportunities

Request for proposals for growing the community of people motivated to improve the long-term future

“We are seeking proposals from applicants interested in growing the community of people motivated to improve the long-term future via the kinds of projects described below.

Apply to start a new project here; express interest in helping with a project here.

Applications are open until further notice and will be assessed on a rolling basis. If we plan to stop accepting applications, we will indicate it on this page at least a month ahead of time.

See this post for additional details about our thinking on these projects.”

Open Philanthropy Undergraduate Scholarship

“Apply here (see below for details regarding application deadlines).

This program aims to provide support for highly promising and altruistically-minded students who are hoping to start an undergraduate degree at one of the top universities in the USA or UK (see below for details) and who do not qualify as domestic students at these institutions for the purposes of admission and financial aid.”

Open Philanthropy Course Development Grants  

“This program aims to provide grant support to academics for the development of new university courses (including online courses). At present, we are looking to fund the development of courses on a range of topics that are relevant to certain areas of Open Philanthropy’s grantmaking that form part of our work to improve the long-term future (potential risks from advanced AI, biosecurity and pandemic preparedness, other global catastrophic risks), or to issues that are of cross-cutting relevance to our work. We are primarily looking to fund the development of new courses, but we are also accepting proposals from applicants who are looking for funding to turn courses they have already taught in an in-person setting into freely-available online courses.

Applications are open until further notice and will be assessed on a rolling basis.


Early-career funding for individuals interested in improving the long-term future 

“This program aims to provide support – primarily in the form of funding for graduate study, but also for other types of one-off career capital-building activities – for early-career individuals who want to pursue careers that help improve the long-term future1 and who don’t qualify for our existing program focused on careers related to biosecurity and pandemic preparedness.

Apply here.

Applications are open until further notice and will be assessed on a rolling basis.

Generally speaking, we aim to review proposals within at most 6 weeks of receiving them, although this may not prove possible for all applications. Candidates who require more timely decisions can indicate this in their application forms, and we may be able to expedite the decision process in such cases."

Open Philanthropy Biosecurity Scholarships

“This program aims to provide flexible support for a small group of people early in their careers to pursue work and study related to global catastrophic biological risks (GCBRs), events in which biological agents could lead to sudden, extraordinary, and widespread disaster. Our goal is to reduce risks to humanity’s long-run future, and this opportunity is aimed at people whose chief interest is GCBRs as they relate to the impact on the very long-run future.

Applications are due here by January 1st, 2022, at 11.59 p.m. Pacific Time. We will review applications and make decisions on a rolling basis.”

(They also previously provided a similar batch of funding: ​​Early-Career Funding for Global Catastrophic Biological Risks — Scholarship Support (2018).)

The Open Phil AI Fellowship

“The Open Phil AI Fellowship is a fellowship for full-time PhD students focused on artificial intelligence or machine learning.

Applications are due by Friday, October 29, 2021, 11:59 PM Pacific time. Letters of recommendation are due exactly one week later, on Friday, November 5, at 11:59 PM Pacific time. Click the button below to submit your application:


Please ask your recommenders to submit letters of recommendation using this form:


With this program, we seek to fully support a small group of the most promising PhD students in AI and ML who are interested in research that makes it less likely that advanced AI systems pose a global catastrophic risk. Fellows receive a $40,000 stipend, $10,000 in research support, and payment of tuition and fees, each year, starting in the year of their selection until the end of the 5th year of their PhD.

Decisions will be sent out before March 31, 2022.

If you have questions or concerns, please email aifellowship@openphilanthropy.org.

Read on for more information about the Open Phil AI Fellowship.” 

Request for proposals for projects in AI alignment that work with deep learning systems

“As part of our work on reducing potential risks from advanced artificial intelligence, we are seeking proposals for projects working with deep learning systems that could help us understand and make progress on AI alignment: the problem of creating AI systems more capable than their designers that robustly try to do what their designers intended. We are interested in proposals that fit within certain research directions, described below, that we think could contribute to reducing the risks we are most concerned about.

Anyone is eligible to apply, including those working in academia, industry, or independently. Applicants are invited to submit proposals for up to $1M in total funding covering up to 2 years. We may invite grantees who do outstanding work to apply for larger and longer grants in the future.

Proposals are due January 10, 2022.


If you have any questions, please contact ai-alignment-rfp@openphilanthropy.org.”

Currently open funding opportunities that aren’t from Open Phil

Recall that this is not exhaustive, and that I welcome comments mentioning things I missed.

Apply for Funding from EA Funds

“If you have a project you think will improve the world, and it seems like a good fit for one of our Funds, we encourage you to apply.

Grant sizes are typically between $5,000 and $100,000, but can be as low as $1,000 and higher than $300,000. EA Funds can make grants to individuals, non-profit organizations, academic institutions, and other entities. You do not need to be based in the US or the UK to apply for a grant. If you are unsure whether you are eligible to apply for a grant, please email funds@effectivealtruism.org.

We sometimes meet people who did not apply because they thought they would not be funded. Some of them eventually applied and were funded, despite their doubts, because we were excited by their projects. Applying is fast and easy; we really do encourage it!

EA Funds is always open to applications.

EA Funds will consider funding applications from grantseekers who wish to remain anonymous in public reporting.

You can also suggest that we give money to other people, or let us know about ideas for how we could spend our money. Suggest a grant.”

Survival and Flourishing (SAF) and SFF

“Survival and Flourishing (SAF; /sæf/) is a newly formed Sponsored Project of Social and Environmental Entrepreneurs, a 501(c)(3) public charity (proof of sponsorship; proof of charity status). SAF’s mission is to secure funding and fiscal sponsorship for projects that will benefit the long-term survival and flourishing of sentient life, including but not limited to humans.

SAF works closely with the Survival and Flourishing Fund (SFF), a donor advised fund with a similar mission and overlapping leadership. While we share no formal relationship with SFF, SAF and SFF have complementary functions:

  • SAF as a general rule does not make grants to 501(c)(3) public charities, while SFF does, and
  • SFF as a general rule does not make grants to individuals, while SAF does.”

"Sign up to our newletter to be notified of future funded project rounds!"

Various Future of Life funding opportunities

“Emerging technologies have the potential to help life flourish like never before – or self-destruct. The Future of Life Institute is delighted to announce a $25M multi-year grant program aimed at tipping the balance toward flourishing, away from extinction. This is made possible by the generosity of cryptocurrency pioneer Vitalik Buterin and the Shiba Inu community.

COVID-19 showed that our civilization is fragile, and can handle risk better when planning ahead. Our grants are for those who have taken these lessons to heart, wish to study the risks from ever more powerful technologies, and develop strategies for reducing them. The goal is to help humanity win the wisdom race: the race between the growing power of our technology and the wisdom with which we manage it.

We are excited to offer a range of grant opportunities within the areas of AI Existential Safety, Policy/Advocacy and Behavioral Science. 

Our AI Existential Safety Program is launching first. Applications for PhD and Postdoctoral Fellowships are being accepted in the fall of 2021. We are working to build a global community of AI Safety researchers who are keen to ensure that AI remains safe and beneficial to humanity. You can see who is already part of the community on our website here.

Additional programs will be rolling out later. Please subscribe to our newsletter and follow us on twitter!

Grantmaking – Center on Long-Term Risk 

We have a dedicated fund to support promising projects and individuals. The Center on Long-Term Risk Fund (CLR Fund) operates in line with our mission to build a global community of researchers and professionals working to do the most good in terms of reducing suffering.

Apply for funding"

Sorted by Click to highlight new comments since:

Can you put something on here to the effect of: "Eliezer Yudkowsky continues to claim that anybody who comes to him with a really good AGI alignment idea can and will be funded."

I'm finding this difficult to interpret - I can't find a way of phrasing my question without it seeming snarky but this isn't intended.

One reading of this offer looks something like:

if you have an idea which may enable some progress, it's really important that you be able to try and I'll get you the funding to make sure you do

Another version of this offer looks more like:

I expect basically never to have to pay out because almost all ideas in the space are useless, but if you can convince me yours is the one thing that isn't useless I guess I'll get you the money.

I guess maybe a way of making this concrete would be:

-have you paid out on this so far, if so, can you say what for?

-if not can you point to any existing work which you would have funded if someone had approached you asking for funding to try it?

Eliezer gave some more color on this here:

This is your regular reminder that, if I believe there is any hope whatsoever in your work for AGI alignment, I think I can make sure you get funded. It's a high bar relative to the average work that gets proposed and pursued, and an impossible bar relative to proposals from various enthusiasts who haven't understood technical basics or where I think the difficulty lies. But if I think there's any shred of hope in your work, I am not okay with money being your blocking point. It's not as if there are better things to do with money.


There might be more discussion in the thread.

I interpreted it as the former fwiw. Skimming his FB timeline, Eliezer has recently spoken positively of Redwood Research, and in the past about Chris Olah's work on interpretability. 

Should GiveWell's incubation grants be listed?

(And there are other adjacent programmes like Evidence Action.)

What about Charity Entrepreneurship?

AI Safety Impact Markets

Description provided to me by one of the organizers: 

This is a public platform for AI safety projects where funders can find you. You shop around for donations from donors that already have a high donor score on the platform, and their donations will signal-boost your project so that more donors and funders will see it. 

The AI Safety Fundamentals opportunities board, filtered for "funding" as the opportunity type, is probably also useful. 

$20 Million in NSF Grants for Safety Research

After a year of negotiation, the NSF has announced a $20 million request for proposals for empirical AI safety research.

Here is the detailed program description.

The request for proposals is broad, as is common for NSF RfPs. Many safety avenues, such as transparency and anomaly detection, are in scope

Thanks for making this. Did you consider making this into an Airtable? It could also be a Google spreadsheet, but I think an Airtable would work better. 

An Airtable would be slightly easier to manage and update over time than a post, and it would also be easier to filter and scan through (i.e. if you had columns for cause areas, usual grant amounts, and application deadlines)

Yeah, I think complementing this with an Airtable would indeed be handy, and I'd be in favour of someone making such an Airtable based on this post (and then maybe giving me edit access as well, so I can help maintain it) :)

I've already started doing this. Will get in contact with you.

Thanks again for doing that! 

Just in case other commenters were wondering: JJ usefully started this, and then we all mutually agreed to have Effective Thesis take over maintenance of the Airtable, so I've now added to the top of the post an update linking to the latest version of the Airtable. 

I'll add suggested things to the post itself if people provide me with full text I can directly copy in (like the name with a link to the relevant page, followed by a summary). Otherwise I'll let the comments cover things, to save myself time. 

Also, JJ Hepburn has now created an Airtable with similar info, which you can view the outputs of here. That currently complements this post, and could supersede this post if someone takes ownership of adding things there, updating the info, and ironing out potential glitches. Please contact me if you're interested in doing that.

CEEALAR:  "We make grants to individuals and charities in the form of providing free or subsidised serviced accommodation and board, and a moderate stipend for other living expenses, at our hotel in Blackpool, UK."

I'd recommend putting the airtable at the top of your post to make it the schelling point

New opportunity: Announcing the Clearer Thinking Regrants program

Do you have an idea for a project, or run an existing project, startup, or organization that could one day have a big positive impact on the future of the world? Apply now to our brand new Clearer Thinking Regrants program!

We plan to award grants to around 20 selected altruistic projects (depending on the quality and relevance of submissions we receive).

Grants will be a minimum of $10,000 per project, up to a conceivable maximum of $500,000. These projects don’t have to relate to Clearer Thinking’s mission, they just have to be aimed at improving the future!

We have designed the first round of the application to be completed in a single sitting of just 20 minutes. 

If you know of any projects, non-profits, or startups that aim to improve the future of the world, regardless of which stage they’re in, please share the application form with them. 

We want to hear about as many great projects as we can! Applications are now open! Apply by 11:59pm Eastern Time July 15th, 2022.

Apply now for funding (designed to take ~20 minutes)

What about individual Earning To Givers?

Is there some central place where all the people doing Earning To Give are listed, potentially with some minimal info about their potential max grant size and the type of stuff they are happy to fund?

If not, how do ETGers usually find non-standard funding opportunities? Just personal networks?

See also this detailed breakdown of potential funding options for EA (community-building-type) groups specifically.

AI Safety Support have a list of funding opportunities. I'm pretty sure all of them are already in this post + comments section, but it's plausible that'll change in future. 

See also An Overview of the AI Safety Funding Situation for indications of some additional non-EA funding opportunities relevant to AI safety (e.g. for people doing PhDs or further academic work). 

Announcing Manifund Regrants

Manifund is launching a new regranting program! We will allocate ~$2 million over the next six months based on the recommendations of our regrantors. Grantees can apply for funding through our site; we’re also looking for additional regrantors and donors to join.

I've just now learned of www.futurefundinglist.com, which seems also relevant (though I haven't looked at it closely or tried to assess how useful it'd be to people)

Open Phil is seeking applications from grantees impacted by recent events

First part of the post:

"We (Open Phil) are seeking applications from grantees affected by the recent collapse of the FTX Future Fund (FTXFF) who fall within our long-termist focus areas (biosecurity, AI risk, and building the long-termist EA community). If you fit the following description, please fill out this application form.

We’re open to applications from:

  • Grantees who never received some or all of their committed funds from FTXFF.
  • Grantees who received funds, but want to set them aside to return to creditors or depositors.
    • We think there could be a number of complex considerations here, and we don’t yet have a clear picture of how we’ll treat requests like these. We’d encourage people to apply if in doubt, but to avoid making assumptions about whether you’ll be funded (and about what our take will end up being on what the right thing to do is for your case). (Additionally, we’re unsure if there will be legal barriers to returning funds.) That said, we’ll do our best to respond to urgent requests quickly, so you have clarity as soon as possible.
  • Grantees whose funding was otherwise affected by recent events.[1]"

SFF Speculation Grants as an expedited funding source

"Hi everyone, SFF has received numerous emails recently from organizations interested in expedited funding.  I believe a number of people here already know about SFF Speculation Grants, but since we've never actually announced our existence on the EA Forum before:

The Survival and Flourishing Fund has a means of expediting funding requests at any time of year, via applications to our Speculation Grants program:


SFF Speculation Grants are expedited grants organized by SFF outside of our biannual grant-recommendation process (the S-process). “Speculation Grantors” are volunteers with budgets to make these grants. Each Speculation Grantor’s budget grows or increases with the settlement of budget adjustments that we call “impact futures” (explained further below). Currently, we have a total of ~20 Speculation Grantors, with a combined budget of approximately $10MM (up from $4MM initially). Our process and software infrastructure for funding these grants were co-designed by Andrew Critch and Oliver Habryka.

For instructions on how to apply, please visit the link above.

For general information about the Survival and Flourishing Fund, see:


Nonlinear Support Fund: Productivity grants for people working in AI safety

Get up to $10,000 a year for therapy, coaching, consulting, tutoring, education, or childcare


You automatically qualify for up to $10,000 a year if:

  • You work full time on something helping with AI safety
    • Technical research
    • Governance
    • Graduate studies
    • Meta (>30% of beneficiaries must work in AI safety)
  • You or your organization received >$40,000 of funding to do the above work from any one of these funders in the last 365 days
  • Your organization does not pay for these services already
  • (Only if you're applying for therapy or child care) You make less than $100,000. There are no income qualifiers for any other services we provide.

As long as you meet these criteria, you will qualify. Funding is given out in the order the applications were received. (For more details about how the fund works and why we made it read here)

What services can you apply for?

  • Therapy (only if you make less than $100,000)
  • Coaching
  • Consultants* (e.g. management, operations, IT, marketing, etc)
  • Childcare (only if you make less than $100,000)
  • Tutors (e.g. ML, CS, English, etc) 
  • Meditation classes 
  • Mental health apps
  • Books (either for your mental health or that's relevant to your work)
  • Anything educational that helps you do better at your work (e.g. classes, books, workshops, etc)

*Consultant can mean a lot of things. In the context of the Nonlinear Support Fund, it refers to people who give you advice on how to do better at your work. It does not refer to people who are more like contract hires, who actually go and do the work for you. 

Another opportunity: Amplify creative grants

Some info from that post:

We're announcing a small grants program for creative media, supported by Hear This Idea and the FTX Future Fund regranting program.


Over the next few months, we’re aiming to make grants to podcasts and other creative media projects that spread ideas to help humanity navigate this century.

We’re interested in applications that look to cover topics related to (i) reducing existential risk, (ii) helping fix pressing global problems, and (iii) putting humanity on a positive long-term trajectory. More details on all three below.

We want to amplify ideas associated with effective altruism and longtermism, but you don’t need to be actively engaged with effective altruism to apply.

We’re excited to support projects (new and existing) in English and other languages — sharing important ideas with new, global, audiences.

If you are unsure whether your idea fits among the areas we outline below, please lean toward applying— we want to hear your ideas! The form is a single page of questions, and should easily take less than an hour. The first batch of grants will be approved around three weeks after this announcement. After that, we’ll approve grants on a rolling basis. Let us know if you have a deadline you need to hear back by, and we’ll do our best to accommodate and get back very quickly if necessary. If you have any questions which aren’t answered below, please contact us on grants@hearthisidea.com.

You can apply now by following this link.


October 22nd update: we've been really impressed with the number and quality of applications over the last few weeks; enough to disburse all of our initial pot to the (successful) applications we have already received. As such, we will no longer be actively considering new applications for the first set of grants. However, we will keep the application form open as a way to express interest in case we renew the program. Thanks for your understanding!

Adjacent opportunity: grants from Scott Alexander / Astral Codex Ten https://astralcodexten.substack.com/p/apply-for-an-acx-grant 

Also the adjacent Emergent Ventures grants https://www.mercatus.org/emergent-ventures 

What follows was previously a section of this post, but I've moved it into the comments instead to keep the post more focused on the most useful content.

Previously open Open Phil funding opportunities

One reason I’m compiling these is that I imagine some might be run again in future. But that’s just a guess. 

But note that, in any case, people interested in these funding opportunities may find that one of the currently open opportunities suits them, such as the “Early-career funding for individuals interested in improving the long-term future” grants program.

Open Philanthropy Technology Policy Fellowship 

“Open Philanthropy is seeking applicants for a US policy fellowship program focused on high-priority emerging technologies, especially AI and biotechnology. Selected applicants will receive policy-focused training and mentorship and be supported in matching with a host organization for a full-time, fully-funded fellowship based in the Washington, DC area. Potential host organizations include executive branch offices, Congressional offices, and think tank programs.

Fellowship placements are expected to begin in early or mid-2022 and to last 6 or 12 months (depending on the fellowship category), with potential renewal for a second term. Fellowship opportunities are available for both entry-level and mid-career applicants, and for people both with and without prior policy experience.

The application deadline has now passed.”

Funding for Study and Training Related to AI Policy Careers 

“This program aims to provide flexible support for individuals who want to pursue or explore careers in AI policy1 (in industry, government, think tanks, or academia) for the purpose of positively impacting eventual societal outcomes from “transformative AI,” by which we mean potential future AI that precipitates a transition at least as significant as the industrial revolution (see here). This program is part of our grantmaking focus area related to transformative AI (explained here).

The application window is now closed.”

From the EA Groups Newsletter:

"Group Support Funding is now MUCH faster and easier! Please apply!

You can now apply for a lump sum to spend flexibly – no need to have your budget sorted in advance. Or you can apply for rapid funding to get funds within 2 or 7 days.

Group Support Funding covers event costs like food and venue hire, as well as advertising, books, software, subscriptions, and other group costs. These funds are available for all EA groups that are not already on a CEA Community Building Grant, including city groups, university groups, national groups, online groups, and specialist groups such as cause or career specific groups.

We believe most groups could usefully spend more money than they have in the past, so we encourage you to apply

Group Support Grant

The main type of funding is now the flexible Group Support Grant

You apply for a lump sum of money to pay for future expenses for up to 12 months. You don’t need to know your expenses or your total budget in advance – just apply for a lump sum and use our guidelines to help you decide how to spend it. 

You can reapply when your funding gets low, and if you don’t spend all your funds you can return the money or ask to use the funds in your next application. 

There is no specific limit on the amount of money groups can request, but we’ve set some suggested amounts, which range from 3000 USD per year for small groups, through to 12,000 USD per year for larger groups. 

The application and review processes require far less information than in the past.

Groups will get their funds within 2 weeks (US, UK), or 3 weeks (other countries), but you can request for your grant to be processed faster if needed. 

We think most groups should apply for a Group Support Grant! 

Rapid Group Support Funding

We are also offering Rapid Group Support Funding, which is for urgent requests for reimbursement or for costs that need to be paid within 1 month. You should get your funds within 2 working days (US, UK groups), or 7 working days (other countries)."

Here's more info on Open Phil's 4 requests for proposals for certain areas of technical AI safety work, copied from the Alignment Newsletter

Request for proposals for projects in AI alignment that work with deep learning systems (Nick Beckstead and Asya Bergal) (summarized by Rohin): Open Philanthropy is seeking proposals for AI safety work in four major areas related to deep learning, each of which I summarize below. Proposals are due January 10, and can seek up to $1M covering up to 2 years. Grantees may later be invited to apply for larger and longer grants.

Rohin's opinion: Overall, I like these four directions and am excited to see what comes out of them! I'll comment on specific directions below.


RFP: Measuring and forecasting risks (Jacob Steinhardt) (summarized by Rohin): Measurement and forecasting is useful for two reasons. First, it gives us empirical data that can improve our understanding and spur progress. Second, it can allow us to quantitatively compare the safety performance of different systems, which could enable the creation of safety standards. So what makes for a good measurement?

1. Relevance to AI alignment: The measurement exhibits a failure mode that becomes worse as models become larger, or tracks a potential capability that may emerge with further scale (which in turn could enable deception, hacking, resource acquisition, etc).

2. Forward-looking: The measurement helps us understand future issues, not just those that exist today. Isolated examples of a phenomenon are good if we have nothing else, but we’d much prefer to have a systematic understanding of when a phenomenon occurs and how it tends to quantitatively increase or decrease with various factors. See for example scaling laws (AN #87).

3. Rich data source: Not all trends in MNIST generalize to CIFAR-10, and not all trends in CIFAR-10 generalize to ImageNet. Measurements on data sources with rich factors of variation are more likely to give general insights.

4. Soundness and quality: This is a general category for things like “do we know that the signal isn’t overwhelmed by the noise” and “are there any reasons that the measurement might produce false positives or false negatives”.

What sorts of things might you measure?

1. As you scale up task complexity, how much do you need to scale up human-labeled data to continue to maintain good performance and avoid reward hacking? If you fail at this and there are imperfections in the reward, how bad does this become?

2. What changes do we observe based on changes in the quality of the human feedback (e.g. getting feedback from amateurs vs experts)? This could give us information about the acceptable “difference in intelligence” between a model and its supervisor.

3. What happens when models are pushed out of distribution along a factor of variation that was not varied in the pretraining data?

4. To what extent do models provide wrong or undesired outputs in contexts where they are capable of providing the right answer?

Rohin's opinion: Measurements generally seem great. One story for impact is that we have a measurement that we think is strongly correlated with x-risk, and we use that measurement to select an AI system that scores low on such a metric. This seems distinctly good and I think would in fact reduce x-risk! But I want to clarify that I don’t think it would convince me that the system was safe with high confidence. The conceptual arguments against high confidence in safety seem quite strong and not easily overcome by such measurements. (I’m thinking of objective robustness failures (AN #66) of the form “the model is trying to pursue a simple proxy, but behaves well on the training distribution until it can execute a treacherous turn”.)

You can also tell stories where the measurements reveal empirical facts that then help us have high confidence in safety, by allowing us to build better theories and arguments, which can rule out the conceptual arguments above.

Separately, these measurements are also useful as a form of legible evidence about risk to others who are more skeptical of conceptual arguments.


RFP: Techniques for enhancing human feedback (Ajeya Cotra) (summarized by Rohin): Consider a topic previously analyzed in aligning narrowly superhuman models (AN #141): how can we use human feedback to train models to do what we want in cases where the models are more knowledgeable than the humans providing the feedback? A variety of techniques have been proposed to solve this problem, including iterated amplification (AN #40), debate (AN #5), recursive reward modeling (AN #34), market making (AN #108), and generalizing from short deliberations to long deliberations. This RFP solicits proposals that aim to test these or other mechanisms on existing systems. There are a variety of ways to set up the experiments so that the models are more knowledgeable than the humans providing the feedback, for example:

1. Train a language model to accurately explain things about a field that the feedback providers are not familiar with.

2. Train an RL agent to act well in an environment where the RL agent can observe more information than the feedback providers can.

3. Train a multilingual model to translate between English and a foreign language that the feedback providers do not know.


RFP: Interpretability (Chris Olah) (summarized by Rohin): The author provides this one sentence summary: We would like to see research building towards the ability to “reverse engineer" trained neural networks into human-understandable algorithms, enabling auditors to catch unanticipated safety problems in these models.

This RFP is primarily focused on an aspirational “intermediate” goal: to fully reverse engineer some modern neural network, such as an ImageNet classifier. (Despite the ambition, it is only an “intermediate” goal because what we would eventually need is a general method for cheaply reverse engineering any neural network.) The proposed areas of research are primarily inspired by the Circuits line of work (AN #142):

1. Discovering Features and Circuits: This is the most obvious approach to the aspirational goal. We simply “turn the crank” using existing tools to study new features and circuits, and this fairly often yields an interesting result that makes progress towards reverse engineering a neural network.

2. Scaling Circuits to Larger Models: So far the largest example of reverse engineering is curve circuits, with 50K parameters. Can we find examples of structure in the neural networks that allow us to drastically reduce the amount of effort required per parameter? (As examples, see equivariance and branch specialization.)

3. Resolving Polysemanticity: One of the core building blocks of the circuits approach is to identify a neuron with a concept, so that connections between neurons can be analyzed as connections between concepts. Unfortunately, some neurons are polysemantic, that is, they encode multiple different concepts. This greatly complicates analysis of the connections and circuits between these neurons. How can we deal with this potential obstacle?

Rohin's opinion: The full RFP has many, many more points about these topics; it’s 8 pages of remarkably information-dense yet readable prose. If you’re at all interested in mechanistic interpretability, I recommend reading it in full.

This RFP also has the benefit of having the most obvious pathway to impact: if we understand what algorithm neural networks are running, there’s a much better chance that we can catch any problems that arise, especially ones in which the neural network is deliberately optimizing against us. It’s one of the few areas where nearly everyone agrees that further progress is especially valuable.


RFP: Truthful and honest AI (Owain Evans) (summarized by Rohin): This RFP outlines research projects on Truthful AI (summarized below). They fall under three main categories:

1. Increasing clarity about “truthfulness” and “honesty”. While there are some tentative definitions of these concepts, there is still more precision to be had: for example, how do we deal with statements with ambiguous meanings, or ones involving figurative language? What is the appropriate standard for robustly truthful AI? It seems too strong to require the AI system to never generate a false statement; for example it might misunderstand the meaning of a newly coined piece of jargon.

2. Creating benchmarks and tasks for Truthful AI, such as TruthfulQA (AN #165), which checks for imitative falsehoods. This is not just meant to create a metric to improve on; it may also simply perform as a measurement. For example, we could experimentally evaluate whether honesty generalizes (AN #158), or explore how much truthfulness is reduced when adding in a task-specific objective.

3. Improving the truthfulness of models, for example by finetuning models on curated datasets of truthful utterances, finetuning on human feedback, using debate (AN #5), etc.

Besides the societal benefits from truthful AI, building truthful AI systems can also help with AI alignment:

1. A truthful AI system can be used to supervise its own actions, by asking it whether its selected action was good.

2. A robustly truthful AI system could continue to do this after deployment, allowing for ongoing monitoring of the AI system.

3. Similarly, we could have a robustly truthful AI system supervise its own actions in hypothetical scenarios, to make it more robustly aligned.

Rohin's opinion: While I agree that making AI systems truthful would then enable many alignment strategies, I’m actually more interested in the methods by which we make AI systems truthful. Many of the ideas suggested in the RFP are ones that would apply to alignment more generally and aren’t particularly specific to truthful AI. So it seems like whatever techniques we used to build truthful AI could then be repurposed for alignment. In other words, I expect that the benefit to AI alignment of working on truthful AI is that it serves as a good test case for methods that aim to impose constraints upon an AI system. In this sense, it is a more challenging, larger version of the ”never describe someone getting injured” challenge (AN #166). Note that I am only talking about how this helps AI alignment; there are also beneficial effects on society from pursuing truthful AI that I haven’t talked about here.

Is the Center on Long-Term Risk still taking grant applications? Their application links to this EA Forum post which says "Please submit your application before August 12, 2019."

Yes, the CLR Fund is still accepting applications. I will see that we clarify this in the appropriate places.

I'm pretty sure - they've definitely made grants since that date (though possibly just to things that applied before then?), and they've changed the fund management team recently (which would be odd if they're not taking applications). Though I'm not totally sure. 

You can also apply for a EA web3 grant here! 



[comment deleted]2
Curated and popular this week
Relevant opportunities