
Summary

Open Philanthropy is launching a request for proposals to improve AI capability evaluations. We're looking to fund work on more demanding GCR-relevant benchmarks, better evaluation science, and improving third-party model access and infrastructure. 

Click here to apply

More details:

Below, we explain what we’re looking for, and why we think this work matters. 

Why capability evaluations matter

The ability to accurately evaluate AI capabilities is becoming increasingly important, for three main reasons:

1. Evaluations are key inputs to AI governance

Many current governance proposals rely heavily on knowing what AI systems can and cannot do. “If-then commitments” are one prominent example — companies agree to take specific actions (like pausing training) if their systems display certain capabilities. But for these approaches to work, we need reliable ways to measure those capabilities.

2. AI capabilities underpin key disagreements about AI risk

Many fundamental disagreements about AI risk stem from different beliefs about what AI systems can or will soon be able to do. For example, skepticism about certain loss-of-control scenarios often comes down to disagreement about whether AI systems could become effective autonomous agents with long-term planning capabilities. Better evaluations could help resolve some of these disagreements, or at least help us identify the key cruxes.

3. We need better situational awareness of what frontier models can and cannot do

Though some genuinely challenging, risk-relevant evaluations do exist (e.g. Cybench for AI cyberoffense capabilities, RE-Bench for AI R&D capabilities), many crucial capabilities remain poorly measured, and benchmarks are saturating quickly. To respond appropriately, we need to understand what AI systems can and can’t do.

Three current problems with capability evaluations

Capability evaluations currently face three major challenges:

  1. Existing benchmarks for risk-relevant capabilities are inadequate. We need more demanding tests that can meaningfully evaluate frontier models' performance on tasks relevant to catastrophic risks, resist saturation even as capabilities advance, and rule in (not just rule out) serious risks.
  2. The science of capability evaluation remains underdeveloped. We don’t yet understand how many capabilities scale, the relationships between different capabilities, or how post-training enhancements will affect performance. This makes interpreting current evaluation results and predicting future results challenging.
  3. Third-party evaluators already face significant access constraints, and increasing security requirements will make access harder. Maintaining meaningful independent scrutiny will require advances in technical infrastructure, evaluation and audit protocols, and access frameworks. 

What we're looking to fund

To address these challenges, we're seeking proposals in three areas:

GCR-relevant capability evaluations for AI agents

We want to fund new evaluations that:

  1. Test agentic, risk-relevant capabilities, such as AI R&D, situational awareness, and adaptation to novel adversarial environments
  2. Are extremely challenging, ideally taking world-class experts multiple days

For more on why we think this is important, what we're looking for, and previous work we think is useful, see this section of our RFP.

Improving the science of capabilities development and evaluations

Current capability evaluations are more like snapshots than predictive tools: they tell us what models can do now, but not what they're likely to do next. We want to improve understanding of questions such as:

  1. How capabilities scale with different inputs
  2. Relationships between different capabilities
  3. Best practices for evaluation methodology

For open questions here we think are important, and past work we've found useful, see this.  

Improving third-party model access and evals infrastructure

Independent evaluations are crucial for reliably assessing AI capabilities. As the stakes get higher, we can't trust AI companies to verify their own claims. But as security requirements increase, getting meaningful external access will become harder.

We're looking for approaches to resolve the tension between security requirements and meaningful external oversight, including:

  1. Understanding necessary access requirements and how to secure them
  2. Improving evaluation infrastructure
  3. Developing verifiable auditing techniques

For open questions here we think are important, and past work we've found useful, see this.

How to engage

Even if you're not planning to apply for funding, this RFP contains many open research questions that we think are important for the field — we encourage you to read the full RFP if you're interested in capability evaluation. Consider applying if you have relevant expertise or ideas, and please share with others who might be interested. 

Anyone is eligible to apply. Applications will be open until 1st April. 

Click here to apply


Comments (3)

Flag that I didn't catch that this was an important announcement, and I think that's because it's posted by one user with initials. Hard to explicate exactly what's going on, but that made me think it was one anonymous user's reactions to an OP announcement rather than the real deal.

By contrast, the technical AIS RFP has three co-authors with full names, and I recognised them as people who work on that team. I'd guess posts with multiple full-name co-authors are more likely to be understood as important and therefore get more reach :) 

This seems to be of questionable effectiveness. Brief answers/challenges: 

Evaluations are a key input to ineffective governance. The safety frameworks presented by the frontier labs are "safety-washing", more appropriately considered roadmaps towards an unsurvivable future.

Disagreement on AI capabilities underpins performative disagreements on AI risk. As far as I know, there's no recent published substantial such disagreement - I'd like sources for your claim, please.

We don't need more situational awareness of what current frontier models can and cannot do in order to respond appropriately. No decision-relevant conclusions can be drawn from evaluations in the style of Cybench and Re-Bench. 

Hi Søren, 

Thanks for commenting. Some quick responses:

> The safety frameworks presented by the frontier labs are "safety-washing", more appropriately considered roadmaps towards an unsurvivable future

I don’t see the labs as the main audience for evaluation results, and I don’t think voluntary safety frameworks should be how deployment and safeguard decisions are made in the long term, so I don’t think the quality of lab safety frameworks is that relevant to this RFP.

> I'd like sources for your claim, please. 

Sure, see e.g. the sources linked to in our RFP for this claim: "What Are the Real Questions in AI?" and "What the AI debate is really about".

I’m surprised you think the disagreements are “performative” – in my experience, many sceptics of GCRs from AI really do sincerely hold their beliefs.

> No decision-relevant conclusions can be drawn from evaluations in the style of Cybench and Re-Bench.

I think Cybench and RE-Bench are useful, if imperfect, proxies for frontier model capabilities at cyberoffense and ML engineering respectively, and those capabilities are central to threats from cyberattacks and AI R&D. My claim isn’t that running these evals will tell you exactly what to do: it’s that these evaluations are being used as inputs into RSPs and governance proposals more broadly, and provide some evidence on the likelihood of GCRs from AI, but will need to be harder and more robust to be relied upon.
