
The most important questions in forecasting tend to resist precise definitions — questions like, “Will we be ready for AGI when it arrives?” or “How are China’s AI capabilities trending overall?” These big-picture questions are more consequential than “How will Alibaba’s Qwen model perform on the Chatbot Arena?”, but where would you even begin to operationalize something like AGI readiness in any rigorous way?

We’re experimenting with an idea that bypasses this issue, which we’re calling Indexes.

An index takes a vague question, like “How ready will we be for AGI in 2030?”, and gives a quantitative answer on a -100 to 100 scale. We get this answer by taking forecasts on a set of questions, identified by the index author (a person or group) as collectively pointing at a nebulous but important concept like AGI readiness, and combining them into a single dimensionless number. The index author assigns a weight to each question indicating a) how informative it is relative to the others and b) whether learning that it resolved Yes should make the index go up or down. What you see when you look at a Metaculus Index, then, is in some sense a forecast of what the index will be once all of its questions have resolved. When the index goes up, it means that forecasters, on the whole, believe things are looking better for 2030 than they previously expected.
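To make the mechanics concrete, here is a minimal sketch of one way such a score could be computed: a signed, weighted combination of probability forecasts rescaled to the -100 to 100 range. The function, question names, and numbers are illustrative assumptions, not Metaculus's actual formula or question set.

```python
# Illustrative sketch only: not Metaculus's actual scoring formula.
# Each question has a community probability forecast (0..1) and a signed
# weight: the magnitude says how informative the question is, the sign says
# whether a Yes resolution should push the index up or down.

def index_value(forecasts, weights):
    """Combine weighted probability forecasts into a -100..100 score."""
    total = sum(abs(w) for w in weights.values())
    # Map each probability to [-1, 1] (so a 50% forecast contributes nothing),
    # apply the signed weight, and normalize by the total weight.
    score = sum(w * (2 * forecasts[q] - 1) for q, w in weights.items())
    return 100 * score / total

# Made-up question names and numbers, for illustration only.
forecasts = {"rsp_legislation": 0.35, "incident_list": 0.60, "third_party_audits": 0.45}
weights = {"rsp_legislation": 3, "incident_list": 2, "third_party_audits": 2}
print(round(index_value(forecasts, weights), 1))  # -10.0
```

Under this toy scoring rule, the index sits at 0 when forecasters are at 50% on every question and moves toward 100 (or -100) as the weighted questions look more (or less) likely to resolve the "good" way.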

Enough theory. We’ve launched our flagship index, which you can forecast on now: AGI Readiness Index

[Figure: Example AGI Readiness Index. These are the tentative questions and weights from an index we developed at The Curve. The data are fake; this is just to illustrate what an index will look like. The index value is the weighted average of its component questions.]

The questions and weights in this figure are the preliminary results of a workshop we conducted at The Curve in November. We asked each participant to answer the prompt:

Assume AGI arrives in 2030. If you could ask an oracle three questions about the world right before AGI arrives, what would they be?

The rest of the workshop was spent specifying and winnowing the questions. There was, of course, disagreement — about how likely each question was to resolve positively (after all, if you already think you know the answer, asking it would waste an oracle question), about how correlated the questions were with one another (you want a set of questions that are as complementary and independent of each other as possible), and about what constituted readiness in the first place. Once all the questions were on the table, participants had the opportunity to critique and defend each other’s questions, and finally to assign weights: each participant had ten points to distribute across the questions, and the eight questions rated most informative emerged as the key axes of readiness. Tied for first were legislation requiring compliance with RSPs, third-party auditing, and transparency; and a public list of incidents related to AI safety.
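As a rough illustration of that weighting step, with invented participant allocations and question names rather than the workshop's actual data, the tallying might look something like this:

```python
# Hypothetical sketch of the workshop's weighting step: each participant
# distributes ten points, points are summed per question, and the top-rated
# questions become the index components. All names and numbers are invented.
from collections import Counter

allocations = [
    {"rsp_legislation": 4, "incident_list": 3, "third_party_audits": 3},
    {"rsp_legislation": 3, "incident_list": 4, "compute_reporting": 3},
    {"third_party_audits": 5, "incident_list": 3, "rsp_legislation": 2},
]  # one dict per participant, each summing to ten points

totals = Counter()
for points in allocations:
    totals.update(points)

# Keep the top-rated questions and normalize their point totals into weights.
top = totals.most_common(8)
total_points = sum(pts for _, pts in top)
weights = {question: pts / total_points for question, pts in top}
print(weights)
```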

We hope you disagree with these weights! It would be a sorry index that received no push-back. What questions are missing? Our dream is for indexes to be a forum to talk about the big picture. Where do we see ourselves in 2030? Are we headed in the right direction? How would we know? At the time of writing, we have two indexes in the pipeline: AI for Public Good, in collaboration with AI Palace, and a China Capabilities Index, with the Simon Institute for Longterm Governance.

Do you have an idea for an index? Get in touch!

Credits

This project is partly inspired by the Forecasting Research Institute’s report Conditional Trees: A Method for Generating Informative Questions about Complex Topics, which presents a metric and method for identifying “high-value” forecasting questions with respect to an ultimate question. Our goal here is to curate these questions quickly, without putting the index author through the exercise of conditional forecasting and without necessarily having to strictly define the ultimate question. There’s nothing stopping the question weights in an index from being based on a quantitative metric like value of information, but they can also be less rigorous, or even completely "vibes-based."
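For readers curious what a quantitative weighting could look like, here is one crude proxy for a question's informativeness: the expected size of the update to the ultimate question's forecast once the component question resolves. To be clear, this is an assumption for illustration, not the metric defined in the Conditional Trees report.

```python
# Crude "value of information"-style weight, for illustration only; this is
# NOT the metric from the Conditional Trees report.

def expected_update(p_yes, p_ultimate_if_yes, p_ultimate_if_no):
    """Expected absolute shift in the ultimate-question forecast (all args in 0..1)."""
    p_ultimate = p_yes * p_ultimate_if_yes + (1 - p_yes) * p_ultimate_if_no
    return (p_yes * abs(p_ultimate_if_yes - p_ultimate)
            + (1 - p_yes) * abs(p_ultimate_if_no - p_ultimate))

# A 50/50 question that would move the ultimate forecast from 40% to 70%
# is far more informative than one that barely moves it.
print(round(expected_update(0.5, 0.7, 0.4), 3))    # 0.15
print(round(expected_update(0.5, 0.52, 0.48), 3))  # 0.02
```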

We’ve also been heavily influenced by Cultivate Labs’ Issue Decomposition approach, used by the RAND Forecasting Initiative, in which top-level questions are broken down into drivers, sub-drivers, and finally forecasting questions aka "signals." Our indexes are one way to recompose forecasting questions into something that can inform strategy and give us a look at the overall trajectory.

Comments



Executive summary: The concept of "Indexes" is introduced as a method to quantify vague yet crucial forecasting questions, such as AGI readiness, by aggregating weighted answers to a curated set of sub-questions, enabling actionable insights into nebulous topics. 

Key points:

  1. Indexes aim to operationalize vague, consequential questions (e.g., AGI readiness) by providing a numerical scale (-100 to 100) based on weighted forecasts of related sub-questions.
  2. Index construction involves selecting, specifying, and weighting sub-questions deemed informative by index authors, ensuring complementary and independent insights.
  3. A flagship example, the "AGI Readiness Index," uses eight key axes such as AI legislation, transparency, and incident reporting, derived from expert workshops.
  4. Indexes are intended to provoke discussion and critique, fostering collaboration to refine questions, weights, and perspectives.
  5. Upcoming indexes include "AI for Public Good" and "China Capabilities Index," aiming to broaden the scope of big-picture insights.
  6. Inspired by methodologies like the Forecasting Research Institute’s "Conditional Trees" and Cultivate Labs’ decomposition approach, Indexes balance rigor with practical, flexible implementation.

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.
