
TLDR

We are creating a database to collect base rates for various categories of events. You can find the database here and can suggest new base rate categories for us to look into here.

Project Summary

The base rate database project collects base rates for different categories of events and makes them available to researchers, forecasters and philanthropic organisations. Its main goals are to develop better intuitions about the potential and limitations of reference class forecasting and to provide useful information to the public. The data will enable research into the circumstances in which reference class forecasting is a promising approach, which reference class forecasting methods work best, how to construct reasonable reference classes, and what the potential caveats and pitfalls are. In addition to the raw data, we will collect qualitative feedback on individual reference classes and on the overall process of building a base rate database, adding context to the data and building a body of knowledge for future work. We aim to select categories of base rates so that the information we collect is useful to decision makers and philanthropic organisations.

 

Introduction

If one wants to predict whether some event will happen in the future, it is often helpful to look at the past. One can ask: "Ignoring all the specifics of the event I'm trying to predict, what would I predict just by looking at the base rate of similar events in the past?" This is called reference class forecasting, and it helps forecasters obtain an 'outside view' on the question at hand. This outside view is usually complemented by the 'inside view': what are the specifics of the current event that distinguish it from other events?

Reference class forecasting is widely used among forecasters. To date, however, there has been little systematic research into how effective base rates are for forecasting future events, how they can best be used, and what limitations apply. We aim to facilitate this research.

 

Project outline

Goal

The main goal of this project is to develop a better understanding of the merits and limitations of reference class forecasting. 

A secondary goal is to collect information that may be useful for forecasters and EA stakeholders in the future. 

 

What we'll do

We want to achieve our goals by:

  • asking experienced forecasters to compile a public database with base rates for various categories of events
  • collecting qualitative feedback on the process of collecting base rates, as well as the base rates themselves
  • using the database to conduct and facilitate quantitative and qualitative research, especially with regard to the performance of various reference class forecasting approaches
  • inviting others (you!) to suggest base rate categories that we should look into through this form

 

Categories that we want to look into

We intend to look into categories as diverse as:

  • Violent and non-violent protests that have (or have not) led to regime change
  • Elections with small margins of victory
  • Zoonotic spillover events
  • Development of new antibiotics
  • ... 

You can find a list of all the categories on our radar here. You can suggest new categories here.

 

Specific research questions

The database is meant to be a resource for anyone who is interested in reference class forecasting. Please do feel free to use it for your own research as well as to reach out to us. 

So far, we have identified the following quantitative analyses that may be promising:

  • Comparison of the predictive performance of several reference class forecasting approaches, for example:
    • Naive Laplace's rule of succession with different priors (uniform, Jeffreys, Haldane)
    • Time-invariant Laplace's rule with different ways of treating the exponent
  • Analysing how useful reference class forecasting is overall, for example by
    • constructing a reference class forecast based on the first x observations and scoring the forecast based on the last (n-x) observations
    • arriving at an estimate for the robustness of reference class forecasting by obtaining a distribution of scores for different approaches across different base rate categories  
    • checking how robust forecasts / estimates are to changes in the observation period / the number of data points used. 
    • specifically investigating the relationship between accuracy and the number of data points available by constructing a forecast based on the first x data points and then adding more data points to check consistency. The distribution of robustness would itself provide a base rate for how useful base rates are.
    • identifying patterns that make a base rate more or less useful (e.g. if there is a trend over time, simply looking at the base rate may not be enough)
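The first two analyses above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's method. The Laplace variants use the standard Beta(a, b) posterior predictive (k + a) / (n + a + b): a = b = 1 is the uniform prior, a = b = 1/2 is Jeffreys, and a = b = 0 is Haldane, which reduces to the raw frequency k/n. For the time-invariant variant we assume, as one possible formalization, a Poisson process with an improper Gamma prior on the rate. Under that assumption, P(no event in the next d time units | k events in the past n) = (n / (n + d))^(k + c), where the offset c is a hypothetical parameter standing in for the choice of exponent. The backtest fits on the first x yes/no observations and scores the last n - x with the Brier score. All function names are illustrative.

```python
from typing import Sequence

# Beta(a, b) pseudo-counts for the three priors named in the text.
# Haldane's prior, Beta(0, 0), is improper and reduces to the raw frequency k/n.
PRIORS = {
    "uniform": (1.0, 1.0),
    "jeffreys": (0.5, 0.5),
    "haldane": (0.0, 0.0),
}


def laplace_forecast(successes: int, trials: int, a: float, b: float) -> float:
    """Posterior predictive P(event on the next trial) under a Beta(a, b) prior."""
    denom = trials + a + b
    if denom == 0:
        raise ValueError("the Haldane prior needs at least one observation")
    return (successes + a) / denom


def time_invariant_forecast(events: int, elapsed: float, horizon: float,
                            exponent_offset: float = 1.0) -> float:
    """P(at least one event within `horizon`) after `events` events in `elapsed` time.

    Assumption: events follow a Poisson process whose rate has an improper
    Gamma prior; `exponent_offset` is the hypothetical knob for the exponent.
    """
    p_none = (elapsed / (elapsed + horizon)) ** (events + exponent_offset)
    return 1.0 - p_none


def brier(p: float, outcomes: Sequence[int]) -> float:
    """Mean Brier score of a constant forecast p against 0/1 outcomes (lower is better)."""
    return sum((p - o) ** 2 for o in outcomes) / len(outcomes)


def backtest(outcomes: Sequence[int], x: int) -> dict:
    """Fit each prior on the first x observations and score it on the last n - x."""
    if not 0 < x < len(outcomes):
        raise ValueError("x must leave at least one observation on each side")
    train, test = outcomes[:x], outcomes[x:]
    k = sum(train)
    return {
        name: brier(laplace_forecast(k, len(train), a, b), test)
        for name, (a, b) in PRIORS.items()
    }
```

For example, with the made-up series [0, 0, 1, 0, 0, 0, 1, 0, 0, 0] and x = 5, `backtest` gives Brier scores of 0.16 (Haldane), 0.1625 (Jeffreys) and about 0.167 (uniform). Repeating this across many base rate categories would yield the kind of score distribution described above. Note that the Haldane prior forecasts exactly 0 when no events have been observed, which would incur an infinite penalty under a logarithmic score.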

We also aim to obtain a better qualitative understanding of reference class forecasting by asking the forecasters who collect the base rates to reflect on the process as well as on individual base rate categories, for example:

  • How clear are the criteria for inclusion / exclusion and the observation period?
  • How trustworthy is the data?
  • Are there any trends that can be identified? 
  • General thoughts / lessons learned

How you can help

Suggesting new categories

You can suggest new categories to include in the database here. Suggested categories should ideally be at least one of the following: 

  • helpful / useful / interesting
  • easy to collect

Providing feedback

If you have thoughts on anything presented here, please let us know in the comments or get in touch directly.  

 

 

 

Comments



Hey!

FYI, this pattern-matches to an elegant EA meta project that sometimes goes wrong. Forecasting is not my domain and I have no object-level opinions; I'm just pointing out the potential failure mode in case it's useful.

Upvoted, and I hope this goes well!

FWIW I'm one of the future users of this project and regularly chatting to this team.

My use case is research, e.g. validating this approach with empirical data.

I expect this database will be useful in the future as a benchmark to test similar approaches, and the program probably justifies its (low) costs on those grounds alone.

Nice! I defer to your opinion

Love the idea of this! You could potentially turn the main table into a pivot table, to make it easier to filter for things 

Do you mean have the table be in a long-format rather than a wide-format? 

There is an inherent aim in this project to bring to the surface salient data that can be recombined and organized into a radically new information-based layer of decision making.

But crowdsourcing has pros and cons: you get stability and good work (Wikipedia), but your signal-to-noise ratio is bounded by the leaders or the organizing principles of legitimacy. Factions will develop that try to skew the data with some bias, making the overall project less legitimate even though it is transparent.

What is the target number of people working on this project?
Is there a definite or indefinite timeline, or a planned end date, for this project?
What can be learned about governance structure to isolate distractions and optimize the bandwidth of the participants?

Currently there are 4 people (including me) working on the project. I focus on coordination, the other three are professional forecasters and focus on the data collection. At the moment we're aiming for wide feedback from anyone who would be interested in certain base rates, but we're not actively crowd-sourcing the collection process. 
