The Rationale-Shaped Hole At The Heart Of Forecasting

dschwarz; Peter Mühlbacher; Lawrence Phillips; FutureSearch; Daniel Hnyk

Comments 15

The trust point above, that demand for rationales is partly distrust of the numbers, is the one I'd build on. A lot of that demand isn't really a request for more words. It's a way of asking "should I believe this wasn't reverse-engineered from what you already expected?" That's a question about a rationale's integrity, not its length, and it's also why "now it's just LLM hallucinations" feels like a fair worry.

Which points at a job distinct from producing rationales: evaluating them. For each load-bearing claim, was it actually established by the evidence, or only suggested by it and then written up as established? Two rationales of equal length can differ enormously on that, and that's where most of the persuasive weight should sit.

The cleanest way I've found to make it checkable is to seal the assessment before the outcome and judge it only on what was knowable at the time. Then hindsight can't quietly relabel a lucky call as sound reasoning, and a reader can verify that for themselves.

I work on this in pharma R&D, where unlike AGI the outcomes are dated and land in a year or two, so the discipline is testable against reality on a real clock. Different domain, same hole. Glad to share a worked example if it's useful.
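To make the "seal it before the outcome" step concrete, here is a minimal sketch of one way it could be done: a plain hash commitment. This is an illustration under stated assumptions, not a description of any particular tool. The function names `seal` and `verify` and the example fields are hypothetical. Note that a digest only proves the content was not changed; it proves timing only if the digest itself is published somewhere timestamped before the outcome is known.

```python
import hashlib
import json
import secrets


def _digest(assessment: dict, nonce: str) -> str:
    # Canonical JSON (sorted keys) so the same assessment always hashes the same way.
    payload = json.dumps(assessment, sort_keys=True) + nonce
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def seal(assessment: dict) -> tuple[str, str]:
    """Return (digest, nonce). Publish the digest now; keep assessment and nonce private."""
    # The random nonce stops readers from guessing the assessment by brute force.
    nonce = secrets.token_hex(16)
    return _digest(assessment, nonce), nonce


def verify(assessment: dict, nonce: str, digest: str) -> bool:
    """After the outcome, anyone can check the revealed assessment against the digest."""
    return _digest(assessment, nonce) == digest


# Hypothetical example: seal today, reveal after the outcome lands.
assessment = {"claim": "trial meets primary endpoint", "probability": 0.3}
digest, nonce = seal(assessment)
```

Any edit made after the fact, for example quietly raising the probability, produces a different digest and fails `verify`.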

Molly Hickman

FRI went further and quantitatively estimated how important each crux was - a great starting point towards an adversarially-collaborated synthesis.

And you can too! We evaluated cruxes on two axes: "value of information" (VOI) and "value of discrimination" (VOD). Essentially: VOI is how much someone expects to gain by finding out the answer to a given crux question (with respect to an ultimate question), and VOD is how much two people expect to converge on the ultimate question when they find out the answer to the crux question.

There's a Google Sheets calculator, as well as an R library, which will be released on CRAN at some point.
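For readers who want to see the two definitions above as arithmetic, here is a rough sketch. It is one plausible formalisation that I am assuming for illustration, and it is not necessarily the formula used in the calculator or the R library. Here VOI is the expected absolute shift in a forecaster's probability on the ultimate question once the crux resolves, and VOD is the expected drop in disagreement between two forecasters. The class `CruxView`, the functions `voi` and `vod`, and the choice to average the two forecasters' crux probabilities are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class CruxView:
    p_crux: float      # P(crux resolves yes)
    p_u_if_yes: float  # P(ultimate question | crux yes)
    p_u_if_no: float   # P(ultimate question | crux no)

    @property
    def p_u(self) -> float:
        # Current forecast on the ultimate question, by the law of total probability.
        return self.p_crux * self.p_u_if_yes + (1 - self.p_crux) * self.p_u_if_no


def voi(v: CruxView) -> float:
    """Expected absolute shift in the ultimate forecast once the crux resolves."""
    return (v.p_crux * abs(v.p_u_if_yes - v.p_u)
            + (1 - v.p_crux) * abs(v.p_u_if_no - v.p_u))


def vod(a: CruxView, b: CruxView) -> float:
    """Expected reduction in |a - b| on the ultimate question once the crux resolves."""
    # Assumption: weight the two outcomes by the average of both crux probabilities.
    p_crux = (a.p_crux + b.p_crux) / 2
    now = abs(a.p_u - b.p_u)
    later = (p_crux * abs(a.p_u_if_yes - b.p_u_if_yes)
             + (1 - p_crux) * abs(a.p_u_if_no - b.p_u_if_no))
    return now - later
```

Under this sketch, two people who agree on both conditionals but disagree on the crux itself get a VOD equal to their whole current disagreement, since resolving the crux removes it entirely.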

BenjaminTereick

Hi Dan,

Thanks for writing this! Some (weakly-held) points of skepticism:

I find it a bit nebulous what you do and don't count as a rationale. Similarly to Eli,* I think on some readings of your post, “forecasting” becomes very broad and just encompasses all of research. Obviously, research is important!
Rationales are costly! Taking that into account, I think there is a role to play for “just the numbers” forecasting, e.g.:
1. Sometimes you just want to defer to others, especially if an existing track record establishes that the numbers are reliable. For instance, when looking at weather forecasts, or (at least until last year) looking at 538’s numbers for an upcoming election, it would be great if you understood all the details of what goes into the numbers, but the numbers themselves are plenty useful, too.
2. Even without a track record, just-the-number forecasts give you a baseline of what people believe, which allows you to observe big shifts. I’ve heard many people express things like “I don’t defer to Metaculus on AGI arrival, but it was surely informative to see by how much the community prediction has moved over the last few years”.
3. Just-the-number forecasts let you spot disagreements with other people, which helps you find out where talking about rationales/models is particularly important.
I’m worried that in the context of getting high-stakes decision makers to use forecasts, some of the demand for rationales is due to lack of trust in the forecasts. Replying to this demand with AI-generated rationales might shift the skeptical take from “they’re just making up numbers” to “it’s all based on LLM hallucinations” that I’m not sure really addresses the underlying problem.

*OTOH, I think Eli is also hinting at a definition of forecasting that is too narrow. I do think that generating models/rationales is part of forecasting as it is commonly understood (including in EA circles), and certainly don't agree that forecasting by definition means that little effort was put into it!
Maybe the right place to draw the line between forecasting rationales and “just general research” is asking “is the model/rationale for the most part tightly linked to the numerical forecast?" If yes, it's forecasting, if not, it's something else.

Peter Mühlbacher

[Disclaimer: I'm working for FutureSearch]

> on some readings of your post, “forecasting” becomes very broad and just encompasses all of research.

To add another perspective: Reasoning helps aggregate forecasts. Just consider one of the motivating examples for extremising, where, IIRC, some US president is handed several (well-calibrated, say) estimates of ≈70% for P(head of some terrorist organisation is in location X). If these estimates came from different sources, the aggregate ought to be higher than 70%, whereas if they are all based on the same few sources, 70% may be one's best guess.

This is also something that a lot of forecasters may just do subconsciously when considering different points of view (which may be something as simple as different base rates or something as complicated as different AGI arrival models).

So from an engineering perspective there is a lot of value in providing rationales, even if they don't show up in the final forecasts.
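As a sketch of the extremising example above: if each estimate is treated as an update from a shared prior, independent sources add up in log-odds space, while estimates that share their evidence should not be counted more than once. The function name `aggregate`, the 50% prior, and the all-or-nothing treatment of independence are simplifying assumptions of mine, not a standard recipe.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def sigmoid(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def aggregate(probs: list[float], independent_sources: bool, prior: float = 0.5) -> float:
    """Combine probability estimates in log-odds space.

    Assumption: every estimate is the shared prior updated on that
    forecaster's evidence. Independent evidence adds up; fully shared
    evidence is counted once (the average update).
    """
    prior_log_odds = logit(prior)
    updates = [logit(p) - prior_log_odds for p in probs]
    if independent_sources:
        total = sum(updates)
    else:
        total = sum(updates) / len(updates)
    return sigmoid(prior_log_odds + total)


independent = aggregate([0.7, 0.7, 0.7], independent_sources=True)   # about 0.927
shared = aggregate([0.7, 0.7, 0.7], independent_sources=False)       # about 0.7
```

Which branch applies can't be read off the numbers alone; it depends on knowing where each estimate came from, which is what the rationales supply.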

dschwarz

Yeah, I do like your four examples of "just the numbers" forecasts that are valuable: weather, elections, what people believe, and "where is there lots of disagreement?" I'm more skeptical that these are useful, rather than curiosity-satisfying.

Election forecasts are a case in point. People will usually prepare for all outcomes regardless of the odds. And if you work in politics, deciding who to choose for VP or where to spend your marginal ad dollar, you need models of voter behavior.

The best case for just-the-numbers is probably your point 2, shift-detection. I echo your point that many people seem struck by the shift in AGI risk on the Metaculus question.

> I’m worried that in the context of getting high-stakes decision makers to use forecasts, some of the demand for rationales is due to lack of trust in the forecasts.

Undoubtedly some of it is. Anecdotally, though, high-level folks frequently take one (or zero) glances at the calibration chart, nod, and then say "but how am I supposed to use this?", even on questions I pick to be highly relevant to them, just like the paper I cited finding "decision-makers lacking interest in probability estimates."

Even if you're (rightly) skeptical about AI-generated rationales, I think the point holds for human rationales. One example: Why did DeepMind hire Swift Centre forecasters when they already had Metaculus forecasts on the same topics, as well as access to a large internal prediction market?

elifland

> I do think that generating models/rationales is part of forecasting as it is commonly understood (including in EA circles), and certainly don't agree that forecasting by definition means that little effort was put into it!
> Maybe the right place to draw the line between forecasting rationales and “just general research” is asking “is the model/rationale for the most part tightly linked to the numerical forecast?" If yes, it's forecasting, if not, it's something else.

Thanks for clarifying! Would you consider OpenPhil worldview investigations reports such as Scheming AIs, Is power-seeking AI an existential risk, Bio Anchors, and Davidson's takeoff model forecasting? It seems to me that they are forecasting in a relevant sense and (for all except Scheming AIs maybe?) in the sense you describe of the rationale being linked tightly to a numerical forecast, but wouldn't fit under the OP forecasting program area (correct me if I'm wrong).

Maybe not worth spending too much time on these terminological disputes; perhaps the relevant question for the community is what the scope of your grantmaking program is. If indeed the months- to year-long reports above wouldn't be covered, then it seems to me that the amount of effort spent is a relevant dimension of what counts as "research with a forecast attached" vs. "forecasting as is generally understood in EA circles and would be covered under your program". So it might be worth clarifying the boundaries there. If you would indeed consider reports like the worldview investigations under your program, then never mind, but it would be good to clarify, as I'd guess most people would not expect that.

BenjaminTereick

I think it’s borderline whether reports of this type are forecasting as commonly understood, but would personally lean no in the specific cases you mention (except maybe the bio anchors report).

I really don’t think that this intuition is driven by the amount of time or effort that went into them, but rather the percentage of intellectual labor that went into something like “quantifying uncertainty” (rather than, e.g. establishing empirical facts, reviewing the literature, or analyzing the structure of commonly-made arguments).

As for our grantmaking program: I expect we’ll have a more detailed description of what we want to cover later this year, where we might also address points about the boundaries to worldview investigations.

elifland

Thanks for writing this up, and I'm excited about FutureSearch! I agree with most of this, but I'm not sure framing it as more in-depth forecasting is the most natural given how people generally use the word forecasting in EA circles (i.e. associated with Tetlock-style superforecasting, often aggregation of very part-time forecasters' views, etc.). It might be imo more natural to think of it as being a need for in-depth research, perhaps with a forecasting flavor. Here's part of a comment I left on a draft.

However, I kind of think the framing of the essay is wrong [ETA: I might hedge wrong a bit if writing on EAF :p] in that it categorizes a thing as "forecasting" that I think is more naturally categorized as "research" to avoid confusion. See point (2)(a)(ii) at https://www.foxy-scout.com/forecasting-interventions/ ; basically I think calling "forecasting" anything where you slap a number on the end is confusing, because basically every intellectual task/decision can be framed as forecasting.

It feels like this essay is overall arguing that AI safety macrostrategy research is more important than AI safety superforecasting (and the superforecasting is what EAs mean when they say "forecasting"). I don't think the distinction being pointed to here is necessarily whether you put a number at the end of your research project (though I think that's usually useful as well), but the difference between deep research projects and Tetlock-style superforecasting.

I don't think they are necessarily independent btw, they might be complementary (see https://www.foxy-scout.com/forecasting-interventions/ (6)(b)(ii) ), but I agree with you that the research is generally more important to focus on at the current margin.

[...] Like, it seems more intuitive to call https://arxiv.org/abs/2311.08379 a research project rather than forecasting project even though one of the conclusions is a forecast (because as you say, the vast majority of the value of that research doesn't come from the number at the end).

dschwarz

Agreed Eli, I'm still working to understand where the forecasting ends and the research begins. You're right, the distinction is not whether you put a number at the end of your research project.

In AGI (or other hard sciences) the work may be very different, and done by different people. But in other fields, like geopolitics, I see Tetlock-style forecasting as central, even necessary, for research.

At the margin, I think forecasting should be more research-y in every domain, including AGI. Otherwise I expect AGI forecasts will continue to be used, while not being very useful.

Arepo

I found this interesting, and a model I've recently been working on might be relevant - I've emailed you about it. One bit of feedback:

> Please reach out to [email protected] if you want to get involved!

You might want to make it more clear what kind of collaboration you're hoping to receive.

dschwarz

I suppose I left it intentionally vague :-). We're early, and are interested in talking to research partners, critics, customers, job applicants, funders, forecaster copilots, writers.

We'll list specific opportunities soon; consider this to be our big hello.

Seth Herd

I think a major issue is that the people who would be best at predicting AGI usually don't want to share their rationale.

Gears-level models of the phenomenon in question are highly useful in making accurate predictions. Those with the best models are either worriers who don't want to advance timelines, or enthusiasts who want to build it first. Neither has an incentive to convince the world it's coming soon by sharing exactly how that might happen.

The exceptions are people who have really thought about how to get from AI to AGI, but are not in the leading orgs and are either uninterested in racing or want to attract funding and attention for their approach. Yann LeCun comes to mind.

Imagine trying to predict the advent of heavier-than-air flight without studying either birds or mechanical engineering. You'd get predictions like the ones we saw historically - so wild as to be worthless, except those from the people actually trying to achieve that goal.

(copied from LW comment since the discussion is happening over here)

dschwarz

This seems plausible, perhaps more plausible 3 years ago. AGI is so mainstream now that I imagine there are many people who are motivated to advance the conversation but have no horse in the race.

If only the top cadre of AI experts are capable of producing the models, then yes, we might have a problem of making such knowledge a public good.

Perhaps philanthropists can provide bigger incentives to share than their incentives not to share.

SummaryBot

Executive summary: The forecasting ecosystem produces accurate predictions but lacks sufficient focus on generating knowledge through facts, reasons, and models, especially for important questions like those related to AGI.

Key points:

Recent high-profile forecasting efforts on AGI and other topics provide forecasts but lack detailed rationales and models.
Elite forecasters face time constraints in tournaments, limiting their ability to deeply explore questions and build comprehensive models.
Published forecasts should cite key facts and primary sources to support their conclusions.
Adversarial collaborations where dissenting forecasters write up a shared view could help resolve debates and persuade the public.
Quantitative models, even if imperfect, can help decompose questions, generate probabilities, and allow for inspection and adjustment.
Focusing on facts, reasons, and models is especially important for AGI forecasting, where accuracy remains low and the stakes are high.

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.

Jared T Peterson

A couple years ago I was wondering why all the focus is on Superforecasters when really we should be emphasizing the best arguments or the best persuaders. It seems like knowing who is best at forecasting is less useful to me than knowing what (or who) would persuade me to change my mind (since I only care about forecasts insofar as they change my mind, anyway).

The incentive system for this seems simple enough. Imagine that instead of an upvote, each comment has an "update your forecast" button. Comments that are persuasive get boosted by the algorithm. Authors who create convincing arguments can get prestige. Authors who create convincing arguments that, on balance, lead to people making better forecasts, get even more prestige.

It could even be a widget that you embed at the beginning and end of off-site articles. That way we could find the "super-bloggers" or "super-journalists" or whatever you want to call them.

Heck, you could even create another incentive system for the people who are best at finding arguments worth updating on.

The point is, you need to incentivize more than good forecasts. You need an entire knowledge generation economy.
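To make the "update your forecast" idea a bit more concrete, here is a minimal sketch of how such a score might be computed once a question resolves. Everything in it is hypothetical: the `Update` record, the function names, and the choice of Brier score as the accuracy measure are assumptions for illustration, not an existing platform feature.

```python
from dataclasses import dataclass


@dataclass
class Update:
    comment_id: str  # the comment that prompted the reader to update
    before: float    # reader's probability before reading it
    after: float     # reader's probability after clicking "update your forecast"


def brier(p: float, outcome: int) -> float:
    """Squared error of a probability against a 0/1 outcome (lower is better)."""
    return (p - outcome) ** 2


def score_comments(updates: list[Update], outcome: int) -> dict[str, dict[str, float]]:
    """Per comment: how many readers it moved, and whether it moved them toward the truth."""
    totals: dict[str, tuple[int, float]] = {}
    for u in updates:
        # Positive gain means the reader's forecast got more accurate.
        gain = brier(u.before, outcome) - brier(u.after, outcome)
        count, total = totals.get(u.comment_id, (0, 0.0))
        totals[u.comment_id] = (count + 1, total + gain)
    return {
        cid: {"updates": count, "mean_brier_gain": total / count}
        for cid, (count, total) in totals.items()
    }
```

The update count captures persuasiveness and the mean gain captures whether the persuasion helped, so the two kinds of prestige stay separate: a comment can move many readers and still score negatively if it moved them the wrong way.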

There are probably all kinds of ways this gets gamed. But it seems at least worth exploring. Forecasts by themselves are just not that useful. Explanations, not probabilities, are what expert decision-makers rely on. At least that is the case within my field of Naturalistic Decision Making, and it also seems true in Managerial Decision Making: managers don't seem to use probabilities in order to do Expected Utility calculations, but rather to try to understand the situation and its uncertainties.

This is the conclusion Dominic Cummings came to during the pandemic, as well. Summarized here:

> During the pandemic, Dominic Cummings said some of the most useful stuff that he received and circulated in the British government was not forecasting, it was qualitative information explaining the general model of what’s going on, which enabled decision-makers to think more clearly about their options for action and the likely consequences. If you’re worried about a new disease outbreak, you don’t just want a percentage probability estimate about future case numbers, you want an explanation of how the virus is likely to spread, what you can do about it, how you can prevent it. Not the best estimate for how many COVID cases there will be in a month, but why forecasters believe there will be X COVID cases in a month.
https://www.samstack.io/p/five-questions-for-michael-story
