I sketch a hypothetical feature connecting Metaculus and the EA Forum, so that users can see which pieces of written content influenced accurate forecasts. This would create stronger incentives for useful research, which would in turn enable higher-quality, more accurate forecasts. I am not in a position to implement any aspect of this proposal, but I'm surfacing the concept for community consideration and feedback because I personally believe it would be both extremely valuable and feasible.


I believe that improving the creation and curation of relevant knowledge, in forms useful to generalist forecasters, is a key lever for improving our ability to forecast Global Catastrophic Risks (GCRs), as I describe here. Furthermore, I think current incentive schemes for rewarding research are pretty weak. Within academia, the focus ends up on citation counts and publication counts. On the internet, rewards typically take the form of upvotes/karma or engagement. Both have a circular aspect: the research that other researchers think is useful gets rewarded. This effect can spiral into traps like extensive and unnecessary jargon, negative-sum competition in the form of adversarial peer review, or simply a lack of recognition and collaboration. Online research targeted at a more general audience, combined with deliberate norms, can circumvent some of these failure modes. The rewards then shift from what researchers think is good research to what a general audience thinks is good research.

But was that research actually useful? Did it make our model of the world more or less accurate? Maybe there was a critical error in the calculations (or fraudulent data underlying the analysis), such that an otherwise beautifully polished paper is actually moving our knowledge base in the wrong direction. To some extent errors can be caught by careful readers, or we can attempt to replicate studies to confirm their findings, but not all research lends itself to empirical validation. Less dramatically, research can be chock-full of jargon, otherwise poorly written, or about an irrelevant topic, and still end up functionally useless.

What if we connected research to forecasting? Forecasts, especially in prediction polling environments, often have rationales attached that explain the reasoning and sources behind a particular prediction. If those rationales contained citations for their sources, you could actually track over time which sources of information tended to be used in more accurate forecasts. If those sources themselves contained legible citations of their sources, this information could flow along the citation chains, and you could see which scaffolds of knowledge were being built in a direction that improved our model of reality and which weren't. Where research negatively impacted accuracy, you'd have reason to look for errors in data or reasoning that could produce this effect. In such a system it would also be trivial to show how the research of a given author/publisher stands up relative to others, allowing us to place some trust in their work even before it has been used in forecasts.

Furthermore, these forecast rationales are really just additional pieces of research that could themselves be cited, further enriching the relevant knowledgebase.


To make this new research paradigm a reality, we would need to implement the following:

  1. A lasting link between forecasts and their informational sources
  2. A lasting link between sources and their sources
  3. A reward that flows along these links to sources that are cited in accurate forecasts

Metaculus seems like the best existing platform for implementing a link between forecasts and citations. As I mention above, I think this almost has to be implemented in the context of prediction polling rather than prediction markets. In prediction polling, forecasters have to be explicit about their confidence level in a claim, and are free to set this confidence level however they want. In a prediction market, forecaster confidence is somewhat encoded in the volume of a given trade, but this can be confounded by a lack of available funds or low trading skill. I also believe the incentive systems of prediction polling platforms can be more accepting of information sharing between participants, a critical prerequisite if we want forecasters to thoroughly and honestly explain their reasoning at the time of a forecast. The default for prediction markets is information hoarding, to maintain a competitive edge over other traders in a zero-sum environment. It's possible this is not intrinsic to the format and that a play-money market system like Manifold could get around it, but if so, I haven't seen a proposed mechanism for doing so in my review of the research. Regardless, this seems like a harder journey than the one Metaculus would have to make, so I'm limiting my hypothetical implementation details to Metaculus.

Metaculus already has a fairly robust comment section on each forecastable question. Users are prompted to share a rationale in this section with each of their forecasts, and there is even simple markdown support for formatting these rationales. A small but critical gap is that there is no markdown feature for citations/footnotes. Implementing this well should both encourage citations and make them easy to track, and I expect doing so is a prerequisite for the rest of this proposal.

If this is done, Metaculus forecasters would have the ability to cite sources in their rationales. These citations would presumably be links to webpages… but how do we associate those with particular researchers? What if multiple links lead to the same study? Attribution is necessary to score useful research; it's straightforward for citations of other forecast rationales hosted on Metaculus, but otherwise seems painful.

My initial idea for solving this problem was to make Metaculus friendlier for posting longer-form research. There are already discussion threads and forums implemented on the site in some cases, though these are really just forecasting questions without something to forecast, using the same commenting mechanisms. This means that almost all of the written content on Metaculus is hidden under individual questions/discussions/forums and can't be easily discovered or searched. Obviously this could be remedied, but it seems like a big lift that would greatly increase the complexity of the site.

Eventually, I realized that I was trying to recreate the EA Forum. The forum is already focused almost exclusively on the kinds of information I expect forecasters to find useful, has pre-existing norms for clear communication, honesty, and reasoning transparency, and even has a space dedicated to the topic of forecasting. In fact, when I participated in a tournament that heavily incentivized well-cited forecasting rationales, I found many posts on the EA Forum worth using. It even has a very strong set of formatting options, including easy citations.

Instead of trying to build Metaculus into something it currently isn’t, I’m proposing a novel connection between Metaculus and the EA Forum.

This simplifies the attribution problem for Metaculus. It only needs to worry about two kinds of links: those that point to comments on Metaculus, and those that point to posts on the EA Forum. These both already have unique identifiers and are intrinsically connected to individual authors. On the EA Forum, tracking the citation of forum posts within other forum posts should be straightforward.
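To make the two-platform attribution rule concrete, here is a minimal sketch of how a citation link might be classified. The URL patterns and function name are my own assumptions for illustration; the real platforms' URL schemes (and any official APIs) may differ.

```python
import re

# Hypothetical URL patterns -- the actual schemes used by the two
# platforms may differ from these assumptions.
METACULUS_COMMENT = re.compile(
    r"^https?://(?:www\.)?metaculus\.com/questions/(\d+)/.*#comment-(\d+)"
)
EA_FORUM_POST = re.compile(
    r"^https?://forum\.effectivealtruism\.org/posts/([A-Za-z0-9]+)"
)

def classify_citation(url: str):
    """Return ("metaculus", comment_id) or ("eaforum", post_id),
    or None for external links, which are ignored for attribution."""
    m = METACULUS_COMMENT.match(url)
    if m:
        return ("metaculus", m.group(2))
    m = EA_FORUM_POST.match(url)
    if m:
        return ("eaforum", m.group(1))
    return None
```

Because both identifier types are tied to a single author account, mapping a classified citation back to a researcher is then a simple lookup on the owning platform.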

All of the above should cover items #1 and #2 from my initial checklist in this section. We now have a way (that seems quite technically feasible to me, a layman on the outside) to capture the linkages between forecasts and their sources, and between sources and their sources, given the self-imposed constraint that we only consider content posted on Metaculus or the EA Forum. This seems like a reasonable outcome to me, given that the EA Forum already encourages link posting from other sources with summaries.

This leaves an open question for #3, the actual reward that flows along citation chains to both indicate and incentivize research that is used in accurate forecasts.

It seems obvious to me that this should primarily be some form of “internet points” analogous to the existing “points” currency awarded for accuracy on Metaculus or the “karma” rewarded for upvotes on the EA Forum. I want to be careful with this proposal to minimize the chance of degrading the existing function of either platform, as I think they’re both already doing important things well. To that end, I believe a separate imaginary currency is warranted. This seems worth the slight increase in complexity on both platforms to preserve the integrity of their current systems.

As a stand-in, let's call these “research points” (to be clear, I think this is a terrible name). Metaculus users would now earn both their current points for forecasting accurately, and “research points” for their rationales being cited by other forecasters who go on to forecast accurately, or for their rationales being cited by other rationales that are in turn cited in accurate forecasts, and so on. This creates a novel incentive for forecasters to actually write clear and accurate rationales for why they believe what they believe, whereas currently they're just prompted, with no possibility of additional reward. If a forecaster only cares about accumulating the original points and being accurate without having to show their work, they can continue doing so with no real change to their experience. In my mind, this would create a new niche for a kind of forecaster dedicated to clear communication in addition to accuracy. We see this kind of forecasting done now by dedicated individuals or top forecasting teams, but it's unfortunately rare.

Similarly, on the EA Forum, users will now have an ability to accumulate “research points” alongside their existing karma. Karma will indicate what other forum users think is worth rewarding, while “research points” indicate some genuinely manifested value in increasing the accuracy of someone’s beliefs. I expect this metric to be exciting in the EA/rationalist community, but forum posters will be free to ignore this new metric if they want and continue to post content as they did before, not specifically targeted at being useful for forecasters.

How will this metric be calculated? I expect others (especially those who work at these platforms) to have better ideas on this than I do, but just as a concrete example:

  • A forecast generates a number of “research points” equal to its traditional accuracy points.
  • These are divided equally between each of its cited sources that are from either Metaculus or the EA Forum.
  • The “research points” allocated to a source are then divided in half, with half going to that source itself, and the other half being divided equally between each of its cited sources from either Metaculus or the EA Forum.
  • So on and so forth, to the end of each citation chain. Maybe rounding each division up to the nearest whole number of points to preserve the value of long chains?
  • These “research points” are continually updated as the traditional accuracy points for Metaculus forecasts are continually updated (per my understanding) and displayed in aggregate in two places:
    • On each source itself. Either Metaculus comments or EA Forum posts, alongside traditional upvotes/karma.
    • On user profiles, on either platform, for the total of all the content they’ve generated that’s been cited.

Note that the traditional points on Metaculus can be negative in order to disincentivize inaccurate forecasts, which allows a signal to flow through the system when sources are counterproductive.
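The division scheme above can be sketched in a few lines of code. This is a minimal illustration, not a real scoring implementation: it assumes a source with no on-platform citations keeps its full share, and it omits cycle detection, rounding, and continual re-scoring as forecast points change. Negative points from inaccurate forecasts propagate through the same arithmetic.

```python
def distribute_research_points(points, citations, graph):
    """Split a forecast's "research points" along its citation chains.

    `points`    -- the forecast's traditional accuracy score
    `citations` -- ids of the on-platform sources the rationale cites
    `graph`     -- maps each source id to its own on-platform citations
    """
    awards = {}

    def spread(amount, sources):
        share = amount / len(sources)  # split equally among citations
        for s in sources:
            children = graph.get(s, [])
            if children:
                awards[s] = awards.get(s, 0) + share / 2  # keep half
                spread(share / 2, children)               # forward half
            else:
                # Assumption: a leaf source keeps its whole share.
                awards[s] = awards.get(s, 0) + share

    if citations:
        spread(points, citations)
    return awards
```

For example, a 16-point forecast citing source A, where A cites only B, awards 8 points each to A and B under these assumptions.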

I believe the above proposal represents a sort of minimum viable system that creates the incentives I’m talking about. It also leaves the door open for a lot of future possibilities, many of which I’m sure haven’t even occurred to me. Some that have include…

  • Different ways to display the information conveyed by “research points” beyond raw point totals. Leaderboards of the most positively influential sources/posters? Percentile scores alongside or instead of point totals? Does it make sense to break this up by topic?
  • Visualizations of citation chains/graphs. These already exist for academic citations, but my intuition is that they get a lot more interesting/useful when you can see which threads of research have actually made forecasting models more accurate.
  • Cash incentives or other rewards tied to “research points”. Does this enable tournaments or grants to create the most “useful” research, in a way that I expect can be judged much less subjectively than the status quo?

Red Teaming

What could go significantly wrong with the above proposal?

  • These features could be impossible or much more expensive/difficult to implement as described. If this is the case, I’m not going to figure it out from my armchair, so I’m hoping to get this proposal to people who know much more. (~20% chance this is the case?)
  • I could be wrong about this being useful. Maybe high quality information actually isn’t a bottleneck for important forecasting, or maybe no one would be interested in writing the kinds of rationales or content intended to be incentivized by this system. (~10% chance this is the case?)
  • Maybe the points system is trivially gameable/exploitable in some way and won't incentivize the behaviors I expect. I believe it is gameable, but I think the existing karma and points systems of the two platforms already are to a large extent, and the reason they are mostly useful has more to do with the nature of the associated communities and their objectives than with some intrinsic reliability of their point systems. (~20% chance this is the case? In which case I think the proposal is likely recoverable with mechanism tweaks)
  • It could degrade the existing functionality of these platforms in some way. I’ve tried my best to minimize this possibility, but again, I don’t think I can do much better than this from my armchair and will endeavor to get this idea to people with more relevant insights. (~15% chance this is the case? But may still be recoverable with tweaks)



Glad you're thinking about improving research and forecasting! My very quick take, after only skimming your post, is that it just happens too rarely that an EA Forum post significantly informs specific Metaculus questions. Consequently, investing in such a feature doesn't seem sufficiently useful. (Though that might change, of course, if there were many more Metaculus questions that relate to topics in EA.) Maybe you could list some concrete examples of such EA Forum - Metaculus connections? (Sorry if I missed you listing examples!)

Some other random ideas about forecasting on the forum:

  • Display relative Brier Scores from Metaculus in EAF profiles -> incentivize forecasting, enable forum readers to roughly weigh comments by epistemic track records
  • Hook up the forum to the GPT-4 API and allow forum writers to generate forecasting questions at the end of their posts.
    • Scott Alexander does this regularly and I find it useful to get a concrete bottom line of his uncertainties and concrete things he's forecasting based on the post, e.g. see here.

Good points. 

This post comes to mind, which I cited in my nuclear GCR forecasts here, along with many other posts from that series. In general I expect posts from Rethink Priorities to be relevant. I've seen similar quality posts for AI risks and pandemics here. Most of my familiarity is with GCRs, but I expected there to be strong overlap between popular forecasting topics and popular EA Forum topics more generally. There are lots of GCR-related questions on Metaculus, and you can find many cited in that link with my forecasts.

Still, I think you're right that this wouldn't be applicable to the majority of EA forum posts. Maybe it's only even displayed once a post is cited in a forecast, or only a particular tag is eligible for this in order to simplify the implementation. 

I do think making people's forecasting performance more obvious in different contexts would be very useful for the community (re: your brier scores in EAF profiles idea), and would love a central site that's sort of like a minimum viable linkedin that consolidates relevant metrics for an individual across the top forecasting platforms and has an API that makes it easy to connect to other accounts, or use with discord bots etc. I may write about this soon.

Generating forecasts associated with a post is interesting and I'm sure there are UX opportunities to make this easier / more common, but I need to think more about it.

Thanks for the thoughtful response!

A few things

  • if people want to scrape.
  • I only skimmed your post (let me know if I'm misunderstanding), but I have an issue with this idea. Many forecasts require complicated mathematical models to describe. You can't simply link to sources; you also need to link to a model. Blog posts/txt files, which are essentially what the forum is, are extremely hard to scrape and parse unless everyone starts adopting conventions. So your functionality maxes out at linking, which isn't very automated.
  • If you are recommending connecting a full mathematical model from the forum, let me suggest that rather than connecting Metaculus to the forum, you connect it to, as this is much more scalable and clear. 
  • Thank you for thinking about these things; it inspired me to make my own post.

I would say it a little differently. I would say that "judgmental" forecasting, the kind typically done on Metaculus or Good Judgement Open or similar platforms, CAN involve mathematical models, but oftentimes people are just doing simple math, if any at all. In cases where people do use models, sure, it would make sense to link to them as sources, and I agree that would also be valuable to track for similar reasons. Guesstimate seems like the obvious place to do that.

I think that is separate from the proposition I intended to communicate for primarily text based research. 

I also wasn't anticipating any need to do scraping if this was implemented by the two platforms themselves. It should be easy enough for them to tell if a citation is linking to an EA forum post? Metaculus doesn't have a footnote/citation formatting tool today like the EA Forum's. (Although if you were to scrape, finding EA forum links within citations on this forum seems pretty well defined and achievable? idk, I don't write much code, thus me floating this out here for feedback)

Thanks for the thoughts!

I would say we are basically on the exact same page in terms of the overall vision. I'm also trying to get at these logical chains of information that we can travel backwards through to easily sanity check and also do data analysis. 

Where I think we break is if there is no underlying structure to these logical chains outside of a bunch of arrows pointing between links, it reduces our ability to automate and take away insights. 

A few examples 

  • You link to an EA Forum post with multiple claims. In order to build logical chains, we now need a database to store each claim in each post. In order to do this, we need to convince everyone to use certain formatting for claims, or try to use an LLM to parse them.
  • You link multiple sources, which themselves link multiple sources. Since linking is just drawing arrows in an abstract sense, I have no ability to discern how much each source contributed to the guess. I assume we would just use a uniform distribution to model how much each source went into the final guess? But this is clearly terribly off in many cases, so we lose a lot of information.
    • If we link to models we hold a lot more information down the chain. 

Overall, I wouldn't say my proposition is a full substitute for your idea, but I think there is overlapping functionality.