Here's my rejected FTX proposal (with Abel Brodeur) to solve the replication crisis by hiring full-time replicators. (I left out the budget details.) [Added some edits in brackets for more context.]

Please describe your project in under 100 words.

We will actually solve the replication crisis in social science by hiring a “red team” of quantitative researchers to systematically replicate new research. Currently, there are few penalties for academics and journals that publish unreliable research, because few replications are attempted. We will fundamentally change academic incentives by making researchers know that their work will be scrutinized, which will motivate them to improve research design, or else face a loss of reputation. By fixing scientific institutions now, we can reap the compounding benefits of reliable knowledge over the long-term future.

If the project has a website, what’s the URL?

https://i4replication.org/

Please describe what you are doing very concretely—not just goals and long-term vision, but specifically what you are doing in the next few months.

Currently, the Institute for Replication is using volunteers to systematically reproduce and replicate new studies from leading journals in economics and political science. With funding from FTX, we can hire a Project Scientist (Michael Wiebe), post-docs, and research assistants to massively scale up reproductions and replications. [Definitions: "reproduce" = being able to run the code and obtain the results that are in the paper; "replicate" = re-analyzing the paper using different methods and/or different data. Note that most social science papers use observational data and are not lab experiments.]

We can also launch a cash prize for completed replications, to incentivize even more replications. This can be implemented in several ways; for example, giving a prize of ~$1000 for high-quality replications completed using the Social Science Reproduction Platform, as judged by a panel of experts. [We're open to revising this number, e.g. to $5k; most grad students replicate papers for coursework, so it might not take much to incentivize them to submit.]

What’s the case for your project?

Social science is facing a replication crisis. Researchers produce unreliable findings that often do not replicate, and the root problem is the lack of replications. 

Academics have basically no incentive to perform replications, since they usually do not yield original findings, and are not valued by journals. Since they do not lead to publications, replications do not help academics get tenure, and hence few are attempted. The replications that are done are conducted by volunteers in their spare time, and can even have negative career effects if they upset powerful academics. 

The rareness of replications causes peer review to be an inadequate form of quality control. Knowing that research won’t be closely scrutinized, journals and referees have little incentive to check for data quality issues, coding errors, or robustness. If a paper with unreliable findings gets published, the journal suffers no loss in reputation, because no one will replicate the paper to expose its flaws. Hence, referees take empirical results at face value, and focus instead on framing the research question and appropriately citing the literature. 

Knowing that their work will be neither reproduced nor replicated, most researchers don’t invest time in preparing replication packages, and don’t check for data or coding errors. The result is entire fields with serious reproducibility problems.

We can fix these incentives by investing heavily in reproduction and replication, and making a big push to systematically replicate new research. With a team of full-time replicators and cash prizes for completed replications, researchers will now expect their work to be immediately scrutinized as a regular practice. This scrutiny will put researchers’ reputations on the line: if their findings are not robust, their work will not be cited (or worse, be retracted), ultimately affecting their promotion and tenure outcomes. At the same time, high-quality work will be rewarded. A big push will attract widespread attention to amplify these reputation effects. Hence, researchers will put more effort into better research design and fixing errors before submitting for publication. To avoid a reputation for publishing unreliable findings, journals will improve their peer review standards. The end result is a scientific literature containing reliable knowledge, to help guide our species through the long-term future.

How long have you been working on this project, and how much has been spent on it?

The Institute for Replication was launched in January, with no funding raised yet.

What has been achieved so far?

We are collaborating with a large team of researchers to reproduce and replicate studies in economics and political science. We have already reproduced over 200 studies, and are currently working with about 50 independent researchers to replicate 30 studies. (See here for precise definitions of ‘reproduce’ and ‘replicate’.) We have built a large network of researchers interested in reproductions and replications. Our collaborators include journal editors, data editors, and reproducibility analysts at selected journals. We have already put together many special journal issues dedicated to replications. We have also conducted a survey of editors of leading outlets in economics, finance, and political science to help replicators identify journals interested in publishing replications.

Do you have any reservations about your project? Is there any way it could cause major harm? If so, what are you going to do to prevent that?

We expect failure to look like a null effect: no one pays attention. We would publish negative replications, but researchers and editors would not change their practices, and departments would not change their tenure and promotion decisions. One possible harm is executing the project badly and giving replication a bad reputation. We can prevent this by giving prizes only to top-quality replications, and requiring transparency by allowing the original authors to publicly respond to replications of their work. I4R already has a conflict of interest policy.

Another possible harm is negative replications being taken as evidence of cheating by researchers, as opposed to honest mistakes; this could lead to a backlash against replication. We can prevent this by encouraging a culture of charitability, with replicators giving authors the benefit of the doubt when discussing problems and errors.

What will it look like if your project has gone poorly / just OK / well at that time?

Poorly: we are unable to hire some of the post-docs; replications are low quality; fewer than 100 replications completed. (We are currently at 30 ongoing replications accepted since mid-January, so we naively expect 100 per year.)

Just OK: we successfully hire 5 post-docs and RAs; replications are high quality, adding important robustness checks (with positive or negative results); 250 replications completed; some media coverage; some replications and retractions published in original journals. 

Well: we successfully hire 10+ postdocs and RAs; high quality replications; 500 replications completed; widespread media coverage; replications published in original journals; negative replications lead to retractions from journals; journals implement new peer review standards; departments account for replications/retractions in tenure/promotion decisions.

 

19 comments

Project looks really cool. I appreciate you sharing this. I hope this project continues to grow.

I really want to know what FTX ended up funding since the rejected grants I know of looked really promising to me.

What was the approximate budget? When I read this my first thought was 'did they ask for a super ton of money and get rejected on that basis'?

Around a million.

I am really happy to see someone doing something about the replication crisis. Sorry that you didn't get funded. I know very little about FTX or grantmaking in general and so I can't comment on the nature of your proposal or how to make it better. But now that I see someone doing something about the replication crisis I have done an update on the Tractability of this cause area and I am excited to learn more!

This excitement led to some small actions from my end:

  1. I visited the Institute for Replication website and found it to be very helpful. I really appreciate the effort that went into making the Teaching tab on the website. I will try to make time in the near future (within a month or so) to go through the resources carefully.
  2. I subscribed to the BITSS YouTube Channel and skimmed through a couple of chapters of the open source textbook, Reproducible Data Science.
  3. I looked for material on the replication crisis elsewhere on this forum. I found this panel discussion from EA Global 2016 and... that's about it! Since, IMO, there is not enough EA material on this cause area, I left a comment on the "What posts do you want someone to write?" thread in the hope that someone wading through it for ideas will decide to write more about it.

One thing still unclear to me - are there career opportunities here or just volunteer opportunities? In the proposal, you mentioned "reproducibility analysts at selected journals" - I had no idea that was a thing that people did! But it sounds like a very interesting role to me considering the Scale of the problem. How many people do it and is there a high demand for it? What sort of degree does someone need to do it?

All the best with the project! I sincerely hope someone else will fund it and it will be successful.

At this point, 'reproducibility analyst' = undergrad RAs; see this talk by AEA data editor Lars Vilhuber.

Otherwise, the replications are currently done by academics volunteering in their spare time, which is why it would help to have full-time paid replicators.

Looks like a great idea, very glad someone is pursuing the roll-up-your-sleeves method here.

I think the best addition to this that you could make is a business plan—basically, how much would it cost to replicate how many studies, how would you best choose studies for replication to maximize efficiency / impact, how much / how long until you were replicating 1 or 10% of top studies, etc. I'd also personally like to see a different version of "what has been achieved" that didn't lean as much on collaborations / work of collaborators, as I find these basically meaningless.

The budget section (omitted here) has more of these details.

Re: selection, the idea is to systematically replicate all new research in top journals, to change researchers' expectations from (a) expecting to have basically no one scrutinize their work to (b) expecting at least some post-publication review. This incentivizes researchers to improve the quality of their work.

Re: collaborators, I4R currently works by asking academics to volunteer to replicate papers.

This would be cool to fund as a bet on success, e.g., to give you/your early stage funders a $10M prize if you "actually solve the replication crisis in social science" (or a much lower amount if you hit your milestones but no transformative change occurs). This would allow larger funders for whom you are less legible to create incentives for others who are more familiar with your work to fund you.

This definitely seems interesting! I'm curious whether you would also be interested in seeing how other, later studies have used any findings that you cannot replicate, and thus get a sense of any "epistemic contagion" in the literature? Or would the studies you try to replicate be too new for that to make sense? (Or do you simply think that's better left to other researchers?)

It at least seems to me that if you had a good sense of "which findings/studies would involve high amounts of epistemic contagion if they do fail to replicate" then that might help with choosing which studies to focus on.

I wrote an EA Forum post describing the concept of epistemic mapping (with pictures) here, but I'll avoid going into detail on that. I just bring it up because one of the reasons that I've thought that epistemic mapping may be valuable is that it could potentially help with understanding research/epistemic contagion: i.e., how flawed datasets, regression analyses, experimental findings, or other inputs might produce inaccurate findings in the broader research literature.

I guess if you found reproducibility problems in a bunch of related papers, that would point to a common cause. In fact, I found a case like this in my dissertation: the entire literature on meritocratic promotion in China is unreliable, and is based on a highly-cited 2005 article.

Michael, I love your work (blog). Other than FTX, have you tried other avenues for funding this?

I've applied to Emergent Ventures and ACX funds for smaller scale versions of this idea (eg. my writing a replication blog), but didn't get anything. FTX inspired me to think of the maximal scale version.

Ah frustrating! I'm surprised Tyler didn't say yes, given your previous blog posts. 

Random thought - maybe it's worth applying to EAF/LTFF for replicating EA specific papers?

Yeah, I've tried to think of empirical EA-related papers that would be informative to replicate; so far it looks like air pollution might be a good topic. The problem is that many EA-relevant papers are theoretical and hence not amenable to my style of replication.

Would the Institute for Replication incorporate insights/methods from replication markets?

Possibly! Anna Dreber is on the board of both.

Thanks for sharing your proposal Michael. The institute looks great. Finding ways to incentivise replication is something I consider to be really important. 

A couple of questions. I am curious what probability you would place on the Institute significantly increasing acceptances of replications in top journals? More abstractly, I wonder if a dedicated institute could help change social norms in academia around replication. Do you have any thoughts about this?

Lastly, did you receive any feedback from FTX? 

I'm not sure if top journals would publish replications. They seem to get prestige from publishing original research, but maybe if replication was higher status, they would do it. I mainly see the benefit of systematic replication in inducing researchers to improve the quality of their research, so we'd actually see fewer negative replications. (Another issue is that only negative replications are 'interesting'.)

I think changing norms is possible. A lot of journals now have a data editor who ensures 'push-button' reproducibility: the data and code are available, and you can run a script that produces all of the results in the paper. This is a big improvement over 10-15 years ago when code wasn't available, or didn't reproduce results.

I didn't get any feedback from FTX.

Using prediction markets, we could set up markets on whether a paper will be retracted or have a comment published about it (for example, this). If the price is low, replicators could profit from information they generate themselves: by scrutinizing the paper and writing up a comment, you can make the market resolve 'Yes'.
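
To make the incentive concrete, here is a rough sketch of the replicator's expected payoff. It assumes a simple binary market where each 'Yes' share pays 1 if the paper is retracted or commented on and 0 otherwise, and it ignores fees and the replicator's own effect on the price. The function name and all numbers are hypothetical, chosen only for illustration.

```python
def expected_profit(price, prob_yes, shares, effort_cost=0.0):
    """Expected profit from buying `shares` 'Yes' shares at `price` each.

    Assumptions (not from any real market's rules): each share pays 1 if the
    event happens and 0 otherwise; no fees; buying does not move the price.
    `prob_yes` is the replicator's own estimate that the market resolves
    'Yes' once they have done the work; `effort_cost` is the value of the
    time spent scrutinizing the paper, in the same currency as the payout.
    """
    if not 0.0 <= price <= 1.0 or not 0.0 <= prob_yes <= 1.0:
        raise ValueError("price and prob_yes must lie in [0, 1]")
    if shares < 0:
        raise ValueError("shares must be non-negative")
    # Each share costs `price` and is worth `prob_yes` in expectation.
    return shares * (prob_yes - price) - effort_cost


if __name__ == "__main__":
    # Hypothetical: the market prices a published comment at 5%, but the
    # replicator thinks a careful re-analysis makes it 60% likely.
    profit = expected_profit(price=0.05, prob_yes=0.60, shares=1000,
                             effort_cost=200.0)
    print(f"Expected profit: {profit:.2f}")
```

With these made-up numbers the expected profit is about 350, so the bet only pays off when the gap between the market price and the replicator's own estimate is large enough to cover the cost of doing the replication.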