The replication crisis is a huge problem in modern science: 30%-60% of papers published in top journals fail to replicate on their main result, and even among those that do replicate, the effect size is 75% of the original, on average.[1] An estimated $28 billion is spent annually on non-reproducible preclinical studies in the US alone.[2]

A related problem is the citing of bad research - research that has been shown not to replicate, or that has been retracted. According to Serra-Garcia & Gneezy (2021), only 12% of papers citing research that has failed to replicate acknowledge the failure.[1] Even retracted papers still receive 40% of their previous rate of citations on average[3], with many of those citations failing to acknowledge the problems in the paper.[4]

The problem of citing bad research is much more solvable, and more neglected, than the broader problem of increasing reproducibility.

Here, I propose building a website and/or web extension (nicknamed Replicato) that can scan a list of references and find, among them, papers that have been replicated or retracted. Another option would be to find papers that were included in reviews and meta-analyses, but I won't estimate that feature's impact.

This would make it much easier for scientists and others to identify problems in their own and others' papers. I believe it will be relatively cheap to develop and advertise, and could be an extremely cost-effective contribution to science. Science is one of the greatest contributors to society, and very central to EA, so this idea might be approximately as cost-effective as other EA causes.



According to Freedman et al. (2015)[2]:

"An analysis of past studies indicates that the cumulative (total) prevalence of irreproducible preclinical research exceeds 50%, resulting in approximately US$28,000,000,000 (US$28B)/year spent on preclinical research that is not reproducible—in the United States alone."

Can we model the estimated impact of Replicato? Its impact is likely to come from two channels - increased awareness of bad research, and changed incentives. Increased awareness will obviously come from Replicato itself, while changed incentives may come from researchers knowing it is "harder to get away" with bad papers.


Increased awareness

A legitimate concern is that it will be very hard to make Replicato widely used. There are two counterpoints. First, it doesn't need to be widely used to have an impact: nonreplicable and retracted results get at least thousands of citations annually (as can be estimated from Serra-Garcia & Gneezy, 2021[1]), probably a few thousand more from papers that failed to replicate in the Many Labs projects, and some more from independent replications unrelated to major projects. Even implementation in 1% of papers a year would translate to dozens or more papers per year not citing bad research.

Second, the average number of people who see a paper pre-publication is quite large. The average/median number of authors varies by estimate and field, generally ranging from 4.4 to nearly 10[5][6][7], but a good and convenient estimate is 6 authors on average. With a conservative 4 additional people seeing the paper in peer review, we get 10 people. Thus, if the probability that any one researcher uses Replicato is p, the chance it will be used on a given paper is 1-(1-p)^10.[8]

[Figure: percent of papers on which Replicato is used (k), as a function of the percent of scientists using Replicato (p)]


In short, with just 6.7% of scientists consistently using Replicato, 50% of papers would be rid of unknowingly citing bad research. Even with just 0.1% using it, 1% of papers would be, which as noted above equals dozens of papers per year. This doesn't change significantly with 8 or 9 people per paper instead of 10 (for k=50%, p=8.3% or p=7.4%, respectively).
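The model from footnote [8] is easy to check in a few lines of Python (n = 10 people per paper is the assumption above):

```python
def papers_covered(p, n=10):
    """Share of papers (k) on which at least one of n participants uses Replicato."""
    return 1 - (1 - p) ** n

def users_needed(k, n=10):
    """Share of scientists (p) needed so that a fraction k of papers is covered."""
    return 1 - (1 - k) ** (1 / n)

print(round(users_needed(0.5), 3))       # 0.067 -> 6.7% of scientists covers half of papers
print(round(papers_covered(0.001), 4))   # ~0.01 -> 0.1% of scientists covers ~1% of papers
print(round(users_needed(0.5, n=8), 3))  # 0.083
print(round(users_needed(0.5, n=9), 3))  # 0.074
```

These reproduce the percentages quoted in the text.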


Changed incentives

My assumptions for this part will be quite arbitrary, and somewhat conservative. I won't use a probabilistic model, but you can Squiggle the hell out of this if you want. 

So I'll assume that only 20% of nonreplicable & retracted publications come from incentives, and the rest are due to honest mistakes & problems in the process.

I'll assume that if researchers and journals had complete assurance that a publication would never be positively cited again once it failed to replicate or was retracted, that would reduce their incentive to publish such results by 1% (replication attempts and retractions are, after all, quite rare).

That would mean that if Replicato were used on 100% of papers, it would reduce the number of nonreplicable and retracted publications by 0.2% (keep in mind, that still equals $56M per year in US preclinical research alone[2], and obviously much more in other areas).
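As a sanity check on the arithmetic (both inputs are the arbitrary assumptions stated above, not empirical figures):

```python
incentive_share = 0.20       # assumed share of bad publications driven by incentives
deterrence = 0.01            # assumed 1% hit to that incentive under full assurance
reduction_at_full_use = incentive_share * deterrence
us_preclinical_waste = 28e9  # Freedman et al. (2015)[2]

print(round(reduction_at_full_use, 4))                              # 0.002 = 0.2%
print(round(reduction_at_full_use * us_preclinical_waste / 1e6, 1)) # 56.0 -> $56M/year
```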

I'll assume Replicato is going to be mildly successful, with 5% of the userbase of the largest web extensions for scientists, such as Mendeley, Zotero, EndNote Click, and Google Scholar. These have 2-3 million users each according to the Chrome store. Chrome makes up about 2/3 of browser users[9], so multiplying by 1.5 gives 3-4.5 million.

5% of that is 150,000-225,000 users. I'll use 188k, the average. 

In 2018, there were an estimated 8.8M scientists in the world, according to UNESCO.[10] The growth rate reported in the article is ×1.137 over the previous 4 years, i.e. ×1.0326 per year. That gives us 10.33M scientists today, 5 years later.

188k out of 10.33M is 1.82%. According to my model[8], that means 16.8% of published papers would use Replicato.

That means Replicato would reduce nonreplicable and retracted papers by 16.8% × 0.2% = 0.0336%, which equals $9.4M in US preclinical research alone.[2]
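Putting the whole chain together (user counts, growth factor, and the 0.2% figure are all the assumptions stated above):

```python
users = 188_000                           # assumed Replicato users (avg of 150k-225k)
scientists_2018 = 8.8e6                   # UNESCO estimate[10]
scientists_now = scientists_2018 * 1.137 ** (5 / 4)  # 4-year growth factor, 5 years on

p = users / scientists_now                # share of scientists using Replicato
k = 1 - (1 - p) ** 10                     # share of papers covered (footnote [8])
savings = k * 0.002 * 28e9                # 0.2% reduction, US preclinical waste[2]

print(round(scientists_now / 1e6, 2))     # ~10.33 million scientists
print(round(k, 3))                        # ~0.168
print(round(savings / 1e6, 1))            # ~9.4 million $/year
```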


How much will Replicato's development cost?

I don't know, but the requirements are basically: creating and maintaining a dataset of all replication studies to date (a few thousand, I think[11]), and linking Replicato to the existing database of retractions by RetractionWatch and to the review and meta-analysis searches in PubMed and elsewhere. Then, create software that can identify each publication from the text strings of the references and link everything together. From the little knowledge I have of software, I assume initial development will cost up to $50k, with annual maintenance of a few tens of thousands of dollars per year. For incentivizing tens of millions of dollars of research per year, and helping get rid of hundreds of bad citations, that's very good.

Since Replicato will come from inside EA, I assume it will become widespread in EA research, thus reaching a critical mass there and helping our research even more than others'.


Should it be a for-profit?

Probably not. RetractionWatch doesn't want to license its database for commercial use. In addition, making it non-free is likely to significantly shrink the user share, and it would be harder to ask for free advertisement.

How could it be advertised? 

It's really in the interest of everyone who cares about science to use Replicato - researchers and publishers alike. Journals and scientists may agree to advertise it for free, and maybe Sci-Hub as well. If not, EA money could be used to advertise it.

Is the specific name Replicato important?

Nope. I just liked it. 

Additional description of how Replicato will work:

Basically, there will be a website and a web extension. On the website, a user could copy/paste any reference list, and Replicato will find and display any replications and retractions of the cited papers. A user could also check whether any of the references was mentioned in a meta-analysis (and which), in reviews in general, or in replication prediction markets. The web extension will be similar, except that users will mark the references on a page and then click the extension icon (analogous to the Google Translate extension).
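To illustrate the core matching step, here is a minimal sketch in Python. The DOI regex is a simplification, and the flagged-DOI set is a placeholder, not a real retraction database:

```python
import re

# DOIs have the form "10.", a 4-9 digit registrant code, "/", and a suffix.
DOI_RE = re.compile(r"10\.\d{4,9}/[^\s;,]+")

def flag_references(reference_text, flagged_dois):
    """Return DOIs found in a pasted reference list that appear in the flagged set."""
    found = DOI_RE.findall(reference_text)
    return [doi for doi in found if doi.rstrip(".") in flagged_dois]

refs = """
Serra-Garcia, M., & Gneezy, U. (2021). https://doi.org/10.1126/sciadv.abd1705
A hypothetical retracted paper. https://doi.org/10.9999/example.retracted
"""
flagged = {"10.9999/example.retracted"}  # stand-in for RetractionWatch data
print(flag_references(refs, flagged))    # ['10.9999/example.retracted']
```

In practice, many references lack DOIs, so fuzzy matching on titles and authors would be needed as a fallback - that is where most of the engineering effort would likely go.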

Why won't I build it?

I have no knowledge of programming.


I'll be happy for any feedback. Thanks in advance.




  1. ^

    Serra-Garcia, M., & Gneezy, U. (2021). Nonreplicable publications are cited more than replicable ones. Science advances, 7(21), eabd1705. https://doi.org/10.1126/sciadv.abd1705

  2. ^

    Freedman, L. P., Cockburn, I. M., & Simcoe, T. S. (2015). The Economics of Reproducibility in Preclinical Research. PLoS biology, 13(6), e1002165. https://doi.org/10.1371/journal.pbio.1002165

  3. ^

    Kühberger, A., Streit, D., & Scherndl, T. (2022). Self-correction in science: The effect of retraction on the frequency of citations. PloS one, 17(12), e0277814. https://doi.org/10.1371/journal.pone.0277814

  4. ^

    Bar-Ilan, J., & Halevi, G. (2017). Post retraction citations in context: a case study. Scientometrics, 113(1), 547–565. https://doi.org/10.1007/s11192-017-2242-0

  5. ^


  6. ^


  7. ^


  8. ^

    Chance it won't be implemented by any 1 is 1-p

    Not implemented by 10 is (1-p)^10

    Implemented by at least 1 is 1-(1-p)^10 = k

    also: p=1-(1-k)^0.1

  9. ^


  10. ^


  11. ^

    The most comprehensive source I could find is the ReplicationWiki, which is focused on social science, and specifically economics. It lists 790 studies that were replicated, although many are not exact replications. I arbitrarily assume the total number of replications is several times higher, hence a few thousand.




