This essay was submitted to Open Philanthropy's Cause Exploration Prizes contest.
- A large proportion of scientific research does not reproduce. This is due to a variety of factors ranging from outright fraud and data falsification to p-hacking and publication bias.
- This leads directly to lost money, time, and delayed benefits of science (e.g. new medical therapies).
- This leads indirectly to inefficiencies in many other cause areas such as global health, animal welfare (lab animals), and existential biorisk.
- Currently quality control in academia is limited to that done within individual labs and peer review that has been shown to be bad at finding errors.
- New mechanisms of quality control of science could include issuing of bounties for finding erroneous or falsified results in the published literature, promotion of fraud detection tools, and supporting of organizations dedicated to quality control and reproducibility in science.
- Importance: Poor quality control in science has effects on other cause areas and wastes billions every year - high
- Tractability: There is a lot of awareness of the problems but changing something as big and established as academia could be hard. Therefore new things need to be tried to force a change from the outside - medium
- Neglectedness: There is a small field of metascience and open/reproducible science researchers, and a few organizations that focus on tackling these problems from the inside. Few outside organizations try to tackle quality control directly. - high
This is a shallow investigation that was written in approximately 12 hours and many of the ideas require further investigation.
Waste in scientific research
It is often argued that science and technology is what will move our world forward towards a prosperous future. Although every year, billions of dollars are being used for scientific research all over the world, it has been shown that a large portion of this funding is wasted on irreproducible research leading to delays in the development of new technologies and understanding. A notable example is the recent discovery that multiple papers in Alzheimer’s research contained data manipulation, one of them a key paper by Lesne et al. from 2006 in Nature with over 2300 citations. In the 16 years since its publication multiple studies and clinical trials for therapies for Alzheimer's disease have been built on the unsubstantiated claims from that paper. A similar case, which has had detrimental effects on public health, is the 1998 Wakefield et al. paper published in the medical journal Lancet that linked MMR vaccines with autism in children. Upon investigation it was found that the data presented had been altered to fit a narrative that benefitted the author, and subsequent studies found no link between the MMR vaccine and autism. However, the damage of reducing public trust in vaccines had already been done and prevails to this day. Vaccine hesitancy has been shown to have played a role in the high mortality during the COVID-19 pandemic, as well as the re-appearance of measles in the US although it was deemed eliminated in 2000.
Similar examples of scientific misconduct and fraud are surprisingly common. A recent study of Dutch scientists estimated that more than 50% of researchers regularly engage in questionable research practices and as many as 8% have committed fraud in their research. Another review reported that 33.7% of researchers admitted to using questionable research practices and 1.97% to data falsification. When asked about colleagues, these numbers rose to 72% and 14.12% for questionable research practices and data falsification respectively.
However, it is not only outright fraud that leads to irreproducibility in science. Poorly recorded methods lead to research that cannot be reproduced by others, and natural variability can lead to false positive results. Positive results are more likely to be published, which creates a publication bias that further skews the reliability of science. Other questionable research practices include p-hacking and designing experiments with many researcher degrees of freedom, which allows a researcher to cherry-pick their results.
In a recent reproducibility study of preclinical cancer biology Errington et al. failed to replicate 143 out of 193 planned experiments due to a variety of reasons including reagents not available or not described, and extreme protocol modifications needed when discussing the experiments with the original authors. Out of the 50 experiments they managed to run, less than 50% were deemed to have been replicated in Errington’s experiments. In these cases there was no direct suspicion of fraud, but nonetheless the findings failed to reproduce and thus failed at improving our understanding of the fundamental processes involved in cancer.
Whatever the cause, irreproducibility comes at a cost of both time and money. In 2015, Freedman et al. estimated that 50% of preclinical research in the US is irreproducible, which equals a waste of approximately $28 billion per year. Not only is money wasted, but academic science is slowed down and sent down false paths, and public trust in science can be diminished.
Lack of quality control in science
There is a severe lack of quality control in academic science. In general, quality control of how research is conducted comes down to the individual labs led by principal investigators (PIs) that often have minimal to no training in people and project management, and quality control of the research output (papers) is done by the process of peer review.
Peer review is an insufficient mechanism for quality control in science. Studies that deliberately introduced large errors into manuscripts have shown that reviewers often miss them. Agreement between different reviewers assessing the same work has also been found to be poor. Many researchers agree that peer review is insufficient, but few alternatives for quality control have been proposed, and even fewer accepted in the academic community. One proposed review mechanism is post-publication review, where peer review is performed after a paper is accepted, but this does not address the problem of reviewing quality, rather the problem of papers being unavailable until the often lengthy review process is concluded.
Publishing is the bread and butter for any scientist and that has led to perverse incentives of publishing quantity over quality, as well as incentives for questionable research practices including ‘hyping up’ your results to make them sound as impressive as possible to get accepted into ‘high prestige’ journals. Null-results and negative results are often not published.
Increasing cost of replication/quality assurance
Another problem related to the issue of lacking quality control is that many studies simply are hard to assess without redoing a large portion of the work that went into them to start with. As peer review is an unpaid activity that is generally not rewarded in terms of career progression, active academics that have the pressure to publish themselves cannot spend much time on repeating experiments and checking results carefully. Instead they are left with evaluating the research based on what is written in a long PDF document (which is also not always an optimal tool for communicating a research finding) at face value. Taking time to do a reproducibility study is also often not worthwhile since reproductions often are harder to publish because they are not seen as novel.
As scientific questions get harder and require more data and compute power, reproducibility can also become more expensive. For example, to fully replicate an experiment that was conducted on a supercomputer you must have access to a similar enough supercomputer yourself. In the past it might have been possible to quickly re-run an analysis whilst doing peer review, but if the computational experiments take hours to days to run and require a range of proprietary or hard-to-set-up software, it is infeasible to expect peer reviewers to check that the analysis was done as reported in the paper. This is likely to get worse in the future.
The number of papers published every year is also growing exponentially with an estimate of 1.8 million papers being published in 2014, which might be an underestimate based on other estimates reporting 1 million papers in biomedicine alone in 2016, and 4.7 million articles accepted in 2020. The above references also estimate the growth rate of publications as 8-9% per year and an estimated 130 million hours were spent on peer review in 2020. In terms of labor cost in the US alone, that is an estimated 1.5 billion dollars per year spent on peer-review, a quality control mechanism that has been shown to be ineffective. The sheer amount of research output is thus a problem in itself as it is harder for individual researchers to stay informed about latest trends within their fields and quality control of this vast amount of publications is severely lacking.
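To make the quoted growth rate concrete, here is a minimal sketch that converts an annual growth rate into a doubling time using the standard compound-growth formula. The function name is illustrative, and the only inputs are the 8-9% figures quoted above; it assumes the growth rate stays constant, which the sources do not guarantee.

```python
import math


def doubling_time_years(annual_growth_rate):
    """Years needed for output to double at a constant annual growth rate.

    Solves (1 + r)^t = 2 for t. Assumes the rate stays constant.
    """
    return math.log(2) / math.log(1 + annual_growth_rate)


if __name__ == "__main__":
    # The essay quotes growth of 8-9% per year.
    for rate in (0.08, 0.09):
        years = doubling_time_years(rate)
        print(f"{rate:.0%} growth per year -> doubles in about {years:.1f} years")
```

Under that assumption, the yearly number of publications doubles roughly every 8 to 9 years, so any fixed capacity for quality control covers a shrinking share of the literature.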
Focusing on biomedical research alone and using the rough numbers above, we can estimate how big this problem is. Out of 1 million papers, we can estimate that 50% are somewhat flawed due to questionable research practices and 8% due to outright fraud and data falsification. Assume that 0.01-0.1% of papers attract enough interest to have a significant impact on the direction of a research field (this is a rough guesstimate based on data in a 2008 study on citation metrics showing the proportion of publications with more than 100 citations 10 years after publication), which is 100 to 1,000 papers per year. Assume also that questionable research practices and fraud are equally common in high impact and low impact research. Then 50 to 500 high impact papers every year would be based on questionable research practices, and 8 to 80 high impact papers would contain falsified data. Since the number of articles is rising every year and the competition for a career in academia (publish or perish) is getting fiercer, the total number of publications with errors and/or fraud will only rise. Without improved quality control it will only get harder to distinguish which paths can lead to real new insight and which paths are dead ends.
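The estimate above can be reproduced in a few lines. This is a minimal sketch that uses only the rough figures from the paragraph; the constant and function names are illustrative, and the calculation inherits the essay's assumption that flaws are equally common in high impact and low impact papers.

```python
# Back-of-the-envelope estimate using the essay's own rough inputs.
# None of these values are measured data.

PAPERS_PER_YEAR = 1_000_000            # biomedical papers per year
QRP_RATE = 0.50                        # share with questionable research practices
FRAUD_RATE = 0.08                      # share with fraud or data falsification
HIGH_IMPACT_SHARES = (0.0001, 0.001)   # 0.01% and 0.1% of papers are high impact


def flawed_high_impact(papers, high_impact_share, flaw_rate):
    """Expected yearly count of high impact papers affected by a flaw.

    Assumes the flaw rate is the same for high and low impact papers.
    """
    return papers * high_impact_share * flaw_rate


def estimate_ranges():
    """Return rounded (low, high) yearly counts for each flaw type."""
    rates = {"qrp": QRP_RATE, "fraud": FRAUD_RATE}
    return {
        label: tuple(
            round(flawed_high_impact(PAPERS_PER_YEAR, share, rate))
            for share in HIGH_IMPACT_SHARES
        )
        for label, rate in rates.items()
    }


if __name__ == "__main__":
    for label, (low, high) in estimate_ranges().items():
        print(f"{label}: {low} to {high} high impact papers per year")
```

Keeping the inputs as named constants makes it easy to see how sensitive the result is to the guessed share of high impact papers, which is the least certain number in the chain.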
Other cause areas would also benefit from improvements in the efficiency of science. For example, global health efforts could be better supported if therapies were developed faster, animal welfare would be improved if fewer animals were used in irreproducible research, and existential risks like biorisk would be reduced if more rigor and care in science discouraged rushed and careless experiments.
Supporting ‘scientific bounty hunters’
Inspired by the work by microbiologist Dr. Elisabeth Bik, I propose a mechanism for incentivising people to detect and flag erroneous and/or falsified data in the literature. Dr. Bik regularly posts on Twitter and PubPeer (a website for open peer-review and discussion of papers after publication) evidence, mostly images, of data falsification in peer reviewed journals. However, Bik does this work mostly unpaid and relies on donations and honoraria for giving talks related to her work. I believe one mechanism to incentivise more scientists (or ex-scientists) to do similar work would be to issue bounties for fraud detection.
The cases caught by Dr. Bik tend to be obvious image modifications where one part of an image or plot has been duplicated and presented as an independent measurement. This kind of obvious scientific misconduct is the easiest to spot so it is easy to think that this is very rare and only appears in lower impact journals. However, this is not the case, and the recent example of the 2006 Alzheimer’s disease paper contained such outrageous copy-paste jobs. On her website, Bik states she works mainly with her eyes to detect duplications, but also mentions software tools that have been developed for such use (FotoForensics, Forensically, or ImageTwin).
Making more people into scientific bounty hunters could raise the bar for fraud and thus increase the cost of such behavior. This approach still takes papers at face value, but could at least catch the most obviously fraudulent cases.
Supporting the development of better tools for peer review
Rather than waiting for papers to be published and then caught for containing falsified data it would be better to stop these papers from ever being published. One approach toward this goal would be to develop tools that can help journal editors and/or reviewers assess figures for duplications. Again, this only focuses on the most obvious fraudulent cases, which is the tip of the iceberg.
The adversarial research institute and replication researchers
Ideally, the two recommendations above could be implemented by a central ‘adversarial research institute’ (name and core idea from https://nintil.com/new-science-models/) which could organize a workforce of ‘reproducibility researchers’: scientists who do not work on original research, but instead focus on quality assurance of others’ findings, potentially both pre- and post-publication.
With full time researchers and a research institute, larger studies could be replicated and papers would no longer have to be taken at face value.
There is a group of people that would be well suited to become science bounty hunters or reproducibility researchers: the many PhD holders and postdocs who inevitably do not have a career within traditional academia because of the oversupply of qualified people relative to the small number of faculty positions. Many of these researchers are highly qualified but underpaid and often on short-term contracts. If there were alternative stable careers, there is a high likelihood that many of these researchers would be interested. There is already a steady flow of academic researchers into industry.
There is now a very small number of people who dedicate their time to finding and flagging fraud in published papers (like Dr. Bik). One concern about the tractability of scientific bounty hunters is that Bik has reported resistance from journals to acknowledging the evident low quality of many publications, and that retractions have been few and slow even after concerns like duplication in images or plots have been flagged. There is however hope that if the number of people engaged in quality assessment grows, this work would go from being seen as an annoyance to being a force that needs to be taken seriously.
Advocacy for wider use of anti-fraud software tools could be a cheap way of improving science, but requires journals to be open to the use of these tools. Unfortunately, many traditional journals have been slow to embrace modern methods for assessment of science (for example open or post publication review). Rather than gaining support from journals, it might be wise to engage other science funders to support improved quality control measures and require it for the research they support. An example of a funder promoting change in science is the Wellcome Trust that is pushing for all their research to be published solely in open-access journals. Similarly one could envision that funders could push for their research to be assessed by quality control tools prior to publication.
Who is already working in this space?
The reproducibility crisis has been widely recognised in academia since the early 2010’s, first in psychological science and biomedical science, but evidence of irreproducibility has also been found in other fields. Efforts to counteract it, including more widespread training in statistical methods, have been introduced in some fields of science mostly by grassroot organizations.
The field of meta-science has also developed to do ‘science on science’: to research flaws in the way science is done and propose better ways of doing it. For example, research tools such as pre-registration of studies and registered reports have been proposed and implemented. It is however still a small field with only a few researchers, many of whom are associated with the Center for Open Science or the Meta-Research Innovation Center at Stanford, as well as smaller centers within university departments. The 2021 Metascience conference showcased just over 200 speakers and moderators in the field, many of whom do meta-science on the side of their other research.
The Institute for Replication was recently (January 2022) founded to facilitate reproducibility studies. At the moment they are focused on economics and political science but they have the potential of growing into an institute similar to the proposed ‘Adversarial Research Institute’. However, at the moment they seem to be operating on a voluntary basis and they do not employ reproducibility scientists, rather they facilitate replications and provide some guidance to researchers that perform them.
There are also efforts to reform peer review. One such project is the ‘Unjournal’, which will enable more open review and allow work to be presented and evaluated in dynamic document/literate coding formats, where the full data-to-output pipeline is shown. eLife, which is a more traditional journal, is also experimenting with new models of review, in particular post-publication review.
Other proposals similar to the ones presented here have also been tossed around in the decentralized science (DeSci) community, which is a very small field that proposes the use of web3 and blockchain technology to counter the reproducibility crisis. These include proposals to use decentralized autonomous organizations (DAOs) to facilitate payment for reproducibility, both for the original research and for the researchers performing the replication studies. This could be utilized for the proposed bounty system, but further investigation would be required.
There are also organizations that propose that science should break free from the old ways of doing research with its skewed incentive system and old academic institutions where academics work grant to grant and have to keep up the publish or perish culture to keep and progress in their career. Examples of these include Arcadia Science and New Science.
Overall the issues related to irreproducibility are relatively well known and accepted within the scientific community and some efforts to counteract it already exist. However, there is definitely a coordination problem and a reluctance to change old ways of doing things.
The field of metascience is still small and could benefit from more funding and resources. ‘Reproducibility scientist’ is not yet a job title, and there is room for it to become a full-time job for many people!
Uncertainties and questions that would need to be answered in further investigations
- How is the waste in research distributed?
- Are misconduct and fraud equally common in high impact journals as lower impact ones? This matters because the value of finding errors in influential papers is much higher than finding errors in papers that have little impact on other people’s research (even though they contribute to wasted resources). The tractability of this cause depends on this.
- Is biomedical science the most important field to fix or should all of science be targeted?
- How should the bounties be structured to correctly incentivize the bounty hunters while also being well-aligned with the research’s importance to society?
- Survey of graduate students to elicit minimum prices?
- Could bounties be based on a paper’s perceived quality in terms of for example number of citations, and journal impact factor?
- Can we create stronger consequences for those found guilty of fraud?
- With journals often being slow and unresponsive to allegations of fraud and misconduct, how do you disincentivize these practices? What about the funders' role?