This anonymous essay was submitted to Open Philanthropy's Cause Exploration Prizes contest and posted with the author's permission.
Can we trust science?
The question I pose, and the funding I propose that Open Philanthropy provide in the future, both relate to a simple (albeit, for a scientist, terrifying) query: can we trust the science upon which more science is founded and upon which we make policy decisions?
I write this in the still-expanding aftermath of bombshell findings of fabrication and fraud in the field of Alzheimer’s research. As discussed in an investigative piece in Science (Piller, 2022), Vanderbilt University neuroscientist and professor Dr. Matthew Schrag, in the process of investigating an Alzheimer’s drug under consideration by the FDA, uncovered potential misconduct with its roots deep in the literature on Alzheimer’s disease. Without extensively reiterating the investigation published in Science, the issue boils down to one of misconduct more than fifteen years ago: image manipulation in a 2006 Nature paper (Lesné et al., 2006). Alzheimer’s disease was identified in 1906 and by 2006, research around the disease had somewhat stagnated. The cause of the disease, and hence any therapy to treat or cure it, was not clear. But the 2006 paper reinvigorated the field by identifying “the first substance … in brain tissue … that has been shown to cause memory impairment” (Piller, 2022). The field was redirected toward investigating this substance. As a researcher quoted in Piller (2022) put it, the finding “…kind of turned the field on its head.” Work on this substance proceeded, while other possible explanations for Alzheimer’s became minority perspectives in the field.
More than fifteen years of research later, Dr. Schrag enters this story. Dr. Schrag uncovered potential misconduct in Lesné et al. (2006), the very paper which had redirected the field. Specifically, Dr. Schrag identified what appears to be image manipulation in a number of published images in the paper. At this time, Dr. Schrag does not accuse the authors of Lesné et al. (2006) of misconduct or fraud. He says that would require review of the original, complete, and unpublished images, as well as raw numerical data, from the original research. According to Piller (2022), Dr. Schrag says that he focuses on “…what we can see in the published images, and describe them as red flags, not final conclusions...The data should speak for itself.” However, in a broader review, conducted as part of the Piller (2022) investigation, external reviewers noted that the authors of Lesné et al. (2006) “…appeared to have composed figures by piecing together parts of photos from different experiments”. One expert goes on to say that “The obtained experimental results might not have been the desired results, and that data might have been changed to…better fit a hypothesis.”
One can easily see the redirection of the research on Alzheimer’s disease around this misconduct: the NIH spent $1.6 billion (approximately half of the total funds spent on Alzheimer’s research) on projects related to the substance “uncovered” in Lesné et al. (2006) in this fiscal year. This pecuniary cost, however, says nothing about the time and effort scientists spent chasing down a false trail, or about the opportunity cost of the years spent studying something with little to no prospect of actually leading to a cure or therapy for Alzheimer’s. And it says nothing about the cost of the loss of trust in science, the loss of hope for families, and the reverberations of these impacts across the policy and political space.
Of course, this is an egregious case and one that touches a nerve for anyone who has long been hoping for a cure for Alzheimer’s, and for anyone who has hoped for a treatment for any disease. If we cannot trust the research about Alzheimer’s, can we trust the research about cancer, about diabetes, about heart disease, or about any other disease or illness? This, of course, extends to more recent and more debated diseases as well, such as COVID-19. And, beyond the field of medicine, can we trust the science in other fields, in particular the “big things” like climate change, genetically modified foods, child development, or education?
According to Pew Research Center, Americans’ trust in scientists has been trending downwards in recent years, currently falling below pre-pandemic levels. Only 29 percent of U.S. adults said that they have a great deal of confidence in medical science, down from 40 percent in November of 2020. These trends hold for scientists in general, for whom only 29 percent of U.S. adults said they have a great deal of confidence, compared with 39 percent in November of 2020 (Kennedy et al., 2022). While large majorities of Americans continue to have, at least, a fair amount of confidence in medical scientists and scientists to act in the best interest of the public (78 and 77 percent, respectively), incidents like the case of Alzheimer’s undermine that confidence and trust (Kennedy et al., 2022).
Trust in science is imperative for the betterment of lives, livelihoods, and society through research. Trust is foundational to any relationship, and it is essential both to the advancement of science and to those who support and benefit from it. To quote Parikh (2021): “…a scientific endeavor that is not trusted by the public cannot adequately contribute to society and will be diminished as a result.” Scientific misconduct, falsification, and fraud all undermine this trust. This, then, raises the question I posed at the outset: can we trust the science upon which more science is founded and upon which we make policy decisions?
Improving Science and Increasing Trust
This question and its related concerns naturally raise more questions, in particular: how can we increase trust in science? Recent decades have seen increasing pushes for open science, improving transparency through replicability and reproducibility of scientific research. But what remains neglected in this area is the actual work of replicating and reproducing science.
In this proposal, I argue that trust in science is essential and that, to bolster and increase that trust, we must create incentives to reproduce and replicate science. I further argue that incentives in science are too misaligned for this to happen through the ordinary process of research itself, and that dedicated funding is required to achieve it and, by extension, to achieve trust in science for the long term. So, I ask the reader to move forward with this idea in mind: we must incentivize scientists to replicate each other’s work, in order to increase trust in science and to improve science, for the public good.
Manipulations in (and out of) the Bounds of Propriety
In the humorous and informative How to Lie with Statistics, Darrell Huff writes:
A great many manipulations and even distortions are possible within the bounds of propriety. Often the statistician must choose among methods, a subjective process, and find the one that he will use to represent the facts.
Although focused directly on the manipulation of statistics (and not other potential types of manipulation or fraud, like the manipulation of images as in Lesné et al. (2006)), Huff’s book exposes numerous cases of statistical manipulation, often within, though sometimes outside of, the bounds of propriety and acceptable use. These manipulations are typically used to justify or defend a particular position held by the author (either by their own conviction or through a conviction acquired by payment). The book, which was enormously popular when first published in the 1950s (and is still a delightful and charming read today), illustrates how human biases can result in the slow skewing of science, driving bad policy and bad outcomes. People can, and do, drive bad science through our own predispositions, deliberate and accidental, within the lines of propriety and acceptability and outside of them. Because of these human frailties, it is not possible to entirely eliminate misconduct, fabrication, and fraud in science.
Regardless of the possibility of misconduct, most people around the world concur that the cultivation and generation of knowledge is important for the long-term improvement of our lives and livelihoods. Although human nature and our species’ inherent shortcomings ultimately result in the misalignment of incentives to do good science, it is possible to investigate and discover the science that may not be good. But what is necessary for these investigations is not simply a change in the culture of scientific research, but a shift in incentives for uncovering misconduct.
In the following three subsections, I use the ITN framework to present the case that this is an opportune topic to be supported by Open Philanthropy, by discussing the importance, neglectedness, and tractability of uncovering misconduct.
Importance
In order for science to improve lives, it must be trusted. Misconduct, whether due to deliberate fraud or simply sloppiness, undermines this. In many cases it undermines not just trust, but also hope. Trust and hope are difficult concepts to quantify, but they have non-pecuniary value to individual humans and to society at large. The misconduct uncovered in the Alzheimer’s study discussed in the Trust and Science section exemplifies this: one can easily imagine a family, eagerly awaiting a treatment for a beloved mother, that now finds out that the foundation on which that treatment was based is likely fraudulent and that the treatment will not help their mum. And not just that: because the field was directed onto a fraudulent foundation, other possible therapies and treatments are even further away. This medical example is almost visceral: who among us cannot imagine this, if not for Alzheimer’s, then for another disease or condition faced by ourselves or a family member?
So, to further demonstrate this importance beyond the field of medicine, consider another study, a case of proven misconduct: the data fabrication by Michael LaCour. In 2014, then graduate student Michael LaCour and Dr. Donald Green published a study titled “When contact changes minds: An experiment on transmission of support for gay equality” in Science. LaCour was a graduate student at UCLA and Green was (and is) a professor at Columbia University. Their study presented a remarkable finding: going door to door to persuade people to support same-sex marriage is effective, and is particularly effective in cases where the canvasser delivering the persuasive message is themselves gay. Further spillover effects were identified, suggesting that people who lived with those who had spoken to a gay canvasser became more supportive of same-sex marriage. An incredible finding! A tremendous discovery for science, for communication, for information, for persuasion! The research was widely covered in the media and immediately piqued the interest of policymakers, who regularly seek to change people’s minds on a multitude of topics.
But the paper was retracted in 2015, as subsequent work demonstrated that LaCour had fabricated the data. The fraud was uncovered by two UC Berkeley graduate students, David Broockman and Josh Kalla. Broockman and Kalla identified that LaCour must have taken a preexisting survey, added some statistical noise, and then passed it off as the findings of a canvassing experiment (Broockman et al., 2015). When suspicions arose, it was found that the study's raw data could not be traced to the survey platform LaCour claimed to have used. LaCour said he'd deleted the source files by accident, but the survey platform found no evidence that happened. LaCour declined to share contact information for the survey respondents. The entire experiment was a fraud.
The Lesné et al. paper and the LaCour and Green paper are simply two examples of misconduct and, at that, just two examples of misconduct that was actually discovered. Because science and research reach into so much of our existence, it is possible for such misconduct to permeate the lives of, not to be dramatic, everyone. Even the most fortunate of us will have a brush with disease, through ourselves or our families. We all live in a world influenced by research-driven policy. We all end up touched by science and rely on it to be true to improve our lives and livelihoods. The importance of being able to trust science is difficult to overstate.
Neglectedness
Incentives are simply not aligned for researchers to dedicate time to performing replications and reproductions. There is some effort and some work to replicate papers within fields, but it remains patchy, segmented, and individualized. For those of us in social science, there is a great deal of talk about the replication crisis, but the actual work of replication still seems woefully incomplete. There are some bodies which support replications, however. In development, the International Initiative for Impact Evaluation (3IE) funded replications, in order to fill knowledge gaps in areas with limited evidence. This has generally been a minority of their funding portfolio and requests for replications are directed. Replications tend to focus on a specific area of interest for the organization, rather than investigating science at large, or popular or influential science within a variety of fields.
The current field of replications and reproductions is a much more diffuse, individualized, and thus informal network. There are researchers who do this work, often motivated by curiosity. However, given the time required, the low payoff, and the potential for retribution, there are few incentives for this work to be undertaken in any systematic way.
It is worth noting the individual efforts of people whose replications have made a large impact: Broockman and Kalla identified LaCour’s fraud (Broockman et al., 2015); Tim van der Zee, Jordan Anaya, and Nicholas Brown discovered misconduct by economist Dr. Brian Wansink; Andrew Gelman’s blog frequently digs into similar ideas; and Data Colada, from Uri Simonsohn, Leif Nelson, and Joe Simmons, regularly publishes its own investigations, as well as providing a location for others to share their discoveries (sometimes anonymously, for the protection of replication authors who may fear retribution). These are just a few examples, selected because the replications have been influential to my own thinking or work. But these people are, in all cases, employed elsewhere, doing this important work as a hobby and/or in their “free” time. This is hardly a recipe for a successful long-term strategy for uncovering misconduct! While not altogether neglected, this is also not work that is being conducted systematically and methodically.
With literal billions (trillions?) of dollars on the line, this is somewhat surprising, particularly given the involvement of the federal government in providing so many of those dollars. Because the problem is so neglected, though, there is a tremendous opportunity for significant impact per dollar. This is worth evaluating with a back-of-the-envelope calculation. Consider a very simplified case: what it cost to uncover the Alzheimer’s fraud, by paying Dr. Schrag ($18,000), compared with the NIH funding for Alzheimer’s research in this avenue of investigation in a single fiscal year ($1.6 billion).
18,000 / 1,600,000,000 = 0.00001125 = 0.001125%
This figure of roughly one-thousandth of a percent, the cost to uncover the fraud relative to the cost of funding research based on it, represents only a single year. Now, let’s consider that the NIH expended funds in this fraudulent direction every year since the paper’s publication in 2006 (15 years, considering the funding through 2021). Assuming that funding likely increased over time, I will conservatively estimate that over the decade and a half since the publication of Lesné et al. (2006), the NIH spent $800 million per year in the direction of the fraud:
18,000 / (800,000,000*15) = 0.0000015 = 0.00015%
This likely underestimates NIH funding and fails to capture the opportunity cost of researchers’ time, as well as the other pecuniary and non-pecuniary costs associated with, as the saying goes, throwing good money after bad.
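The two ratios above can be reproduced with a short script. This is only a minimal sketch of the arithmetic: the function and constant names are illustrative, and the $800 million average annual figure is this essay's own assumption rather than a reported NIH number.

```python
def cost_ratio_percent(investigation_cost, research_funding):
    """Return the investigation cost as a percentage of research funding."""
    return investigation_cost / research_funding * 100


# Figures taken from the text above.
INVESTIGATION_COST = 18_000        # payment to Dr. Schrag, in dollars
NIH_SINGLE_YEAR = 1_600_000_000    # NIH funding in one fiscal year, in dollars

# Assumptions made in the text above, not reported figures.
ASSUMED_ANNUAL_FUNDING = 800_000_000  # assumed average funding per year, in dollars
YEARS = 15                            # 2006 through 2021

single_year = cost_ratio_percent(INVESTIGATION_COST, NIH_SINGLE_YEAR)
cumulative = cost_ratio_percent(INVESTIGATION_COST, ASSUMED_ANNUAL_FUNDING * YEARS)

print(f"Single fiscal year: {single_year:.6f}%")
print(f"Fifteen years (assumed): {cumulative:.6f}%")
```

Changing the assumed annual funding or the number of years shows how sensitive the second ratio is to those guesses. Under any plausible values the investigation cost remains a tiny fraction of the research spending.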
Funding for fraud investigations is virtually non-existent at present, and so the potential to make an impact by funding researchers who replicate and reproduce research, even with relatively small sums, is tremendous. A large, systematic, and regularized investment in misconduct investigations would have an impact commensurate with the size of that investment.
It is worth mentioning here the parallel work which is done, and is better supported, within the replicability and reproducibility environment. This includes work which supports improving open science. Much of this is supported, or at the very least required, by federal funding agencies, which seek to impose regulations on the work they fund in order to ensure its quality (e.g. the NIH RePORT). But this also includes organizations like the Berkeley Initiative for Transparency in the Social Sciences, which supports workshops on improving open science in psychology, economics, and other social sciences. These are important and parallel movements, but they are distinct from work directly funded to uncover misconduct. Though the two go hand in hand, the direct discovery of misconduct through replications is not widely supported.
Tractability
While there are people working on replications, the work is largely patchy and disconnected. It is also often self-supported and self-funded, which creates little incentive for more researchers to take it on. Much of the current work of replication is supported simply by researchers’ own interest and personal altruism. To some extent, this is solvable with money, which makes it an inherently tractable problem. Realigning incentives to do this work would create a market for it to be done and, as people are already doing it for nothing, funding is likely to attract even more people to do it. To discuss the tractability of this more concretely, I present in the next section three possible mechanisms for funding, at different scales, which could address this problem.
Fundable Ideas for a Fix
There are a number of ways in which funding could help to improve science, through the mechanism of supporting replications and reproductions. I present three for consideration. These are vastly simplified ideas, but provide a basis for what I could imagine being scaled into a full, large-scale proposal, if the idea of uncovering misconduct to improve trust in science were to be a funding opportunity for Open Philanthropy.
1. Creation of a misconduct discovery “think tank”. Although I was working on a version of this proposal when I saw his tweet, I want to credit Dr. Ryan Briggs for putting forward the idea of “a little think tank full of Andrew Gelman or David Roodman clones with free coffee”. This would be the most formal and likely the most expensive, but also most effective, way to systematically uncover and address research misconduct. Working on a pre-specified set of criteria (e.g. in X field, any paper with more than Y citations), researchers could systematically replicate papers and/or methods, with ample opportunity to dig into questionable or curious results. This option would provide funding for both scale and depth – investigating misconduct at a larger scale, in a variety of fields, for as many papers as necessary.
2. Creation of a community of practice, centered around misconduct. This would take a slightly less structured approach than the “think tank” but be similarly modeled, with a variety of individuals working in a variety of fields, on a certain set of criteria, in order to uncover misconduct in science and to provide additional support for trustworthy, good science. This could be executed in any number of ways, but I envision it as a network of graduate students, funded by Open Philanthropy, guided by professors in their respective fields. This would achieve a few goals. First, it would have the effect of uncovering misconduct and providing support for good science. Next, it would support graduate study for students, giving them the opportunity to study while working on replications and becoming expert at conducting them. This would make their training an investment in the future of science and could cultivate more individuals and scientists working on replications for the long term. While students and professors in the community of practice would not necessarily be at the same university or institution, they would be formally connected through this community. The community of practice would interact regularly and asynchronously (e.g. via Slack), and would also meet face to face at regular intervals (e.g. annual conferences or meetings).
3. Development of a misconduct bounty system. This system would be the least formal, but would serve to improve the incentives for those interested in uncovering and investigating misconduct. Researchers could be paid a nominal sum for the discovery of misconduct and a similar sum for confirming good science, in both cases through replicating and reproducing work. It would be imperative to reward both the cases in which misconduct is discovered and the cases in which it is not. As we hope most science is actually good science, we would not want to discourage replicators who do this work successfully, confirming the good work, by only rewarding cases of found fraud.
Science is For All
Science is for all of us. We can all be scientists. We can all benefit from science. However, for a variety of reasons, over time, misconduct in science permeates the community, eroding trust. Somehow, scientists have found ourselves unguarded. We have long operated in a scenario of trust, but it seems that it has been unearned. We now find ourselves asking quis custodiet ipsos custodes (who will guard the guards themselves)? But when we examine the situation, no one is guarding us, not even ourselves.
In this proposal, I have recommended the creation of guardrails – a metaphorical creation of a guard – and proposed several funding opportunities which might be of interest to Open Philanthropy, all in the interest of improving trust in science, through discovery of misconduct. Science is for all and we must earn the trust that, so far, has simply been given to us.
Aiken, A.M., C. Davey, J.R. Hargreaves, R. J. Hayes. (2015). Re-analysis of health and educational impacts of a school-based deworming programme in western Kenya: a pure replication. International Journal of Epidemiology 44 (5): 1572 – 1580
Broockman, D., J. Kalla, P. Aronow. (2015). Irregularities in LaCour (2014). Mimeo, Stanford University
Davey, C., A.M. Aiken, R.J. Hayes, J.R. Hargreaves. (2015). Re-analysis of health and educational impacts of a school-based deworming programme in western Kenya: a statistical replication of a cluster quasi-randomized stepped-wedge trial. International Journal of Epidemiology 44 (5): 1581 – 1592
Evans, D. (2015). Worm wars: The anthology. World Bank Blogs, Development Impact: https://blogs.worldbank.org/impactevaluations/worm-wars-anthology
Huff, D. (1954). How to lie with statistics. W.W. Norton & Company: New York
Gelman, A. (2012). Ethics and statistics: Statistics for cigarette sellers. Chance 25 (3): 43 – 46
Kennedy, B., A. Tyson, C. Funk. (2022). Americans’ trust in scientists, other groups declines. Pew Research Center: https://www.pewresearch.org/science/2022/02/15/americans-trust-in-scientists-other-groups-declines/
Lesné, S., M. Teng Koh, L. Kotilinek, R. Kayed, C.G. Glabe, A. Yang, M. Gallagher, K.H. Ashe. (2006). A specific amyloid-β protein assembly in the brain impairs memory. Nature 440: 352 – 357
Miguel, E., M. Kremer. (2004). Worms: Identifying impacts on education and health in the presence of treatment externalities. Econometrica 72 (1): 159 – 217
Parikh, S. (2021). Why we must rebuild trust in science. PEW: Trend Magazine: https://www.pewtrusts.org/en/trend/archive/winter-2021/why-we-must-rebuild-trust-in-science
Piller, C. (2022). Blots on a field: A neuroscience image sleuth finds signs of fabrication in scores of Alzheimer’s articles, threatening a reigning theory of the disease. Science 377 (6604): https://www.science.org/content/article/potential-fabrication-research-images-threatens-key-theory-alzheimers-disease
1 In an ironic aside, Huff himself received funds from the Tobacco Institute to counter the Surgeon General’s report that smoking is bad for one’s health (Gelman, 2012). He wrote a book, which was never published, attempting to show that smoking was not bad for people, in direct conflict with medical scientists and doctors. Talk about a manipulation of statistics to a specific purpose!
2 This is, in a large way, already underway, with respect to improving open science, in particular. I discuss this more in the Neglectedness subsection.
3 Perhaps their most famous supported replication was that of Miguel and Kremer (2004), published as Aiken et al. (2015) and Davey et al. (2015). These replications sparked the infamous “Worm Wars”, which demonstrated both the importance of replications for science and how poorly many scientists, and the public at large, understand replications. For an excellent discussion of the papers themselves and the subsequent brouhaha around the replication, see Evans (2015).