“Getting it right means regularly revisiting past assumptions and past results and finding new ways to test them. The only way science is successful and credible is if it is self-critical.”
THE INTRODUCTION
Science Alert:
A report on the issue, published in Nature this May, found that about 90 percent of some 1,576 researchers surveyed now believe there is a reproducibility crisis in science.
Effectively, this is due to the reporting of 'false discoveries' – hard-to-reproduce results that are kind of like noise in scientific data, but which are singled out for reporting by scientists in their papers because they're new, sensational, or somehow surprising.
These kinds of findings capture our human interest because of their novelty and shock factor – but they risk damaging the credibility of science, especially since scientists feel under pressure to embellish or skew their papers towards making these kinds of impressions.
But it's a vicious cycle, because these sorts of remarkable studies create a lot of attention and help researchers get published, which in turn helps them get grants from institutions to conduct more research.
"As part of the scientific enterprise," say Ramal Moonesinghe and colleagues, "we know that replication – the performance of another study statistically confirming the same hypothesis – is the cornerstone of science, and replication of findings is very important before any causal inference can be drawn." The authors say that their demonstration "should be encouraging news to researchers in their never-ending pursuit of scientific hypothesis generation and testing."
"Obtaining absolute 'truth' in research," say Djulbegovic and Hozo, "is impossible, and so society has to decide when less-than-perfect results may become acceptable."
THE EVIDENCE
According to work presented in Science, fewer than half of 100 studies published in 2008 in three top psychology journals could be replicated successfully. The international effort included 270 scientists who re-ran other people's studies as part of The Reproducibility Project: Psychology, led by Brian Nosek of the University of Virginia.
Last summer, Leonard P. Freedman, a scientist who worked for years in both academia and big pharma, published a paper with two colleagues on “the economics of reproducibility in preclinical research.” After reviewing the estimated prevalence of common flaws and fault-lines in the biomedical literature, Freedman and his co-authors concluded that fully half of all results rest on shaky ground and might not be replicable in other labs. Such studies don’t merely fail to find a cure; they might not offer any useful data whatsoever.
In 2011, a team from Bayer had reported that only 20 to 25 percent of the studies they tried to reproduce came to results “completely in line” with those of the original publications. There’s even a rule of thumb among venture capitalists, the authors noted, that at least half of published studies, even those from the very best journals, will not work out the same when conducted in an industrial lab.
In 2012, the former head of cancer research at Amgen, Glenn Begley, brought wide attention to this issue when he decided to go public with his findings in a piece for Nature. Over a 10-year stretch, he said, Amgen’s scientists had tried to replicate the findings of 53 “landmark” studies in cancer biology. Just six of them came up with positive results.
False Discoveries
These are false-positive findings, and they lead to the erroneous perception that a definitive scientific discovery has been made.
This high rate occurs because the studies that are published often have low statistical power to identify a genuine discovery when it is there, and the effects being sought are often small.
Despite entreaties to increase statistical power – for example, by collecting more observations – it has remained consistently low for the past 50 years.
In some fields, it averages only 20 to 30 percent. Natural academic selection has favoured publication of a result, rather than generation of new knowledge.
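The arithmetic behind this claim is straightforward and can be sketched in a few lines of Python. The 10 percent prior on true hypotheses below is an illustrative assumption (not a figure from the text); the 20 percent power figure is the low end cited above:

```python
def false_discovery_rate(prior_true, power, alpha=0.05):
    """Fraction of 'significant' results that are false positives.

    prior_true: fraction of tested hypotheses that are actually true (assumed)
    power: probability a study detects a true effect when one exists
    alpha: significance threshold for declaring a discovery
    """
    true_positives = prior_true * power        # real effects that get detected
    false_positives = (1 - prior_true) * alpha  # null effects that cross alpha by chance
    return false_positives / (true_positives + false_positives)

# With 10% of hypotheses true and 20% power, most significant findings are wrong:
print(round(false_discovery_rate(0.10, 0.20), 2))  # -> 0.69
```

Under these assumptions roughly two-thirds of published "discoveries" would be false, which is why low power alone can make the majority of a field's positive results unreliable.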
The impact of Darwinian selection among scientists is amplified when government support for science is low, growth in the scientific literature continues unabated, and universities produce an increasing number of PhD graduates in science.
We hold an idealised view that science is rarely fallible, particularly biology and medicine. Yet many fields are filled with publications of low-powered studies with perhaps the majority being wrong.
Bad Practices
Further, dubious scientific practices boost the chance of finding a statistically significant result, usually at a probability threshold of less than one in 20. In fact, our probability threshold for accepting a discovery should be more stringent, just as it is for discoveries of new particles in physics.
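For a sense of scale: the one-in-20 convention corresponds to a result about two standard deviations from the null, while particle physics demands five sigma before announcing a discovery. A rough sketch using only the standard library:

```python
from math import erf, sqrt

def two_sided_p(sigma):
    """Two-sided p-value for a result `sigma` standard deviations from the null."""
    return 1 - erf(sigma / sqrt(2))

print(round(two_sided_p(1.96), 3))  # -> 0.05 (the usual threshold in biology and psychology)
print(two_sided_p(5))               # about 5.7e-7 (the five-sigma physics standard)
```

The physics threshold is roughly 100,000 times stricter, which is part of why announced particle discoveries so rarely evaporate on replication.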
The English mathematician and father of computing, Charles Babbage, noted the problem in his 1830 book Reflections on the Decline of Science in England, and on Some of Its Causes. He formally split these practices into "hoaxing, forging, trimming and cooking".
In the current jargon, trimming and cooking include failing to report all the data, all the experimental conditions, all the statistics, and reworking the probabilities until they appear significant. The frequency of many of these indefensible practices is above 50 percent, as reported by scientists themselves when they are given some incentive for telling the truth.
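One of these practices is easy to quantify: if a researcher measures many independent outcomes but reports only whichever one clears p &lt; 0.05, false positives become likely rather than rare. A minimal sketch (20 tests is an illustrative assumption):

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n independent null tests crosses alpha.

    Each test on a true-null effect is 'significant' with probability alpha,
    so the chance that none of them is significant is (1 - alpha) ** n_tests.
    """
    return 1 - (1 - alpha) ** n_tests

print(round(chance_of_false_positive(1), 2))   # -> 0.05
print(round(chance_of_false_positive(20), 2))  # -> 0.64
```

With 20 comparisons and no correction, the odds of a spurious "discovery" rise from 5 percent to nearly two in three, even when no real effect exists anywhere in the data.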
Publish or Perish
"The cultural evolution of shoddy science in response to publication incentives requires no conscious strategising, cheating, or loafing on the part of individual researchers," writes Paul Smaldino, a cognitive scientist who led the work at the University of California, Merced. "There will always be researchers committed to rigorous methods and scientific integrity. But as long as institutional incentives reward positive, novel results at the expense of rigour, the rate of bad science, on average, will increase."
And the problem is only compounded by quantitative measures designed to rate the importance of researchers and their papers. These kinds of measures, such as the controversial p-value, can be misleading and exploited, creating all kinds of false impressions that ultimately hurt science.
"I agree that the pressure to publish is corrosive and anti-intellectual," neuroscientist Vince Walsh from University College London in the UK, who wasn't part of the study, told The Guardian. "Scientists are just humans, and if organizations are dumb enough to rate them on sales figures, they will do discounts to reach the targets, just like any other sales person."
THE VERDICT
So, what's the solution? Well, it won't be easy, but Smaldino says we need to move away from assessing scientists quantitatively at an institutional level.
"Unfortunately, the long-term costs of using simple quantitative metrics to assess researcher merit are likely to be quite great," the researchers write in their paper. "If we are serious about ensuring that our science is both meaningful and reproducible, we must ensure that our institutions incentivise that kind of science."
In the meantime, studies like this that shine a critical spotlight on science – which are fairly 'novel' and attention-grabbing in themselves – may help to keep people aware of just how big of an issue this really is. "The more people who are aware of the problems in science, and who are committed to improving its institutions," Smaldino told The Guardian, "the sooner and more easily institutional change will come."
» Merton, 1968: "The Matthew Effect in Science"
» Retraction Watch: "Tracking retractions as a window into the scientific process"
» Ioannidis, 2005: "Why Most Published Research Findings Are False"
» Science Alert: "80% of data in Chinese clinical trials have been fabricated"
» The Guardian: "The first imperative: Science that isn’t transparent isn’t science"