September 15, 2007
High Error Rates Found In Published Scientific Research
Epidemiologist John Ioannidis says most claimed research findings are wrong due to mistakes made by researchers in design and in analysis of results.
Dr. Ioannidis is an epidemiologist who studies research methods at the University of Ioannina School of Medicine in Greece and Tufts University in Medford, Mass. In a series of influential analytical reports, he has documented how, in the thousands of peer-reviewed research papers published every year, there may be much less than meets the eye.
These flawed findings, for the most part, stem not from fraud or formal misconduct, but from more mundane misbehavior: miscalculation, poor study design or self-serving data analysis. "There is an increasing concern that in modern research, false findings may be the majority or even the vast majority of published research claims," Dr. Ioannidis said. "A new claim about a research finding is more likely to be false than true."
The hotter the field of research, he determined, the more skeptically its published findings should be viewed.
Should government agencies that hand out research grants also hand out parallel grants to independent groups to do parallel analyses of data produced by funded researchers in order to find errors?
I suspect peer reviewers lack the time, data, and incentives needed to catch errors in research papers. Given the relative cheapness of data storage and transmission technologies, at least for some types of research, independent parties should be given a crack at analyzing the same data to see whether the conclusions drawn by the original researchers are warranted.
Another thought: Could some types of studies have agreed standard data formats with standard analyses written ahead of time, so that different studies could be compared automatically and so that bias would not enter as much into the analysis phase?
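To make the idea concrete, here is a minimal sketch of what a pre-registered, shared-format analysis might look like. Everything in it is hypothetical: the column names (`subject_id`, `arm`, `outcome`), the toy data, and the choice of a simple mean-difference statistic are all invented for illustration, not drawn from any actual registry.

```python
import csv
import io
import statistics

# Hypothetical shared schema: every participating trial deposits rows of
#   subject_id, arm ("treatment" or "control"), outcome (numeric)
STUDY_A = """subject_id,arm,outcome
1,treatment,5.1
2,treatment,4.8
3,control,3.9
4,control,4.0
"""

def prespecified_analysis(csv_text):
    """One fixed analysis, written before any data are collected,
    run identically on every study that uses the shared schema.
    Returns the mean outcome difference, treatment minus control."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    by_arm = {"treatment": [], "control": []}
    for row in rows:
        by_arm[row["arm"]].append(float(row["outcome"]))
    return statistics.mean(by_arm["treatment"]) - statistics.mean(by_arm["control"])
```

Because the analysis code is fixed before the data arrive, researchers cannot tune definitions or outcome measures after seeing the results, and any two studies using the schema are directly comparable.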
Ioannidis published a paper in PLoS Medicine in August 2005 entitled "Why Most Published Research Findings Are False."
There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; when there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias. In this essay, I discuss the implications of these problems for the conduct and interpretation of research.
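The framework in that abstract reduces to a short calculation: the post-study probability that a claimed finding is true (its positive predictive value, PPV) follows from the pre-study odds R, the study's power, the significance threshold, and a bias term. The sketch below implements the PPV formula from Ioannidis's 2005 paper; the example parameter values (R = 0.05, power = 0.8) are my own illustrative choices, not figures from the paper.

```python
def ppv(R, power, alpha=0.05, u=0.0):
    """Post-study probability that a claimed finding is true.

    R     -- pre-study odds that a probed relationship is true
    power -- 1 - beta, the chance of detecting a true relationship
    alpha -- Type I error rate (conventionally 0.05)
    u     -- bias: fraction of would-be non-findings reported as findings
    """
    beta = 1.0 - power
    true_positives = power * R + u * beta * R       # true relationships claimed as findings
    false_positives = alpha + u * (1.0 - alpha)     # false relationships claimed as findings
    return true_positives / (true_positives + false_positives)
```

With long pre-study odds, say R = 0.05 (one true relationship per twenty probed), even a well-powered, unbiased study yields `ppv(0.05, 0.8)` of about 0.44, so a fresh claim is indeed more likely false than true; adding bias (u > 0) drives the PPV lower still.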
Step up a level and use the kind of thinking a business executive steeped in the wisdom of W. Edwards Deming would bring to a manufacturing quality problem: Huge amounts of labor go into conducting scientific studies, and therefore a high error rate represents an enormous waste of labor and supplies. Science needs more automation and other process improvements to raise quality control and reduce waste.
Shhhh! Mustn't let out any secrets, eh?
If people understood this, they might have to think for a change, instead of simply declaring, "I believe the scientists," in an effort to settle a debate that isn't really settled.
I've had this feeling for a while. As a medical professional with some cursory training in statistics as well as some experience in actually putting together studies while in training, my feeling is that many published studies are crap (and are simply churned out for resume padding). People select projects that are cheap, easy, and quick even if they will fail to answer an important question adequately. Additionally, in medical research at least, there are so many confounding variables and places for systematic bias to creep in undetected that a clean study can be hard to obtain, even when a good effort is made.
Even if people are thinking, the complexity of the interacting variables makes drawing good conclusions difficult unless the magnitude of the differences from treatment options is very large. I would assume that this situation is true of any sufficiently complex system.
Another problem in medical studies is that for many drugs the effects are small. In other words, most of our therapies are f'ing useless. The more successful the therapy, the fewer humans you need to prove your case. For example, melanoma metastases do not spontaneously shrink, so a sample of 3 has statistical significance in the case of an effective melanoma therapy. (Even the FDA grudgingly admits this. After a knock-down dirty fight, our FDA consultant acknowledged that 5 dogs would prove our case and get us to a pre-IDE meeting.) This is why I toss out any study where thousands are used to test a drug. If the damn thing worked, you wouldn't need a thousand patients. I also toss out any study that is less than 3 months in length. It takes 6 weeks alone for enzymes in the body to shift, and then you need at least 6 weeks' worth of live data -- i.e. 3 months minimum.
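The commenter's intuition about sample size can be checked with an exact binomial test. The sketch below is mine, not the commenter's, and the null response rates (1% spontaneous shrinkage for the melanoma case, 50% for a weak-effect case) are illustrative assumptions.

```python
from math import comb

def binom_pvalue(n, k, p0):
    """One-sided exact binomial p-value: probability of seeing k or more
    responders out of n if the true response rate were only p0."""
    return sum(comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(k, n + 1))

# Dramatic effect, near-zero null rate: 3 of 3 tumors shrink when the
# assumed spontaneous rate is 1%. p-value is 0.01**3 = 1e-6, far below 0.05.
dramatic = binom_pvalue(3, 3, 0.01)

# Weak effect: 13 of 20 respond against a 50% null rate (65% observed).
# p-value is about 0.13 -- not significant, so hundreds of patients
# would be needed to detect an effect this small.
weak = binom_pvalue(20, 13, 0.5)
```

This is the arithmetic behind "if the damn thing worked you wouldn't need a thousand patients": trial size scales inversely with effect size, so huge enrollments are a tell that the effect being chased is small.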
High Error Rates Found In Published Scientific Research - including this study?
Well no doubt this is the end result of the "publish or perish" mentality...
Unless, of course, like some professors, you publish studies that question the Bush administration - then it's publish AND perish!
Oh Homo sapiens - what will it take for you to finally break out of your pseudo-Type I civilization?
I've never really understood how public research is funded in any particular area. I don't know, but I'm guessing it has a lot to do with the political connections of the principal investigators. To me, it seems that the result is a messy patchwork of studies that have limited statistical power to discover the unknown. When I read medical research papers, I'm kind of shocked at the culture of the experimental design. There's a great emphasis on a few "canned" types of experimental design that worship at the altar of Gauss. This culture emphasizes formality of a particular kind of investigation at the expense of trying to discover the unexpected. It seems like there is little coordinated planning on how to record information in such a way as to make synthetic "meta-analysis" actually automated or convenient, much less useful and accurate. I worry that the culture of professors guiding underlings, who need to show original research in a certain way to complete academic requirements, is inhibiting genuine discovery.
I'm not aware of much philosophical work on the art of experimental design for discovering the unknown that has influenced medical research. Electrical engineering and certain parts of OR-centric industrial engineering, however, have done quite a bit of work on the problem in general. Consider the hidden Markov models or Kalman filtering extensions that electrical engineers have to learn in school. I remember in college when, fresh after learning about Kalman filters, Markov models, and Bayesian methods in EE classes, I attended a very interesting lecture from George Box where it seemed like all these extremely powerful ideas that EEs are trained in could be adapted to systematic empirical discovery of financial and medical models. The ideas seemed much more powerful than Black-Scholes or the simplified statistical methods that are taught in the life sciences.
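For readers who haven't met the Kalman filter, here is a minimal scalar version showing the idea the commenter is gesturing at: sequentially refining an estimate of a hidden quantity from noisy observations, weighting each new measurement by how uncertain the current estimate is. This is a textbook sketch for illustration only; the noise parameters are arbitrary and it is not the model from the lecture being described.

```python
def kalman_1d(measurements, q=1e-5, r=0.01):
    """Minimal scalar Kalman filter tracking a hidden, roughly constant
    state from noisy measurements.  q = process noise variance,
    r = measurement noise variance (both assumed, for illustration)."""
    x, p = 0.0, 1.0              # state estimate and its variance
    estimates = []
    for z in measurements:
        p += q                   # predict: uncertainty grows by process noise
        k = p / (p + r)          # Kalman gain: trust in the new measurement
        x += k * (z - x)         # update: pull estimate toward the measurement
        p *= (1.0 - k)           # uncertainty shrinks after the update
        estimates.append(x)
    return estimates
```

Even this toy version shows the appeal for empirical discovery: the filter carries an explicit uncertainty alongside its estimate, so it knows how much each new data point should change its mind -- exactly the bookkeeping that informal study-by-study inference lacks.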
It would be interesting if one could develop a well-accepted method of making complex system models in this type of framework that mimic the complicated models we already know about in biological systems. We could then genericize the models and test various kinds of investigation and experimental design to see how rapidly they converge on the correct hidden model. From the knowledge built up from that, places like the NIH or the HHMI could make more intelligent funding decisions and set better standards for keeping and recording information.
Another thing about studies is that the published results create too much prose centered on discussion of the "significant" finding, with the underlying valuable data archived in some arcane format that is inaccessible or incompatible with other works. Better standards for how the (unpublished) records are kept, organized, and made accessible could lead to much better derivative data-mining efforts. Guidance from experimental design research could suggest which "schemas" would be best for derivative research, rather than leaving it a fool's errand for some SQL expert who knows little about the data he's organizing.