Monday, March 21, 2016

The awful truth: I cannot reproduce my experiments!!! (Ir)reproducible science

One of the most popular statements in science journals goes like this: “This experiment was repeated three times with similar results”. But I ask you, have you ever really thought about it while reading it?

For most types of experiment, there is an unstated requirement that the work be reproducible, at least once, in an independent experiment, with a strong preference for reproducibility in at least three experiments. The assumption that experimental findings are reproducible is a key criterion for acceptance of a manuscript, and journals generally insist that scientists include sufficient technical information to allow the experiments to be repeated. But what is reproducibility, and is it really that important?

Reproducibility vs replicability
Although many biological scientists believe that the reproducibility of an experiment means that it can be replicated, these are actually two different concepts. Reproducibility requires changes, whereas replicability avoids them. In other words, reproducibility refers to a phenomenon that can be predicted to recur even when experimental conditions vary to some degree. Replicability, on the other hand, describes the ability to obtain an identical result when an experiment is performed under precisely identical conditions.

Scientists are generally interested in the reproducibility of results rather than their precise replication. Why? Well, some variation in conditions is considered desirable, because obtaining the same result without exactly the same experimental conditions implies a certain robustness of the original finding. When findings are so dependent on precise experimental conditions that replicability is needed for reproducibility, the result may be idiosyncratic and less important than a phenomenon that can be reproduced by a variety of independent, non-identical approaches.


The ‘cosmic vacuum cleaner’ case

So we talk about reproducibility, but what happens with one-time events? How does reproducibility fit into those scientific observations? One-time events exist and are thus irreproducible, but that does not make them less important; in fact, they can still be a tremendously important source of scientific information. This is particularly true for the observational sciences, in which inferences are made from events and processes not under an observer's control. Take, for example, the collision of comet Shoemaker-Levy 9 with Jupiter in July 1994. Jupiter’s strong gravitational influence causes many small comets and asteroids to collide with the planet, with an impact rate between 2,000 and 8,000 times higher than on Earth, which has earned it the name of ‘cosmic vacuum cleaner’ (cool name, huh?). Anyway, that event provided a huge amount of information on Jupiter’s atmospheric dynamics and evidence of the threat posed by meteorite and comet impacts. Consequently, the criterion of reproducibility is not an essential requirement for the value of scientific information, at least in some fields.

But in general, this is how published science ‘sees’ reproducibility:

1. Published science is EXPECTED to be reproducible. Yet most scientists are not interested in replicating published experiments.
2. Published science is ASSUMED to be reproducible. The assumption that science must be reproducible is implicit… yet seldom tested. In fact, the emphasis on reproducing experimental results becomes important only when work becomes controversial or is called into doubt. Hence, the solidity of this bedrock assumption of experimental science lies largely in the realm of belief and trust in the integrity of the authors.




Why don’t we care about reproducibility?

Before trying to answer this question, I would like to share a thought. It seems very common in modern science for scientists to ‘program’ their brains to expect a result; it is as if they no longer search for an answer to a given phenomenon but instead expect one, and only one, result. And so, yes… they can obtain similar results in three independent experiments, but not mention that completely different results were observed in dozens of other experiments. It is easy to choose what to show, right? So real reproducibility should be judged as a statistically significant result not across those three good-looking experiments, but across the dozens as a whole (a rough sketch of this idea follows below).
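
To put numbers on that thought, here is a minimal sketch, assuming twelve hypothetical repeats of the same experiment and using SciPy's Fisher method for combining independent p-values. The three cherry-picked results look convincing; the full series much less so:

```python
# A toy illustration (hypothetical p-values): judging reproducibility from
# three hand-picked experiments versus pooling evidence across all of them.
from scipy.stats import combine_pvalues

# Imagine 12 independent repeats of the same experiment; only three "worked".
all_pvalues = [0.01, 0.03, 0.04,          # the three good-looking results
               0.35, 0.48, 0.52, 0.61,    # ...and the nine that are never shown
               0.67, 0.72, 0.80, 0.88, 0.93]

cherry_picked = all_pvalues[:3]

# Fisher's method combines independent p-values into one overall test.
stat3, p3 = combine_pvalues(cherry_picked, method='fisher')
stat_all, p_all = combine_pvalues(all_pvalues, method='fisher')

print(f"Combined p over the 3 selected experiments: {p3:.4f}")    # looks significant
print(f"Combined p over all 12 experiments:         {p_all:.4f}")  # far less convincing
```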

That said, the question is, do we really care about reproducibility? Scientists move ahead in their careers by publishing papers in top journals. But when they conduct experiments that don’t seem to show anything — when a technique fails or a hypothesis is not confirmed — they often publish nothing at all, even though the failure may be deeply informative. That thinking corrupts the way people do science itself. For their part, the journals that referee top research will rarely publish papers on experiments that didn’t work (besides the great Journal of Negative Results).
Some researchers even believe that as competition for limited funding and slots in leading journals has intensified, the pressure has increased to draw sweeping conclusions from evidence that may not fully support them.


Last year, I heard about what was for me the most shocking consequence of irreproducible findings: Dong-Pyou Han altered blood samples to make it appear he had achieved a breakthrough toward a potential vaccine against HIV. He was sentenced to four and a half years in prison and ordered to repay US$7.2 million to the federal government agency that funded the research. But though academic fraud exists, most irreproducible findings simply result from the trial and error inherent in the scientific process. Things just happen.


Who cares about reproducibility?
The irreproducibility problem is being recognized at the highest levels; the US White House’s Office of Science and Technology Policy mentioned it last summer in a request for public comments on innovation strategy. “Given recent evidence of the irreproducibility of a surprising number of published scientific findings, how can the federal government leverage its role as a significant funder of scientific research to most effectively address the problem?” the document said.
Some proposals focus on the journals that publish research. Dozens of journals have signed a pledge with the US National Institutes of Health (NIH) to adopt measures that make it easier for scientists to repeat experiments, and thus to determine which findings are reproducible and which are not.
The NIH itself is also taking steps to improve reproducibility. In an announcement made last June, the agency described four new criteria that grant reviewers will be asked to consider. They include the strength of the scientific premise that the proposed study builds on; the rigor of the study’s design; the proposal’s consideration of the sex of research animals or human subjects; and whether reagents have been authenticated (http://grants.nih.gov/grants/guide/notice-files/NOT-OD-15-103.html).


How much money does irreproducibility cost?

$28 billion is spent in the United States each year on preclinical research that can’t be reproduced by other researchers. That’s the conclusion of a provocative analysis published in 2015, in part by economists, who based it on past studies of error rates in biomedical research.

To come up with that number, biologist Leonard Freedman, president of the nonprofit Global Biological Standards Institute (GBSI) in Washington, D.C., and economists Iain Cockburn and Timothy Simcoe of Boston University combed the literature for two dozen or so studies that tried to quantify how many biomedical papers are flawed because of specific problems, such as a contaminated cell line. Looking across these data, they estimated that 53% of all preclinical studies have errors that render them irreproducible. The most common causes included problems with reagents and reference materials (36%), study design (28%), data analysis and reporting (25%), and laboratory protocols (11%).

The 53% figure is roughly comparable to a handful of “top-down” studies that tried to reproduce sets of findings. For example, a widely cited analysis by Amgen found that only 11% of 53 preclinical cancer papers could be reproduced in the company’s labs (which I also referred to in my previous post on how much we can trust antibodies). Other studies found a rate closer to 50%. From there, calculating the economic impact was simple. Rounding the 53% down to a “conservative” irreproducibility estimate of 50%, the researchers multiplied it by the $56 billion a year that the NIH and other U.S. public and private funders spend on preclinical research. That yields $28 billion in irreproducible preclinical research (not counting the cost in non-preclinical research, so imagine how high that number could get).

When repeatability and reproducibility rely on instruments (don’t be afraid of statistics!)

I recently wrote a post about the importance of validating antibodies, but the truth is that not only reagents should be validated: every method should be as well, to make sure it is suitable for its intended purpose. One of the prerequisites for this is to verify that the instruments used generate reliable and consistent data. One of the main parameters that needs to be checked is precision, so never take this for granted when using or acquiring new instruments for your lab. Here is some useful information:

Experimental error is defined as the difference between an experimental value and the actual value of a quantity. This difference indicates the accuracy of the measurement.


Accuracy is a measure of the degree of closeness of a measured or calculated value to its actual value. Precision is a measure of the reproducibility of a set of measurements. The classic dartboard diagram explains this well: darts clustered tightly off-center are precise but not accurate, while darts scattered around the bullseye are accurate on average but not precise.
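
The same contrast shows up in a quick simulation; this is a minimal sketch with made-up numbers, not data from any real instrument. Instrument A is given a constant calibration bias with little scatter (precise but inaccurate), instrument B no bias but large scatter (accurate but imprecise):

```python
# Simulated measurements of a known quantity, illustrating accuracy vs precision.
import random
import statistics

random.seed(42)
true_value = 100.0

# Instrument A: precise but inaccurate (tiny scatter, constant +5 bias).
instrument_a = [true_value + 5.0 + random.gauss(0, 0.2) for _ in range(20)]
# Instrument B: accurate but imprecise (no bias, large scatter).
instrument_b = [true_value + random.gauss(0, 5.0) for _ in range(20)]

for name, data in [("A (precise, inaccurate)", instrument_a),
                   ("B (accurate, imprecise)", instrument_b)]:
    mean_error = statistics.mean(data) - true_value  # closeness to the true value
    sd = statistics.stdev(data)                      # reproducibility of the set
    print(f"{name}: mean error = {mean_error:+.2f}, SD = {sd:.2f}")
```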




Statistics always scared me, because there are so many names and so many formulas, right? But I’ll explain what you need to know to understand precision in a very simple way. Measurement errors can be divided into two components: random error and systematic error. Random error is related to the precision of the instrument.


Random errors are inherent to the instrument and cannot be eliminated without changing the instrument. Systematic error, by contrast, is largely human error: error related to imperfect experimental technique, such as biased experimental readings or imperfect instrument calibration. Of course, systematic errors can be reduced as the analyst’s lab technique improves.

Two statistical parameters used to analyze the precision of an instrument are the standard deviation (SD) and the relative standard deviation (RSD), so whenever you get validation data for a given instrument, pay attention to these parameters. The precision of a set of measurements can be determined by calculating the SD for a set of data. SD is a measure of dispersion, or how spread out your data are; specifically, it shows how much your data are spread out around the mean, or average. For example, are all your values close to the average? Or are lots of values way above (or way below) it? The RSD, in turn, tells you whether the “regular” SD is small or large compared to the mean of the data set. For example, an RSD of 2.3% means the SD is 2.3% of the mean, which is pretty small; in other words, the data are tightly clustered around the mean. If your RSD is large, say 55%, your data are much more spread out. The RSD is sometimes used for convenience, but it also gives you an idea of how precise your data are: the more precise the data, the smaller the RSD.
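
If you want to check these numbers yourself, the calculation takes only a few lines; here is a minimal sketch with hypothetical replicate measurements:

```python
# Computing SD and RSD for a set of replicate measurements.
import statistics

measurements = [49.8, 50.1, 50.3, 49.9, 50.2, 50.0]  # hypothetical replicates

mean = statistics.mean(measurements)
sd = statistics.stdev(measurements)  # sample standard deviation
rsd = 100.0 * sd / mean              # RSD expressed as a percentage of the mean

print(f"mean = {mean:.2f}, SD = {sd:.3f}, RSD = {rsd:.2f}%")
# A small RSD (here well under 1%) means the data cluster tightly around the mean.
```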

Below is an example of these parameters for a given instrument (the Prometheus NT.48, for analyzing protein stability and aggregation).



Table 1 gives an overview of the precision of temperature control (Tm) and fluorescence detection (F350/F330 ratio) for the 48 sample positions across 10 instruments. The values obtained show the outstanding repeatability and intermediate precision of this instrument.
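
As a rough illustration of the difference between those two terms (with made-up Tm values, not the actual Prometheus data): repeatability is the scatter among replicates measured within one instrument, while intermediate precision also folds in the variation across instruments. A minimal sketch:

```python
# Repeatability (within-instrument scatter) vs intermediate precision
# (scatter across instruments), using made-up melting temperatures (Tm, in C).
import statistics

# Hypothetical Tm replicates measured on three instruments.
tm_by_instrument = {
    "instrument_1": [65.02, 65.05, 64.98, 65.01],
    "instrument_2": [65.10, 65.12, 65.08, 65.11],
    "instrument_3": [64.95, 64.97, 64.93, 64.96],
}

# Repeatability: pooled within-instrument SD (equal group sizes assumed).
pooled_variance = statistics.mean(
    [statistics.variance(v) for v in tm_by_instrument.values()])
repeatability_sd = pooled_variance ** 0.5

# Intermediate precision: SD over all measurements from all instruments.
all_tm = [x for v in tm_by_instrument.values() for x in v]
intermediate_sd = statistics.stdev(all_tm)

print(f"repeatability SD       ~ {repeatability_sd:.3f} C")
print(f"intermediate precision ~ {intermediate_sd:.3f} C")
```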


Limits of replicability and reproducibility

Although the ability of an investigator to confirm an experimental result is essential to good science, there are practical limits to the replicability and reproducibility of findings.
Replicability is likely to be inversely proportional to the number of variables in an experiment. Every variable carries a certain degree of error, and since error propagates linearly or nonlinearly depending on the system, the more variables involved, the more error can be expected, thus reducing the replicability of the experiment. Scientists may attempt to control variables in order to achieve greater reproducibility, but they must remember that as they do so, they may progressively depart from the heterogeneity of real life. Moreover, statistical analysis would not be required if biological experiments were precisely replicable, right? (A toy simulation of this effect follows below.)
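
Here is that toy simulation, a minimal sketch assuming each variable contributes independent Gaussian noise to an outcome modeled as a simple sum (the function and numbers are invented for illustration). For independent errors, the spread of the outcome grows as the square root of the number of variables:

```python
# Toy Monte Carlo: the spread of an experiment's outcome grows with the
# number of independently noisy variables that feed into it.
import random
import statistics

random.seed(0)

def simulate_outcome(n_variables, sd_per_variable=1.0):
    # Model the outcome as a sum of n variables, each with independent error.
    return sum(random.gauss(10.0, sd_per_variable) for _ in range(n_variables))

for n in (1, 4, 16):
    outcomes = [simulate_outcome(n) for _ in range(10_000)]
    # For independent errors, the SD of a sum scales as sqrt(n) * sd_per_variable.
    print(f"{n:2d} variables: outcome SD = {statistics.stdev(outcomes):.2f} "
          f"(theory: {n ** 0.5:.2f})")
```
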
Although errors may be minimized by good experimental technique, they cannot be eliminated entirely. Other sources of variation in an experiment are more difficult to control. For example, mouse groups may differ, despite being matched for genetics, supplier, sex, and age, in such intangible areas as nutrition, stress, and circadian rhythm. The outcomes of complex processes such as infection and the host response often do not show simple dose-response relationships. Inherent stochasticity in biological processes and anatomic or functional bottlenecks provide additional sources of experiment-to-experiment variability.



“When papers are written and data are presented in public, it looks like everything is just perfect, and that is not what science is. Science is an imperfect human activity that we try to do as best we can.” Bjorn Olsen, Harvard cell biologist.