One of the most common
statements in science journals goes like this: “This experiment was repeated three times with similar results.”
But have you ever really stopped to think about it while reading?
For most types of experiment,
there is an unstated requirement that the work be reproducible, at least once,
in an independent experiment, with a strong preference for reproducibility in
at least three experiments. The assumption that experimental findings are
reproducible is a key criterion for acceptance of a manuscript, and it is
generally insisted that scientists include sufficient technical
information to allow the experiments to be repeated. But what is reproducibility, and is it really that
important?
Reproducibility vs replicability
Although many biological scientists use the two terms interchangeably,
reproducibility and replicability are actually different things.
Reproducibility requires changes, whereas
replicability avoids them. In other words, reproducibility refers to a
phenomenon that can be predicted to recur even when experimental conditions
vary to some degree. Replicability, on the other hand, describes the ability to obtain
an identical result when an experiment is performed under precisely identical
conditions.
Scientists are generally
interested in the reproducibility of results rather than the precise
replication of experimental results. Why? Well, some variation of conditions is
considered desirable because obtaining the same result without exactly the same
experimental conditions implies a certain robustness of the original finding.
When findings are so dependent on precise experimental conditions that
replicability is needed for reproducibility, the result may be idiosyncratic
and less important than a phenomenon that can be reproduced by a variety of
independent, non-identical approaches.
The ‘cosmic vacuum cleaner’ case
So we talk about
reproducibility, but what happens with one-time events? How does reproducibility fit
into those scientific observations? One-time events exist and are thus irreproducible,
but that does not make them less important; in fact, they can still be a tremendously
important source of scientific information. This is particularly true for
observational sciences, in which inferences are made from events and processes
not under an observer's control. Take, for example, the collision of comet Shoemaker-Levy 9
with Jupiter in July 1994. Jupiter’s strong gravitational influence leads to
many small comets and asteroids colliding with the planet, at a rate of
impact between 2,000 and 8,000 times higher than on Earth, which has earned it the
name of ‘cosmic vacuum cleaner’ (cool name, huh?). Anyway, that event provided a
huge amount of information on Jupiter’s atmospheric dynamics and evidence for
the threat of meteorite and comet impacts. Consequently, the criterion of
reproducibility is not an essential requirement for the value of scientific
information, at least in some fields.
But in general, this is how
published science ‘sees’ reproducibility:
1. Published science is EXPECTED to be reproducible. But most scientists are not interested in replicating
published experiments.
2. Published science is ASSUMED to be reproducible. The assumption that science must be reproducible
is implicit… yet seldom tested. In
fact, the emphasis on reproducing experimental results becomes important only
when work becomes controversial or is called into doubt. Hence, the solidity of this bedrock
assumption of experimental science lies largely in the realm of belief and
trust in the integrity of the authors.
Why don’t we care about reproducibility?
Before trying to answer
this question, I would like to share a thought. It seems to me that it is very
common in modern science for scientists to ‘program’ their brains to expect a
result; it’s as if they no longer search for an answer to a given
phenomenon, but instead expect one, and only one, result. And so,
yes… they can obtain similar results in three independent experiments, but fail to
mention that completely different results were observed in dozens of other
experiments. It is easy to choose what to show, right? So real reproducibility
should be taken as a statistically significant result not in those three good-looking
experiments, but in the dozens as a whole.
That said, the question is, do we really care about
reproducibility? Scientists move ahead in their careers by publishing papers in
top journals. But when they conduct experiments that don’t seem to show
anything — when a technique fails or a hypothesis is not confirmed — they often
publish nothing at all, even though the failure may be deeply informative. That
thinking corrupts the way people do science itself. For their part, the
journals that referee top research will rarely publish papers on experiments
that didn’t work (besides the great Journal of Negative Results).
Some researchers even believe that as competition for
limited funding and slots in leading journals has intensified, the pressures
have increased to draw sweeping conclusions from evidence that may not fully
support them.
Last year, I heard about what was, for me, the most shocking
consequence of irreproducible findings: Dong-Pyou Han altered blood
samples to make it appear he had achieved a breakthrough toward a potential
vaccine against HIV. He was sentenced to four and a half years in prison and to
pay US$7.2 million to a federal government agency that funded the research. But, though academic fraud exists, most irreproducible
findings simply result from the trial and error inherent in the scientific process.
Things just happen.
Who cares about reproducibility?
The
irreproducibility problem is being recognized at the highest levels; the US White
House’s Office of Science and Technology Policy mentioned it last summer in a
request for public comments on innovation strategy. “Given recent evidence of
the irreproducibility of a surprising number of published scientific findings,
how can the federal government leverage its role as a significant funder of
scientific research to most effectively address the problem?” the document
said.
Some
proposals focus on the journals that publish research. Dozens of journals
signed a pledge with the US National Institutes of Health (NIH) to adopt
measures that will make it easier for scientists to repeat experiments, and thus
to determine which findings are reproducible and which are
not.
The NIH itself is also taking
steps to improve reproducibility. In an
announcement made last June,
the agency described four new
criteria that grant reviewers will be asked to evaluate. They include the strength of the
scientific premise that the proposed study builds on; the rigor of the study’s
design; the proposal’s consideration of the sex of research animals or human
subjects; and whether reagents have been authenticated (http://grants.nih.gov/grants/guide/notice-files/NOT-OD-15-103.html).
How much money does irreproducibility
cost?
$28 billion is spent in the
United States each year on preclinical research that can’t be reproduced by
other researchers. That’s the conclusion of a provocative analysis published in
2015, co-authored by economists, who based it on past studies of error rates in
biomedical research.
To come up with that
number, biologist Leonard Freedman, president of the nonprofit Global
Biological Standards Institute (GBSI) in Washington, D.C. and economists
Iain Cockburn and Timothy Simcoe of Boston University combed the literature for
two dozen or so studies that tried to quantify how many biomedical papers are
flawed because of specific problems such as a contaminated cell line. Looking
across these data, they estimate that 53% of all preclinical studies have
errors that mean they are not reproducible. The most common reasons included
problems with reagents and reference materials (36%), study design (28%), data
analysis and reporting (25%), and laboratory protocols (11%).
The 53% is roughly comparable to a handful of “top-down”
studies that tried to reproduce a set of findings. For example, a widely cited
analysis by Amgen found that only 11% of 53 preclinical cancer papers could be
reproduced in the company’s labs (which I also referred to in my previous
post on the trustworthiness of antibodies). Other studies found a rate
closer to 50%. From there, calculating the economic impact was simple. Rounding
the 53% to a “conservative” irreproducibility estimate of 50%, the researchers
multiplied by the $56 billion a year that NIH and other U.S. public and private
funders spend on preclinical research. That yields $28 billion in irreproducible
preclinical research (not counting how much money it costs in non-preclinical
research, so imagine how high that number can get).
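For clarity, the back-of-the-envelope arithmetic behind that figure can be written out as a tiny, purely illustrative Python snippet (the numbers are simply the ones quoted above):

```python
# Back-of-the-envelope estimate behind the $28 billion figure quoted above.
irreproducibility_rate = 0.50      # 53% rounded to a "conservative" 50%
us_preclinical_spend = 56e9        # ~$56 billion per year in US preclinical funding

wasted = irreproducibility_rate * us_preclinical_spend
print(f"Estimated irreproducible preclinical spend: ${wasted / 1e9:.0f} billion per year")
# -> $28 billion per year
```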
When repeatability and reproducibility rely on
instruments (don’t be afraid of statistics!)
I recently wrote a post about the importance of validating
antibodies, but the truth is that not only reagents should be validated: every method
should be validated as well, to make sure it is suitable for its intended purpose. One
of the prerequisites for this is to verify that the instruments used generate
reliable and consistent data. One of the main parameters that needs to be
checked is precision, so never take it for granted when using or acquiring
new instruments for your lab. Here is some useful information on this:
Experimental error is defined as the difference between an
experimental value and the actual value of a quantity. This difference
indicates the accuracy of the measurement.
Accuracy is a measure of how close a measured or
calculated value is to the actual value. Precision, on the other hand,
is a measure of the reproducibility of a set of measurements. The
diagram below illustrates this nicely!
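To make the distinction concrete, here is a small, purely illustrative Python sketch (the true value and the measurement sets are made up): one data set is accurate but imprecise, the other precise but inaccurate.

```python
import statistics

true_value = 10.0  # the actual value of the quantity (hypothetical)

# Hypothetical measurement sets:
accurate_but_imprecise = [9.2, 10.9, 9.5, 10.6, 9.8]    # centred on 10, widely spread
precise_but_inaccurate = [11.1, 11.2, 11.0, 11.1, 11.2]  # tightly clustered, but off-target

for name, data in [("accurate but imprecise", accurate_but_imprecise),
                   ("precise but inaccurate", precise_but_inaccurate)]:
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    print(f"{name}: mean = {mean:.2f} "
          f"(error vs true value = {mean - true_value:+.2f}), SD = {sd:.2f}")

# The error of the mean reflects accuracy; the SD reflects precision.
```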
Statistics always scared me a little, because there are so
many names and so many formulas, right? But I’ll explain what you need to know to
understand precision in a very simple way. Measurement errors can be divided
into two components: random error and systematic error. Random error is related to the precision of the instrument.
These are inherent errors that depend on the instrument
and cannot be eliminated without changing the instrument. Systematic error is a consistent
bias, typically introduced by the experimenter: errors related to
imperfect experimental technique, such as errors in reading measurements or imperfect
instrument calibration. Of course, systematic errors can
be reduced as the analyst’s lab technique improves.
Two statistical parameters used to analyze the precision of an
instrument are the standard deviation (SD)
and the relative standard deviation (RSD),
so whenever you get validation data for a given instrument, pay attention to these
parameters. The precision of a set of measurements can be determined by
calculating the SD for a set of data. The SD is a
measure of dispersion, or how much your data are spread out. Specifically, it
shows you how much your data are spread out around the mean, or average.
For example, are all your values close to the average? Or are lots of values
way above (or way below) it?
On the other hand, the RSD tells you whether the
“regular” SD is a small or large quantity compared with the mean of the data set. For example, if a given RSD is 2.3%, it
means that the SD is 2.3% of the mean, which is pretty small. In other words, the
data are tightly clustered around the mean. On the other hand, if your RSD is
large, say 55%, this indicates your data are more spread out. The RSD is
sometimes used for convenience, but it also gives you an idea of how
precise your data are in an experiment: the more precise your data, the smaller
the RSD.
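As a quick sketch of how these two parameters are calculated (the replicate readings below are made up for illustration):

```python
import statistics

# Hypothetical replicate measurements (e.g. melting temperatures in °C)
measurements = [64.9, 65.1, 65.0, 65.2, 64.8]

mean = statistics.mean(measurements)
sd = statistics.stdev(measurements)   # sample standard deviation
rsd = 100 * sd / mean                 # relative standard deviation, in %

print(f"mean = {mean:.2f}, SD = {sd:.3f}, RSD = {rsd:.2f}%")
# A small RSD (well below a few percent here) means the readings are
# tightly clustered around the mean, i.e. the measurement is precise.
```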
Below I show an example of these parameters for a given
instrument (the Prometheus NT.48, used for analyzing protein stability and aggregation).
Table 1 gives an overview of the precision regarding
temperature control (Tm) and fluorescence detection (F350/F330 ratio) for the
48 sample positions across 10 instruments. The obtained values show the
outstanding repeatability and intermediate precision of this instrument.
Limits of replicability and reproducibility
Although the ability
of an investigator to confirm an experimental result is essential to good
science, there are practical limits to the replicability and reproducibility of
findings.
Replicability is likely to be inversely proportional to the number
of variables in an experiment. Every variable contains a certain degree of
error. Since error propagates linearly or nonlinearly depending on the system,
one may conclude that the more variables involved, the more errors can be
expected, thus reducing the replicability of an experiment. Scientists may
attempt to control variables in order to achieve greater reproducibility but
must remember that as they do so, they may progressively depart from the
heterogeneity of real life. Moreover, statistical analysis would not be
required if biological experiments were precisely replicable, right?
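To sketch that intuition, assume each variable contributes the same independent relative error and that, to first order, these errors combine in quadrature (a common rule of thumb for multiplicative quantities, and an assumption of this illustration): the combined error then grows with the square root of the number of variables.

```python
import math

# Assume a derived quantity computed from n independent measured variables,
# each carrying the same relative error. For independent multiplicative
# variables, relative errors combine (to first order) in quadrature:
# total ≈ sqrt(n) * per-variable relative error.
per_variable_error = 0.05  # 5% relative error per variable (hypothetical)

for n_variables in (1, 3, 5, 10):
    combined = math.sqrt(n_variables) * per_variable_error
    print(f"{n_variables:2d} variables -> combined relative error ≈ {combined:.1%}")
# More variables means a larger combined error, and hence lower replicability.
```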
Although errors may be minimized by good
experimental technique, they cannot be eliminated entirely. There are other
sources of variation in the experiment that are more difficult to control. For
example, mouse groups may differ, despite being matched by genetics, supplier,
gender, and age, in such intangible areas as nutrition, stress, circadian
rhythm, etc. The outcomes of complex processes such as infection and the host
response do not often manifest simple dose-response relationships. Inherent
stochasticity in biological processes and anatomic or functional bottlenecks
provide additional sources of experiment-to-experiment variability.
“When papers are written and data are presented in public, it looks like everything is just perfect, and that is not what science is; science is an imperfect human activity that we try to do as best we can.” Bjorn Olsen, Harvard cell biologist.