This is a primer on research standards.
It has three parts.
Intuitive (but insufficient) evaluation:
Standards of research quality:
Implementation of quality research:
“Evidence-based practice” is kind of like “the right outcome” of a trial.
Nobody's against it – but people have very different opinions of what it is!
But they're alike in another way: Not all opinions are created equal.
“Our program definitely works. Just look at Timmy! He went through it, and turned his whole life around!”
This is not, by itself, evidence of a program's effectiveness.
With enough participants and normal conditions, you're effectively guaranteed to have successes even in a bad program.
In other words:
Anecdotes aren't good evidence!
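The arithmetic behind that guarantee can be sketched in a few lines. The 5% baseline success rate and the participant counts below are illustrative assumptions, not figures from any real program. The point is only that the chance of at least one "Timmy" approaches certainty as enrollment grows, even when the program itself does nothing.

```python
def prob_at_least_one_success(baseline_rate: float, participants: int) -> float:
    """Chance that at least one participant succeeds with no program effect.

    Assumes each participant independently succeeds with probability
    `baseline_rate` (a hypothetical figure: people who would have turned
    their lives around anyway).
    """
    return 1.0 - (1.0 - baseline_rate) ** participants


if __name__ == "__main__":
    # Hypothetical enrollment sizes, chosen only for illustration.
    for n in (10, 50, 200):
        p = prob_at_least_one_success(0.05, n)
        print(f"{n:>3} participants: P(at least one success) = {p:.3f}")
```

A single success story is therefore what we would expect to see whether or not the program works, which is why it tells us so little.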
Imagine two programs doing the same thing.
We normally believe that the 80-unit program is better.
Numeric comparisons – X is bigger than Y – require statistical context.
Without it, they cannot serve as good evidence.
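One way to see what "statistical context" adds is a permutation test, sketched below. This is a minimal illustration and not a prescribed method: the function name, the scores, and the 70-unit comparison value are all hypothetical. Both scenarios have the same 10-unit gap in averages, but only one of them is hard to explain by chance.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def permutation_p_value(a, b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the share of random relabelings whose mean gap is at least
    as large as the gap actually observed.
    """
    rng = random.Random(seed)
    observed = abs(_mean(a) - _mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        gap = abs(_mean(pooled[:len(a)]) - _mean(pooled[len(a):]))
        if gap >= observed:
            extreme += 1
    return extreme / n_permutations


if __name__ == "__main__":
    # Hypothetical scores: both pairs average 80 versus 70.
    consistent = ([80, 81, 79, 80, 80], [70, 71, 69, 70, 70])
    noisy = ([140, 20, 95, 60, 85], [130, 10, 90, 45, 75])
    print("consistent scores:", permutation_p_value(*consistent))
    print("noisy scores:     ", permutation_p_value(*noisy))
```

When individual scores vary widely, a 10-unit gap appears routinely under random relabeling, so "80 is bigger than 70" cannot carry the argument alone.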
So what can?
MISS. CODE ANN. §27-103-159 gives some relevant definitions, including:
“Evidence-based program” shall mean a program or practice that has had multiple site random controlled trials across heterogeneous populations demonstrating that the program or practice is effective for the population.
An evidence-based program has had:
All of these ensure that some common-sense questions that we have are answered.
(We also want to know whether a program is cost-effective compared to the available options, but that's another story.)
The importance of effectiveness and generalizability should be pretty clear.
“Distinguishing from noise and error” is just a matter of providing the statistical context that simple numerical comparison does not.
But why randomized, controlled trials?
The short answer: RCTs are our best method of establishing that A causes B.
Imagine you’re a researcher for a shoe company; you’re testing a running shoe that is supposed to shave time off of your sprint.
So you set up a test: Runners in your shoes versus runners in some different shoe.
After statistical analysis, you find that the group wearing your shoe crossed the finish line significantly sooner than the other group.
But wait: You had your group running 100m, while the comparison group ran 200m!
This comparison wasn’t fair; even if the results are good, we can't say they were because of the shoe.
This is the essence of controlling for confounding variables: basic fairness in comparisons.
(Statistical) control = making sure everybody has the same starting line before comparing them.
There are several ways to control for confounding variables. For instance:
(Obviously, this last one is only for the sake of the example and would not be appropriate in a real setting.)
These methods of control can be very sophisticated. But there's a problem:
“… the golden rule of causal analysis: No causal claim can be established by a purely statistical method, be it propensity scores, regression, stratification, or any other distribution-based design.”
-Judea Pearl, “Causality,” p. 350
Well-conducted random assignment ensures that every possible confounding variable, measured or not, is distributed among conditions by chance alone, which is to say that no trait is systematically correlated with group membership.
Which means the groups, overall, start and finish on the same lines…
Which lets us assume that if they finish at different times, it's because of the program.
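A small simulation can make this concrete. It is a sketch under stated assumptions: the hidden trait ("motivation"), the sample size, and the self-selection rule are all invented for illustration. It compares how unevenly the hidden trait ends up split between groups when people choose to enroll versus when a coin flip decides.

```python
import random
from statistics import mean


def hidden_trait_gaps(n=10_000, seed=1):
    """Return (self-selected gap, randomized gap) in a hidden trait.

    Each gap is the average trait value in the program group minus the
    average in the comparison group.
    """
    rng = random.Random(seed)
    # Hypothetical hidden trait that nobody measured.
    motivation = [rng.gauss(0, 1) for _ in range(n)]

    # Self-selection: more motivated people are more likely to enroll.
    self_selected = [m + rng.gauss(0, 1) > 0 for m in motivation]
    # Random assignment: a coin flip that ignores every trait.
    randomized = [rng.random() < 0.5 for _ in range(n)]

    def gap(in_program):
        treated = [m for m, t in zip(motivation, in_program) if t]
        control = [m for m, t in zip(motivation, in_program) if not t]
        return mean(treated) - mean(control)

    return gap(self_selected), gap(randomized)


if __name__ == "__main__":
    selected_gap, random_gap = hidden_trait_gaps()
    print(f"self-selected groups differ in motivation by {selected_gap:.2f}")
    print(f"randomized groups differ in motivation by   {random_gap:.2f}")
```

Under self-selection the groups start on different lines before the program does anything. Under random assignment the gap is close to zero, and it shrinks further as the number of participants grows.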
Randomized controlled trials are:
As compared to nonrandomized evaluation.
The MS standard for evidence-based practice is the gold standard. Research quality drops off dramatically the more of these standards you lose.
Gold is rare. What if we don't have any and still need to act?
MISS. CODE ANN. §27-103-159 provides some loose definitions of less rigorous alternatives:
But these definitions are very loose!
We've adopted an existing scale to rate research below the MS standard of evidence:
The Maryland Scientific Methods scale.
Described by Farrington et al. (2002) in Evidence-based Crime Prevention.
It's a five-point ordinal scale – 1 is the worst, 5 is the best!
It rates our general ability to draw conclusions from the study.
It's not safe to make inferences from any trial below level 3.
So that's where we've drawn our line for “high-quality research”…
(Although you should always want the gold standard if possible – cf. the earlier slide on the percentage of preliminary studies overturned by rigorous RCTs.)
Everything said so far assumes that the research is well-conducted.
There is a crisis of reproducibility in science, especially social science!
Some have gone so far as to suggest that most published research is false.
This problem affects random and nonrandom studies alike.
Here's where pre-registration and reproducibility come in.
A simplified overview of the process:
An excellent conceptual overview of the process is here.
It's strongly recommended reading even if you skip the articles already mentioned!
The report that you pre-register should conform to one of two existing, internationally accepted standards: CONSORT or TREND.
You will write up a report that includes every item on your checklist except those under 'results' and 'discussion' at the initial submission phase.
This initial writeup must be completed before any aspect of research begins – including even assigning subjects to conditions.
Writeups completed after any phase of research begins cannot meet the standards of this section.
After the research is done, you will finish your report, including 'results' and 'discussion', and resubmit.
Note that adequate answers to many of the checklist elements will require fairly technical decisions. This checklist is not a substitute for skilled research staff – it just makes their jobs easier and their results more trustworthy!
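The two-phase submission rule can be sketched as a simple check. This is only an illustration: the section names are placeholders and not the official CONSORT or TREND item list, the function name is hypothetical, and treating a same-day submission as too late is an assumption.

```python
from datetime import date

# Placeholder top-level sections; the real checklists are far more detailed.
SECTIONS = ("title_and_abstract", "introduction", "methods",
            "results", "discussion")
DEFERRED_UNTIL_FINAL = {"results", "discussion"}


def check_submission(report, submitted_on, research_started_on, final=False):
    """Return a list of problems; an empty list means the submission passes."""
    problems = []
    # The initial submission omits results and discussion; the final one does not.
    required = [s for s in SECTIONS if final or s not in DEFERRED_UNTIL_FINAL]
    for section in required:
        if not report.get(section, "").strip():
            problems.append(f"missing section: {section}")
    # Assumption: a writeup dated on or after the start of research is too late.
    if not final and submitted_on >= research_started_on:
        problems.append("pre-registration must precede the start of research")
    return problems


if __name__ == "__main__":
    prereg = {"title_and_abstract": "...", "introduction": "...", "methods": "..."}
    print(check_submission(prereg, date(2024, 1, 10), date(2024, 2, 1)))
    print(check_submission(prereg, date(2024, 3, 1), date(2024, 2, 1)))
```

The first call passes and the second is rejected, because the writeup is dated after research began.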
When submitting your final research paper:
The goal: The reader should be able to take
With no further manipulation necessary!
Coalition for Evidence-Based Policy (2013). Randomized Controlled Trials Commissioned by the Institute of Education Sciences Since 2002: How Many Found Positive Versus Weak or No Effects. Retrieved from http://coalition4evidence.org/wp-content/uploads/2013/06/IES-Commissioned-RCTs-positive-vs-weak-or-null-findings-7-2013.pdf
Farrington, D. P., Gottfredson, D. C., Sherman, L. W., & Welsh, B. C. (2002). The Maryland Scientific Methods Scale. In Farrington, D. P., MacKenzie, D. L., Sherman, L. W., & Welsh, B. C. (Eds.), Evidence-Based Crime Prevention (pp. 13-21). London: Routledge.
Ioannidis, J. P. A. (2005). Contradicted and Initially Stronger Effects in Highly Cited Clinical Research. Journal of the American Medical Association, 294(2), 218-228.
Manzi, J. (2012). Uncontrolled: The Surprising Payoff of Trial-and-Error for Business, Politics, and Society. New York: Perseus Books Group.
Pearl, J. (2009). Causality (2nd ed.). Cambridge: Cambridge University Press.
Zia, M. I., Siu, L. L., Pond, G. R., & Chen, E. X. (2005). Comparison of Outcomes of Phase II Studies and Subsequent Randomized Control Studies Using Identical Chemotherapeutic Regimens. Journal of Clinical Oncology, 23(28), 6982-6991.