P-hacking, the selective reporting of statistically significant analyses, continues to threaten the integrity of our discipline. P-hacking is inevitable whenever (1) a researcher hopes to find evidence for a particular result, (2) there is ambiguity about how exactly to analyze the data, and (3) the researcher does not perfectly plan out his/her analysis in advance. Although some mistakenly believe that accusations of p-hacking are tantamount to accusations of cheating, the truth is that accusations of p-hacking are nothing more than accusations of imperfect planning.
The best way to address the problem of imperfect planning is to plan more perfectly: to preregister your studies. Preregistrations are time-stamped documents in which researchers specify exactly how they plan to collect their data and to conduct their key confirmatory analyses. The goal of a preregistration is to make it easy to distinguish between planned, confirmatory analyses – those for which statistical significance is meaningful – and unplanned exploratory analyses – those for which statistical significance is not meaningful. Because a good preregistration prevents researchers from p-hacking, it also protects them from suspicions of p-hacking.
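As a rough illustration of why significance loses its meaning for unplanned analyses, here is a minimal simulation sketch. It is not taken from any particular study. It assumes a two-group experiment with no true effect, a researcher who can choose among several independent dependent variables, and a report of "significance" whenever any one of them reaches p < .05. The function names, the sample size of 20 per group, and the independence of the analyses are all assumptions made for illustration.

```python
import random
from statistics import NormalDist

ALPHA = 0.05  # conventional significance threshold


def two_group_p_value(n_per_group, rng):
    """Two-sided z-test p-value for two groups drawn from the same N(0, 1).

    Both groups come from the same distribution, so the null is true.
    """
    a = [rng.gauss(0, 1) for _ in range(n_per_group)]
    b = [rng.gauss(0, 1) for _ in range(n_per_group)]
    diff = sum(a) / n_per_group - sum(b) / n_per_group
    z = diff / (2 / n_per_group) ** 0.5  # known variance of 1 per observation
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_analyses, n_sims=2000, n_per_group=20, seed=1):
    """Share of null studies with at least one p < ALPHA among n_analyses tries."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        p_values = [two_group_p_value(n_per_group, rng) for _ in range(n_analyses)]
        if min(p_values) < ALPHA:
            hits += 1
    return hits / n_sims


def analytic_rate(n_analyses, alpha=ALPHA):
    """Chance of at least one false positive across independent analyses."""
    return 1 - (1 - alpha) ** n_analyses


if __name__ == "__main__":
    for k in (1, 2, 5, 10):
        print(k, round(false_positive_rate(k), 3), round(analytic_rate(k), 3))
```

With a single planned analysis, the simulated rate should sit near the nominal 5%. With five independent analyses to choose from, the formula gives about 23%. Real alternative analyses of the same data are usually correlated, so the inflation in practice tends to be smaller than this independent case suggests, but it remains above 5%.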
In the past five years or so, preregistration has gone from being something that no psychologists did to something that many psychologists are doing. In our view, this wonderful development is the biggest reason to be optimistic about the future of our discipline.
But if preregistration is going to be the solution, then we need to ensure that it is done right. After casually reviewing several recent preregistration attempts in published papers, we noticed that there is room for improvement. We saw two kinds of problems.
Problem 1. Not enough information
For example, we saw one "preregistration" that was simply a time-stamped abstract of the project; it contained almost no details about how data were going to be collected and analyzed. Others failed to specify one or more critical aspects of the analysis: sample size, rules for exclusions, or how the dependent variable would be scored (in a case for which there were many ways to score it). These preregistrations are time-stamped, but they lack the other critical ingredient: precise planning.
To decide which information to include in your preregistration, it may be helpful to imagine a skeptical reader of your paper. Let's call him Leif. Imagine that Leif is worried that p-hacking might creep into the analyses of even the best-intentioned researchers. The job of your preregistration is to set Leif's mind at ease. This means identifying all of the ways you could have p-hacked – choosing a different sample size, or a different exclusion rule, or a different dependent variable, or a different set of controls/covariates, or a different set of conditions to compare, or a different data transformation – and including all of the information that lets Leif know that these decisions were set in stone in advance. In other words, your job is to prevent Leif from worrying that you tried to run your critical analysis in more than one way.
This means that your preregistration needs to be sufficiently exhaustive and sufficiently specific. If you say, "We will exclude participants who are distracted," Leif could think, "Right, but distracted how? Did you define 'distracted' in advance?" It is better to say, "We will exclude participants who incorrectly answered at least 2 out of our 3 comprehension checks." If you say, "We will measure happiness," Leif could think, "Right, but aren't there a number of ways to measure it? I wonder if this was the only one they tried or if it was just the one they most wanted to report after the data came in?" So it's better to say, "Our dependent variable is happiness, which we will measure by asking people 'How happy do you feel right now?' on a scale ranging from 1 (not at all happy) to 7 (extremely happy)."
If including something in a preregistration would make Leif less likely to wonder whether you p-hacked, then include it.
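One informal way to check whether a rule is specific enough is to ask whether it could be written as code without making any further decisions. The sketch below does this for the example exclusion rule and happiness measure quoted above. The `Participant` class and its field names are hypothetical, invented only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

FAILED_CHECKS_TO_EXCLUDE = 2     # "at least 2 out of our 3 comprehension checks"
SCALE_MIN, SCALE_MAX = 1, 7      # 1 = not at all happy, 7 = extremely happy


@dataclass
class Participant:
    # Hypothetical record layout: one boolean per comprehension check
    # (True = answered correctly) and the single happiness rating.
    checks_correct: Tuple[bool, bool, bool]
    happiness: int


def is_excluded(p: Participant) -> bool:
    """Apply the preregistered exclusion rule; nothing is left to judgment."""
    failed = sum(1 for ok in p.checks_correct if not ok)
    return failed >= FAILED_CHECKS_TO_EXCLUDE


def analysis_sample(participants: List[Participant]) -> List[Participant]:
    """Keep only participants who pass the rule and gave a valid 1-7 rating."""
    kept = []
    for p in participants:
        if not SCALE_MIN <= p.happiness <= SCALE_MAX:
            raise ValueError("happiness rating outside the 1-7 scale")
        if not is_excluded(p):
            kept.append(p)
    return kept
```

"We will exclude participants who are distracted" cannot be written this way without first deciding what "distracted" means, which is exactly the decision a skeptical reader would want to know was made in advance.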
Problem 2. Too much information
A preregistration cannot allow readers and reviewers to distinguish between confirmatory and exploratory analyses if it is not easy to read or understand. Thus, a preregistration needs to be easy to read and understand. This means that it should contain only the information that is essential for the task at hand. We have seen many preregistrations that are just too long, containing large sections on theoretical background and on exploratory analyses, or lots of procedural details that, on the one hand, will definitely be part of the paper and, on the other, are not p-hackable. Don't forget that you will also publish the paper, not just the preregistration; you don't need to say in the preregistration everything that you will say in the paper. A hard-to-read preregistration makes preregistration less effective.
To decide which information to exclude from your preregistration, you can again imagine that a skeptical Leif is reading your paper, but this time you can ask, "If I leave this out, will Leif be more concerned that my results are attributable to p-hacking?"
For example, if you leave out the literature review from your preregistration, will Leif now be more concerned? Of course not, as your literature review does not affect how much flexibility you have in your key analysis. If you leave out how long people spent in the lab, how many different RAs you are using, why you think your hypothesis is interesting, or the description of your exploratory analyses, will Leif be more concerned? No, because none of those things affect the fact that your analyses are confirmatory.
If excluding something from a preregistration would not make Leif more likely to wonder whether you p-hacked, then you should exclude it.
Thus, a good preregistration needs to have two features:
- It needs to specify exactly how the key confirmatory analyses will be conducted.
- It needs to be short and easy to read.
We designed AsPredicted.org with these goals in mind. The website poses a standardized set of questions asking you to include only what needs to be included, thus also making it obvious what does not need to be. The OSF offers lots of flexibility, but it also offers an AsPredicted template here: https://osf.io/fnsb/
Still, even on AsPredicted, it is possible to get it wrong, by, for example, not being specific enough in your answers to the questions it poses. This table provides an example of how to wrongly and properly answer these questions.
We would like to thank Stephen Lindsay and Simine Vazire for taking time out of their incredibly busy schedules to give us invaluable feedback on a previous version of this post.
- This is because conducting unplanned analyses necessarily inflates the probability that you will find a statistically significant relationship even if no relationship exists.
- For good explanations of the virtues of preregistration, see Lindsay et al. (2016), Moore (2016), and van't Veer & Giner-Sorolla (2016).
- Contrary to popular belief, the job of your preregistration is NOT to show that your predictions were confirmed. Indeed, the critical aspect of preregistration is not the prediction that you register – many good preregistrations pose questions (e.g., "We are testing whether eating Funyuns cures cancer") rather than hypotheses (e.g., "We hypothesize that eating Funyuns cures cancer") – but the analysis that you specify. In hindsight, perhaps our preregistration website should have been called AsPlanned rather than AsPredicted, although AsPredicted sounds better.
- Even complex studies should have a simple and clear preregistration, one that allows a reader to casually differentiate between confirmation and exploration. Additional complexities could potentially be captured in other secondary planning documents, but because these are far less likely to be read, they shouldn't obscure the core basics of the simple preregistration.
- We recently updated the AsPredicted questions, and so this OSF template contains slightly different questions than the ones currently on AsPredicted. We advise readers who wish to use the OSF to answer the questions that are currently on https://AsPredicted.org.