It has become more common to publicly speculate, upon noticing a paper with unusual analyses, that a reported finding was obtained via p-hacking. This post discusses how authors can persuasively respond to such speculations.
Examples of public speculation of p-hacking
Example 1. A Slate.com post by Andrew Gelman suspected p-hacking in a paper that collected data on 10 colors of clothing, but analyzed red & pink as a single color [.html] (see authors' response to the accusation .html)
Example 2. An anonymous referee suspected p-hacking and recommended rejecting a paper, after noticing participants with low values of the dependent variable were dropped [.html]
Example 3. A statistics blog suspected p-hacking after noticing a paper studying number of hurricane deaths relied on the somewhat unusual Negative-Binomial Regression [.html]
First, the wrong response
The most common & tempting response to concerns like these is also the wrong response: justifying what one did. Explaining, for instance, why it makes sense to collapse red with pink or to run a negative-binomial.
It is the wrong response because when we p-hack, we self-servingly choose among justifiable analyses. P-hacked findings are by definition justifiable. Unjustifiable research practices involve incompetence or fraud, not p-hacking.
Showing an analysis is justifiable does not inform the question of whether it was p-hacked.
Right Response #1. “We decided in advance”
P-hacking involves post-hoc selection of analyses to get p<.05. One way to address p-hacking concerns is to indicate analysis decisions were made ex-ante.
A good way to do this is to just say so: “We decided to collapse red & pink before running any analyses.” A better way is with a more general and verifiable statement: “In all papers we collapse red & pink.” An even better way is: “We preregistered that we would collapse red & pink in this study” (see related Colada: "Preregistration: Not Just for the Empiro-Zealots").
Right Response #2. “We didn’t decide in advance, but the results are robust”
Often we don’t decide in advance. We don’t think of outliers till we see them. What to do then? Show the results don’t hinge on how the problem is dealt with. Report the results when dropping observations beyond 2SD, 2.5SD, and 3SD, when logging the dependent variable, when comparing medians, and when running a non-parametric test. If the conclusion is the same in most of these, tell the blogger to shut up.
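As a rough illustration, a robustness check along these lines can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a recommended pipeline: the data are simulated and purely hypothetical, the helper names (drop_beyond_sd, permutation_p, robustness_table) are made up for this example, exclusions are applied within each condition (one choice among several), and a permutation test stands in for "a non-parametric test".

```python
import math
import random
import statistics

def drop_beyond_sd(values, k):
    """Drop observations more than k sample SDs from the mean of `values`."""
    m, s = statistics.mean(values), statistics.stdev(values)
    return [v for v in values if abs(v - m) <= k * s]

def permutation_p(a, b, stat=statistics.mean, n_perm=1000, seed=0):
    """Two-sided permutation p-value for the difference stat(a) - stat(b)."""
    rng = random.Random(seed)
    observed = abs(stat(a) - stat(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(stat(pooled[:len(a)]) - stat(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)

def robustness_table(treatment, control):
    """Return {specification name: p-value} across several outlier treatments."""
    specs = {"no exclusions": (treatment, control, statistics.mean)}
    for k in (2, 2.5, 3):
        # Assumption: exclusions are applied separately within each condition.
        specs[f"drop >{k}SD"] = (drop_beyond_sd(treatment, k),
                                 drop_beyond_sd(control, k),
                                 statistics.mean)
    # Logging requires a strictly positive dependent variable.
    specs["log DV"] = ([math.log(v) for v in treatment],
                       [math.log(v) for v in control],
                       statistics.mean)
    specs["medians"] = (treatment, control, statistics.median)
    return {name: permutation_p(a, b, stat) for name, (a, b, stat) in specs.items()}

# Hypothetical skewed data (e.g., response times) with a true difference.
rng = random.Random(42)
treatment = [rng.lognormvariate(2.3, 0.3) for _ in range(60)]
control = [rng.lognormvariate(2.0, 0.3) for _ in range(60)]

table = robustness_table(treatment, control)
for name, p in table.items():
    print(f"{name:>14}: p = {p:.3f}")
```

The point of reporting the whole table, rather than one row, is that readers can see for themselves whether the conclusion depends on the exclusion rule.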
Right Response #3. “We didn’t decide in advance, and the results are not robust. So we run a direct replication.”
Sometimes the result will only be there if you drop >2SD and it will not have occurred to you to do so till you saw the p=.24 without it. One possibility is that you are chasing noise. Another possibility is that you are right. The one way to tell these two apart is with a new study. Run everything the same, exclude again based on >2SD.
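To see why the new study matters, here is a small, hypothetical simulation of "chasing noise". It assumes there is no true effect, two conditions of 50 observations each, and a large-sample z-approximation for the p-value; all names (approx_p, simulate) and settings are illustrative assumptions rather than anything taken from the papers above. It compares picking whichever exclusion rule gives the lowest p-value with committing to the >2SD rule, which is what the replication does.

```python
import random
import statistics

def drop_beyond_sd(values, k):
    """Drop observations more than k sample SDs from the mean of `values`."""
    m, s = statistics.mean(values), statistics.stdev(values)
    return [v for v in values if abs(v - m) <= k * s]

def approx_p(a, b):
    """Two-sided p-value from a z-approximation (assumes largish samples)."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return 2 * (1 - statistics.NormalDist().cdf(abs(z)))

def simulate(n_studies=1000, n=50, seed=1):
    """Share of null studies reaching p<.05 under a flexible vs. a fixed rule."""
    rng = random.Random(seed)
    rules = (None, 2, 2.5, 3)  # None means no exclusions
    flexible = fixed = 0
    for _ in range(n_studies):
        # Both conditions come from the same distribution: no true effect.
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        p_values = []
        for k in rules:
            if k is None:
                p_values.append(approx_p(a, b))
            else:
                p_values.append(approx_p(drop_beyond_sd(a, k), drop_beyond_sd(b, k)))
        flexible += min(p_values) < 0.05   # rule chosen after seeing the data
        fixed += p_values[1] < 0.05        # rule fixed at >2SD, as in a replication
    return flexible / n_studies, fixed / n_studies

flexible_rate, fixed_rate = simulate()
print(f"false-positive rate, best of four rules: {flexible_rate:.3f}")
print(f"false-positive rate, >2SD fixed ex-ante: {fixed_rate:.3f}")
```

By construction the flexible rate can never be lower than the fixed one, since the fixed rule is one of the four options being chosen among. The gap between the two is the noise that a direct replication with the rule locked in does not get to exploit.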
If in your “replication” you now need a gender interaction for the >2SD exclusion to give you p<.05, it is not too late to read “False-Positive Psychology” (.html)
If a blogger raises concerns of p-hacking, and you cannot provide any of the three responses above: buy the blogger a drink. She is probably right.