[55] The file-drawer problem is unfixable, and that’s OK

The “file-drawer problem” consists of researchers not publishing their p>.05 studies (Rosenthal 1979 .pdf).
P-hacking consists of researchers not reporting their p>.05 analyses for a given study.

P-hacking is easy to stop. File-drawering is nearly impossible to stop.
Fortunately, while p-hacking is a real problem, file-drawering is not.

Consequences of p-hacking vs file-drawering
With p-hacking it’s easy to get a p<.05 [1]. Run 1 study, p-hack a bit, and it will eventually “work,” whether the effect is real or not. In “False-Positive Psychology” we showed that a bit of p-hacking gets you p<.05 with more than 60% chance (SSRN).
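For a rough feel of why a little flexibility goes a long way, here is a back-of-the-envelope sketch. It assumes the analyses tried are statistically independent, a simplification made here purely for illustration; it is not the simulation from “False-Positive Psychology,” and real p-hacked analyses are correlated, so the numbers are only indicative.

```python
def chance_of_false_positive(k: int, alpha: float = 0.05) -> float:
    """Chance that at least one of k independent tests of a null effect gives p < alpha."""
    return 1.0 - (1.0 - alpha) ** k


# Assumption: each extra analysis is an independent draw (real ones overlap).
for k in (1, 5, 10, 18):
    print(f"{k:>2} analyses tried -> {chance_of_false_positive(k):.0%} chance of p<.05")
```

Under that independence assumption, it takes 18 attempts for the chance to pass 60%.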

With file-drawering, in contrast, when there is no real effect, only 1 in 20 studies work. It’s hard to be a successful researcher with such a low success rate [2]. It’s also hard to fool oneself into believing the effect of interest is real when 19 in 20 studies fail. There are only so many hidden moderators we can talk ourselves into. Moreover, papers typically have multiple studies. A four-study paper would require file-drawering 76 failed studies on average (19 failures for each of the 4 successes). Nuts.
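The 76 is simple expected-value arithmetic: if each study of a null effect “works” with probability alpha, the expected number of failures before collecting a given number of successes is successes × (1 − alpha) / alpha. A minimal sketch (the function name is made up for illustration):

```python
from fractions import Fraction


def expected_file_drawered(successes_needed: int, alpha: Fraction) -> Fraction:
    """Expected number of failed studies run before getting the needed successes,
    when each study independently 'works' with probability alpha (no real effect)."""
    return successes_needed * (1 - alpha) / alpha


# Four-study paper, 1-in-20 success rate
print(expected_file_drawered(4, Fraction(1, 20)))
# Same paper under the 1-in-40 rate from footnote 2
print(expected_file_drawered(4, Fraction(1, 40)))
```

With the 1-in-40 rate from footnote 2, the expected pile of failed studies roughly doubles.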

File-drawering entire studies is not really a problem, which is good news, because the solution for the file-drawer is not really a solution [3].

Study registries: The non-solution to the file-drawer problem
Like genitals & generals, study registries & pre-registrations sound similar but mean different things.

A study registry is a public repository where authors report all studies they run. A pre-registration is a document authors create before running one study, to indicate how that given study will be run. Pre-registration intends to solve p-hacking. Study registries intend to solve the file-drawer problem.

Study registries sound great, until you consider what needs to happen for them to make a difference.

How the study registry is supposed to work
You are reading a paper and get to Study 1. It shows X. You put the paper down, visit the registry, search for the set of all other studies examining X or things similar to X (so maybe search by author, then by keyword, then by dependent variable, then by topic, then by manipulation), then decide which subset of the studies you found are actually relevant for the Study 1 in front of you (e.g., actually studying X, with a similarly clean design, competent enough execution, comparable manipulation and dependent variable, etc.). Then you tabulate the results of those studies found in the registry, and use the meta-analytical statistical tool of your choice to combine those results with the one from the study still sitting in front of you. Now you may proceed to read Study 2.
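For concreteness, that last combine-the-results step might look like the sketch below. It uses fixed-effect inverse-variance weighting, which is just one common choice and not a tool this post prescribes. The effect sizes and standard errors are made-up numbers for illustration only.

```python
import math


def fixed_effect_meta(effects, std_errors):
    """Inverse-variance (fixed-effect) pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


# Hypothetical inputs: the published Study 1, then two registry studies
# judged relevant. None of these numbers come from real data.
effects = [0.50, 0.05, -0.10]     # standardized effect sizes
std_errors = [0.20, 0.20, 0.20]   # their standard errors

pooled, pooled_se = fixed_effect_meta(effects, std_errors)
print(f"pooled effect = {pooled:.2f}, SE = {pooled_se:.3f}, z = {pooled / pooled_se:.2f}")
```

Every input to this calculation depends on the judgment calls described above: which studies count as relevant, and whether their results are in the registry at all.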

Sorry, I probably made it sound much easier than it actually is. In real life, researchers don’t comply with registries the way they are supposed to. The studies found in the registry almost surely will lack the info you need to ‘correct’ the paper you are reading. A year after being completed, about 90% of studies registered in ClinicalTrials.gov do not have the results uploaded to the database (NEJM, 2015 .pdf). Even for the subset of trials where posting results is ‘mandatory,’ it does not happen (BMJ, 2012 .pdf), and when results are uploaded, they are often incomplete and inconsistent with the results in the published paper (Ann Int Medicine 2014 .pdf). This sounds bad, but in social science it will be way worse: in medicine the registry is legally required, whereas for us it’s voluntary. Our registries would only include the subset of studies some social scientists choose to register (the rest remain in the file-drawer…).

Study registries in social science fall short of fixing an inconsequential problem, the file-drawer, and they are burdensome both to comply with and to use.

Pre-registration: the solution to p-hacking
Fixing p-hacking is easy: authors disclose how sample size was set & all measures, conditions, and exclusions (“False Positive Psychology” SSRN). No ambiguity, no p-hacking.

For experiments, the best way to disclose is with pre-registrations. A pre-registration consists of writing down what one wants to do before one does it. In addition to the disclosure items above, one specifies the hypothesis of interest and the focal statistical analysis. The pre-registration is then appended to studies that get written up (and file-drawered with those that don’t). Its role is to demarcate planned from unplanned analyses. One can still explore, but now readers know one was exploring.

Pre-registration is an almost perfect fix for p-hacking, and it can be extremely easy to comply with and use.

In AsPredicted it takes 5 minutes to create a pre-registration and half a minute to read one (see sample .pdf). If you pre-register and never publish the study, you can keep your AsPredicted private forever (it’s about p-hacking, not the file-drawer). Over 1000 people created AsPredicteds in 2016.

Summary
– The file-drawer is not really a problem, and study registries don’t come close to fixing it.
– P-hacking is a real problem. Easy-to-create and easy-to-evaluate pre-registrations all but eliminate it.

Uri’s note: this post was made public by mistake when uploading the 1st draft. I did not receive feedback from people I was planning to contact and made several edits after posting. Sorry.


Footnotes.

  1. With p-hacking it is also easy to get a Bayes Factor >3; see “Posterior Hacking” http://DataColada.org/13.
  2. It’s actually 1 in 40, since we usually make directional predictions and rely on two-sided tests.
  3. P-curve is a statistical remedy to the file-drawer problem, and it does work (.pdf).