[53] What I Want Our Field To Prioritize

When I was a sophomore in college, I read a book by Carl Sagan called The Demon-Haunted World. By the time I finished it, I understood the difference between what is scientifically true and what is not. It was not obvious to me at the time: If a hypothesis is true, then you can use it to predict the future. If a hypothesis is false, then you can’t. Replicable findings are true precisely because you can predict that they will replicate. Non-replicable findings are not true precisely because you can’t. Truth is replicability. This lesson changed my life. I decided to try to become a scientist.

Although this lesson inspired me to pursue a career as a psychological scientist, for a long time I didn’t let it affect how I actually pursued that career. For example, during graduate school Leif Nelson and I investigated the hypothesis that people strive for outcomes that resemble their initials. Specifically, we set out to show that (not: test whether) people with an A or B initial get better grades than people with a C or D initial. After many attempts (we ran many analyses and we ran many studies), we found enough “evidence” for this hypothesis, and we published the findings in Psychological Science. At the time, we believed the findings and this felt like a success. Now we both recognize it as a failure.

The findings in that paper are not true. Yes, if you run the exact analyses we report on our same datasets, you will find significant effects. But they are not true because they would not replicate under specifiable conditions. History is about what happened. Science is about what happens next. And what happens next is that initials don’t affect your grades.

Inspired by discussions with Leif, I eventually (in 2010) reflected on what I was doing for a living, and I finally remembered that at some fundamental level a scientist’s #1 job is to differentiate what is true/replicable from what is not. This simple realization forever changed the way I conduct and evaluate research, and it is the driving force behind my desire for a more replicable science. If you accept this premise, then life as a scientist becomes much easier and more straightforward. A few things naturally follow.

First, it means that replicability is not merely a consideration, but the most important consideration. Of course I also care about whether findings are novel or interesting or important or generalizable, or whether the authors of an experiment are interpreting their findings correctly. But none of those considerations matter if the finding is not replicable. Imagine I claim that eating Funyuns® cures cancer. This hypothesis is novel and interesting and important, but those facts don’t matter if it is untrue. Concerns about replicability must trump all other concerns. If there is no replicability, there is no finding, and if there is no finding, there is no point assessing whether it is novel, interesting, or important. [1] Thus, when deciding whether to reject a paper, journal editors and reviewers should weigh attributes that are diagnostic of replicability (e.g., statistical power and p-values) more heavily than any other attribute. (Thank you, Simine Vazire, for taking steps in this direction at SPPS <.pdf>.) [2]

Second, it means that the best way to prevent others from questioning the integrity of your research is to publish findings that you know to be replicable under specifiable conditions. You should be able to predict that if you do exactly X, then you will get Y. Your method section should be a recipe for getting an effect, specifying exactly which ingredients are sufficient to produce it. Of course, the best way to know that your finding replicates is to replicate it yourself (and/or to tie your hands by pre-registering your exact key analysis). This is what I now do (particularly after I obtain a p > .01 result), and I sleep a lot better because of it.

Third, it means that if someone fails to replicate your past work, you have two options. You can either demonstrate that the finding does replicate under specifiable/pre-registered conditions or you can politely tip your cap to the replicators for discovering that one of your published findings is not likely to be true. If you believe that your finding is replicable but don’t have the resources to run the replication, then you can pursue a third option: Specify the exact conditions under which you predict that your effect will emerge. This allows others with more resources to test that prediction. If you can’t specify testable circumstances under which your effect will emerge, then you can’t use your finding to predict the future, and, thus, you can’t say that it is true.

Andrew Meyer and his colleagues recently published several highly powered failures to reliably replicate Leif’s and my finding (.pdf; see Study 13) that disfluent fonts change how people predict sporting events (.pdf; see Table A6). We stand by the central claims of our paper, as we have replicated the main findings many times. But Meyer et al. showed that we should not – and thus we do not – stand by the findings of Study 13. Their evidence that it doesn’t consistently replicate (20 games; 12,449 participants) is much better than our evidence that it does (2 games; 181 participants), and we can look back on our results and see that they are not convincing (most notably, p = .03). As a result, all we can do is acknowledge that the finding is unlikely to be true. Meyer et al.’s paper wasn’t happy news, of course, but accepting their results was so much less stressful than mounting a protracted, evidence-less defense of a finding that we are not confident would replicate. Having gone that route before, I can tell you that this one was about a million times less emotionally punishing, in addition to being more scientific. It is a comfort to know that I will no longer defend my own work in that way. I’ll either show you’re wrong, or I’ll acknowledge that you’re right.
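
To give a rough sense of why 12,449 participants outweigh 181, here is a minimal sketch of how much the standard error of a simple mean shrinks as the sample grows. It assumes independent observations with equal variance, which is a simplification: in both papers participants were nested within games, so this is only a back-of-the-envelope illustration and not the analysis either paper ran. The function name `precision_ratio` is hypothetical.

```python
import math


def precision_ratio(n_large: int, n_small: int) -> float:
    """Factor by which the standard error of a mean shrinks when the sample
    grows from n_small to n_large.

    Assumes independent observations with equal variance, so that the
    standard error is proportional to 1 / sqrt(n).
    """
    return math.sqrt(n_large / n_small)


# Participant counts taken from the text: 12,449 (Meyer et al.) vs. 181 (Study 13).
ratio = precision_ratio(12_449, 181)
print(f"Standard error shrinks by a factor of about {ratio:.1f}")
```

Under those assumptions, the larger sample pins down an average effect roughly eight times more tightly than the smaller one.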

Fourth, it means advocating for policies and actions that enhance the replicability of our science. I believe that the #1 job of the peer review process is to assess whether a finding is replicable, and that we can all do this better if we know exactly what the authors did in their study, and if we have access to their materials and data. I also believe that every scientist has a conflict of interest – we almost always want the evidence to come out one way rather than another – and that those conflicts of interest lead even the best of us to analyze our data in a way that makes us more likely to draw our preferred conclusions. I still catch myself p-hacking analyses that I did not pre-register. Thus, I am in favor of policies and actions that make it harder/impossible for us to do that, including incentives for pre-registration, the move toward including exact replications in published papers, and the use of methods for checking that our statistical analyses are accurate and that our results are unlikely to have been p-hacked (e.g., because the study was highly powered).
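
To make the p-hacking concern concrete, here is a minimal sketch of how the chance of at least one false-positive grows with the number of analyses a researcher is free to try. It assumes every analysis tests a true null hypothesis and that the analyses are statistically independent. Real p-hacked analyses usually reuse the same data and are correlated, so the exact numbers would differ, but the direction of the problem is the same. The function name `chance_of_false_positive` is hypothetical.

```python
def chance_of_false_positive(num_analyses: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_analyses` independent tests of a
    true null hypothesis is significant at level `alpha`.

    Each test stays non-significant with probability (1 - alpha), so all of
    them stay non-significant with probability (1 - alpha) ** num_analyses.
    """
    return 1.0 - (1.0 - alpha) ** num_analyses


for k in (1, 5, 10, 20):
    chance = chance_of_false_positive(k)
    print(f"{k:>2} analyses -> {chance:.0%} chance of at least one false-positive")
```

With one pre-registered analysis the chance stays at the nominal 5%; with 20 independent attempts it rises to roughly 64%. Pre-registration works by fixing the number of key analyses at one.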

I am writing all of this because it’s hard to resolve a conflict when you don’t know what the other side wants. I honestly don’t know what those who are resistant to change want, but at least now they know what I want. I want to be in a field that prioritizes replicability over everything else. Maybe those who are resistant to change believe this too, and their resistance is about the means (e.g., public criticism) rather than the ends. Or maybe they don’t believe this, and think that concerns about replicability should take a back seat to something else. It would be helpful for those who are resistant to change to articulate their position. What do you want our field to prioritize, and why?

  1. I sometimes come across the argument that a focus on replicability will increase false-negatives. I don’t think that is true. If a field falsely believes that Funyuns will cure cancer, then the time and money that may have been spent discovering true cures will instead be spent studying the Funyun Hypothesis. True things aren’t discovered when resources are allocated to studying false things. In this way, false-positives cause false-negatives.
  2. At this point I should mention that although I am an Associate Editor at SPPS, what I write here does not reflect journal policy.