[29] Help! Someone Thinks I p-hacked
It has become more common to publicly speculate, upon noticing a paper with unusual analyses, that a reported finding was obtained via p-hacking. This post discusses how authors can persuasively respond to such speculations. Examples of public speculation of p-hacking: Example 1. A Slate.com post by Andrew Gelman suspected p-hacking in a paper that collected…
[28] Confidence Intervals Don't Change How We Think about Data
Some journals are thinking of discouraging authors from reporting p-values and encouraging or even requiring them to report confidence intervals instead. Would our inferences be better, or even just different, if we reported confidence intervals instead of p-values? One possibility is that researchers become less obsessed with the arbitrary significant/not-significant dichotomy. We start paying more…
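To make the contrast concrete, here is a minimal Python sketch (my own made-up data, nothing from the post) that computes both summaries for the same one-sample t-test; the 95% interval excludes zero exactly when p < .05, so the two reports carry the same yes/no information.

```python
# Hypothetical example: the same data summarized as a p-value and as a 95% CI.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=0.3, scale=1.0, size=40)  # made-up sample with true mean 0.3

# p-value for H0: mean = 0
t, p = stats.ttest_1samp(x, popmean=0.0)

# 95% confidence interval for the mean, built from the same t distribution
m, se = x.mean(), stats.sem(x)
lo, hi = stats.t.interval(0.95, df=len(x) - 1, loc=m, scale=se)

print(f"p = {p:.3f}; 95% CI = [{lo:.2f}, {hi:.2f}]")
# The interval excludes 0 exactly when p < .05: same information, different framing.
```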
[27] Thirty-somethings are Shrinking and Other U-Shaped Challenges
A recent Psych Science (.pdf) paper found that sports teams can perform worse when they have too much talent. For example, in Study 3 they found that NBA teams with a higher percentage of talented players win more games, but that teams with the highest levels of talented players win fewer games. The hypothesis is easy enough…
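The excerpt does not say how the reversal was tested; the textbook approach is a quadratic regression, sketched below on simulated data (not the paper's NBA data or its actual analysis), where a negative coefficient on the squared term and an interior turning point are read as evidence of an inverted U.

```python
# Hypothetical illustration of the standard quadratic test for an inverted U.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
talent = rng.uniform(0, 1, 200)                                    # share of top-talent players
wins = 40 + 30 * talent - 25 * talent**2 + rng.normal(0, 3, 200)   # simulated inverted U

X = sm.add_constant(np.column_stack([talent, talent**2]))
fit = sm.OLS(wins, X).fit()
b0, b1, b2 = fit.params

print(fit.summary())
print("fitted curve peaks at talent =", -b1 / (2 * b2))  # interior turning point
```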
[26] What If Games Were Shorter?
The smaller your sample, the less likely your evidence is to reveal the truth. You might already know this, but most people don’t (.html), or at least they don’t appropriately apply it (.html). (See, for example, nearly every inference ever made by anyone). My experience trying to teach this concept suggests that it’s best understood…
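As a quick illustration of that point (my own simulation, not one from the post), the sketch below runs many two-group studies with a real underlying difference and counts how often each sample size detects it.

```python
# Hypothetical simulation: smaller samples are less likely to detect a true effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect = 0.5          # true group difference, in standard-deviation units
n_sims = 2000

for n in (10, 50, 200):    # per-group sample sizes
    hits = 0
    for _ in range(n_sims):
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(true_effect, 1.0, n)
        if stats.ttest_ind(a, b).pvalue < .05:
            hits += 1
    print(f"n per group = {n:3d}: detected the effect in {hits / n_sims:.0%} of simulations")
```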
[25] Maybe people actually enjoy being alone with their thoughts
Recently Science published a paper concluding that people do not like sitting quietly by themselves (.html). The article received press coverage; that press coverage received blog coverage, which received Twitter coverage, which received meaningful head-nodding coverage around my department. The bulk of that coverage (e.g., 1, 2, and 3) focused on the tenth study in…
[24] P-curve vs. Excessive Significance Test
In this post I use data from the Many-Labs replication project to contrast the (pointless) inferences one arrives at using the Excessive Significance Test with the (critically important) inferences one arrives at with p-curve. The Many-Labs project is a collaboration of 36 labs around the world, each running a replication of 13 published effects in…
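For readers new to p-curve, here is a hedged sketch of its core logic (my own simulation, not the post's Many-Labs analysis): among statistically significant results, p-values pile up near zero when the studied effect is real, but are uniform, with only about 20% falling below .01, when it is not.

```python
# Hypothetical sketch of the logic behind p-curve: the distribution of
# *significant* p-values differs depending on whether an effect is real.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def significant_pvalues(effect, n=20, sims=5000):
    """Run many two-group studies and keep only the p-values below .05."""
    ps = []
    for _ in range(sims):
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(effect, 1.0, n)
        p = stats.ttest_ind(a, b).pvalue
        if p < .05:
            ps.append(p)
    return np.array(ps)

for label, effect in [("no true effect", 0.0), ("true effect d = 0.6", 0.6)]:
    ps = significant_pvalues(effect)
    print(f"{label}: {np.mean(ps < .01):.0%} of significant p-values are below .01")
# Under the null about 20% fall below .01 (uniform on [0, .05]);
# with a true effect, far more do (right skew).
```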
[23] Ceiling Effects and Replications
A recent failure to replicate led to an attention-grabbing debate in psychology. As you may expect from university professors, some of it involved data. As you may not expect from university professors, much of it involved saying mean things that would get a child sent to the principal's office (.pdf). The hostility in the debate has obscured an interesting…
[22] You know what's on our shopping list
As part of an ongoing project with Minah Jung, a nearly perfect doctoral student, we asked people to estimate the percentage of people who bought some common items in their last trip to the supermarket. For each of 18 items, we simply asked people (N = 397) to report whether they had bought it on…
[21] Fake-Data Colada: Excessive Linearity
Recently, a psychology paper (.html) was flagged as possibly fraudulent based on statistical analyses (.pdf). The author defended his paper (.html), but the university committee investigating the matter concluded that misconduct had occurred (.pdf). In this post we present new and more intuitive versions of the analyses that flagged the paper as possibly fraudulent. We then rule…
[20] We cannot afford to study effect size in the lab
Methods people often say – in textbooks, task forces, papers, editorials, over coffee, in their sleep – that we should focus more on estimating effect sizes rather than testing for significance. I am kind of a methods person, and I am kind of going to say the opposite. Only kind of the opposite because it…
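One way to see what is at stake (my illustration, not necessarily the post's argument) is to look at how wide the 95% confidence interval around an estimate of Cohen's d remains at typical lab sample sizes, using the common large-sample approximation for its standard error.

```python
# Hypothetical sketch: width of the 95% CI around Cohen's d at various sample sizes.
# SE(d) is approximated by sqrt(2/n + d^2 / (4n)) for two equal groups of size n.
import math

d = 0.5  # an assumed medium-sized effect
for n in (20, 50, 100, 500, 3000):           # participants per group
    se = math.sqrt(2 / n + d**2 / (4 * n))   # approximate standard error of d
    half_width = 1.96 * se
    print(f"n per group = {n:4d}: d = {d:.2f} +/- {half_width:.2f}")
# With lab-sized samples the interval spans much of the plausible range of effects;
# it narrows meaningfully only at sample sizes far beyond a typical lab study.
```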