This is really just a post-script to Colada [2], where I described a final exam question I gave in my MBA marketing research class. Students got a year’s worth of iTunes listening data for one person (me) and were asked: “What songs would this person put on his end-of-year Top 40?” I compared that list…
[31] Women are taller than men: Misusing Occam’s Razor to lobotomize discussions of alternative explanations
Most scientific studies document a pattern for which the authors provide an explanation. The job of readers and reviewers is to examine whether that pattern is better explained by alternative explanations. When alternative explanations are offered, it is common for authors to acknowledge that although, yes, each study has potential confounds, no single alternative explanation…
[30] Trim-and-Fill is Full of It (bias)
Statistically significant findings are much more likely to be published than non-significant ones (no citation necessary). Because overestimated effects are more likely to be statistically significant than are underestimated effects, this means that most published effects are overestimates. Effects are smaller – often much smaller – than the published record suggests. For meta-analysts the gold…
[29] Help! Someone Thinks I p-hacked
It has become more common to publicly speculate, upon noticing a paper with unusual analyses, that a reported finding was obtained via p-hacking. This post discusses how authors can persuasively respond to such speculations. Examples of public speculation of p-hacking Example 1. A Slate.com post by Andrew Gelman suspected p-hacking in a paper that collected…
[28] Confidence Intervals Don't Change How We Think about Data
Some journals are thinking of discouraging authors from reporting p-values and encouraging or even requiring them to report confidence intervals instead. Would our inferences be better, or even just different, if we reported confidence intervals instead of p-values? One possibility is that researchers become less obsessed with the arbitrary significant/not-significant dichotomy. We start paying more…
[27] Thirty-somethings are Shrinking and Other U-Shaped Challenges
A recent Psych Science (.pdf) paper found that sports teams can perform worse when they have too much talent. For example, in Study 3 they found that NBA teams with a higher percentage of talented players win more games, but that teams with the highest levels of talented players win fewer games. The hypothesis is easy enough…
[26] What If Games Were Shorter?
The smaller your sample, the less likely your evidence is to reveal the truth. You might already know this, but most people don’t (.html), or at least they don’t appropriately apply it (.html). (See, for example, nearly every inference ever made by anyone.) My experience trying to teach this concept suggests that it’s best understood…
[25] Maybe people actually enjoy being alone with their thoughts
Recently Science published a paper concluding that people do not like sitting quietly by themselves (.html). The article received press coverage, that press coverage received blog coverage, which received Twitter coverage, which received meaningful head-nodding coverage around my department. The bulk of that coverage (e.g., 1, 2, and 3) focused on the tenth study in…
[24] P-curve vs. Excessive Significance Test
In this post I use data from the Many-Labs replication project to contrast the (pointless) inferences one arrives at using the Excessive Significance Test with the (critically important) inferences one arrives at with p-curve. The Many-Labs project is a collaboration of 36 labs around the world, each running a replication of 13 published effects in…
[23] Ceiling Effects and Replications
A recent failure to replicate led to an attention-grabbing debate in psychology. As you may expect from university professors, some of it involved data. As you may not expect from university professors, much of it involved saying mean things that would get a child sent to the principal's office (.pdf). The hostility in the debate has obscured an interesting…