In this post I use data from the Many-Labs replication project to contrast the (pointless) inferences one arrives at using the Excessive Significance Test with the (critically important) inferences one arrives at with p-curve. The Many-Labs project is a collaboration of 36 labs around the world, each running a replication of 13 published effects in…
[23] Ceiling Effects and Replications
A recent failure to replicate led to an attention-grabbing debate in psychology. As you may expect from university professors, some of it involved data. As you may not expect from university professors, much of it involved saying mean things that would get a child sent to the principal's office (.pdf). The hostility in the debate has obscured an interesting…
[22] You know what's on our shopping list
As part of an ongoing project with Minah Jung, a nearly perfect doctoral student, we asked people to estimate the percentage of people who bought some common items on their last trip to the supermarket. For each of 18 items, we simply asked people (N = 397) to report whether they had bought it on…
[21] Fake-Data Colada: Excessive Linearity
Recently, a psychology paper (.html) was flagged as possibly fraudulent based on statistical analyses (.pdf). The author defended his paper (.html), but the university committee investigating misconduct concluded it had occurred (.pdf). In this post we present new and more intuitive versions of the analyses that flagged the paper as possibly fraudulent. We then rule…
[20] We cannot afford to study effect size in the lab
Methods people often say – in textbooks, task forces, papers, editorials, over coffee, in their sleep – that we should focus more on estimating effect sizes rather than testing for significance. I am kind of a methods person, and I am kind of going to say the opposite. Only kind of the opposite because it…
[19] Fake Data: Mendel vs. Stapel
Diederik Stapel, Dirk Smeesters, and Lawrence Sanna published psychology papers with fake data. They each faked in their own idiosyncratic way; nevertheless, their data share something in common. Real data are noisy. Theirs aren't. Gregor Mendel's data also lack noise (yes, famous peas-experimenter Mendel). Moreover, in a mathematical sense, his data are just as…
[18] MTurk vs. The Lab: Either Way We Need Big Samples
Back in May 2012, we were interested in the question of how many participants a typical between-subjects psychology study needs to have an 80% chance to detect a true effect. To answer this, you need to know the effect size for a typical study, which you can’t know from examining the published literature because it…
[17] No-way Interactions
This post shares a shocking and counterintuitive fact about studies looking at interactions where effects are predicted to get smaller (attenuated interactions). I needed a working example and went with Fritz Strack et al.’s (1988, .html) famous paper [933 Google cites], in which participants rated cartoons as funnier if they saw them while holding a…
[15] Citing Prospect Theory
Kahneman and Tversky's (1979) Prospect Theory (.html), with its 9,206 citations, is the most cited article in Econometrica, the prestigious journal in which it appeared. In fact, it is more cited than any article published in any economics journal. [1] Let's break it down by year. To be clear, this figure shows that just in 2013, Prospect Theory got about…
[13] Posterior-Hacking
Many believe that while p-hacking invalidates p-values, it does not invalidate Bayesian inference. Many are wrong. This blog post presents two examples from my new “Posterior-Hacking” (SSRN) paper showing that selective reporting invalidates Bayesian inference as much as it invalidates p-values. Example 1. Chronological Rejuvenation experiment. In “False-Positive Psychology” (SSRN), Joe, Leif and I ran experiments to demonstrate how easy…