A recent Science paper (.html) used a total sample size of N=40 to arrive at the conclusion that implicit racial and gender stereotypes can be reduced while napping. N=40 is a small sample for a between-subject experiment. One needs N=92 to reliably detect that men are heavier than women (SSRN). The study, however, was within-subject; for instance, its dependent…
Author: Uri Simonsohn
[36] How to Study Discrimination (or Anything) With Names; If You Must
Consider these paraphrased famous findings: “Because his name resembles ‘dentist,’ Dennis became one” (JPSP, .pdf); “Because the applicant was black (named Jamal instead of Greg) he was not interviewed” (AER, .pdf); “Because the applicant was female (named Jennifer instead of John), she got a lower offer” (PNAS, .pdf). Everything that matters (income, age, location, religion) correlates with…
[35] The Default Bayesian Test is Prejudiced Against Small Effects
When considering any statistical tool I think it is useful to answer the following two practical questions: 1. “Does it give reasonable answers in realistic circumstances?” 2. “Does it answer a question I am interested in?” In this post I explain why, for me, when it comes to the default Bayesian test that's starting to…
[34] My Links Will Outlive You
If you are like me, from time to time your papers include links to online references. Because the internet changes so often, by the time readers follow those links, who knows if the cited content will still be there. This blogpost shares a simple way to ensure your links live “forever.” I got the idea…
[33] "The" Effect Size Does Not Exist
Consider the robust phenomenon of anchoring, where people’s numerical estimates are biased towards arbitrary starting points. What does it mean to say “the” effect size of anchoring? It surely depends on moderators like domain of the estimate, expertise, and perceived informativeness of the anchor. Alright, how about “the average” effect size of anchoring? That's simple enough….
[31] Women are taller than men: Misusing Occam’s Razor to lobotomize discussions of alternative explanations
Most scientific studies document a pattern for which the authors provide an explanation. The job of readers and reviewers is to examine whether that pattern is better explained by alternative explanations. When alternative explanations are offered, it is common for authors to acknowledge that although, yes, each study has potential confounds, no single alternative explanation…
[29] Help! Someone Thinks I p-hacked
It has become more common to publicly speculate, upon noticing a paper with unusual analyses, that a reported finding was obtained via p-hacking. This post discusses how authors can persuasively respond to such speculations. Examples of public speculation of p-hacking: Example 1. A Slate.com post by Andrew Gelman suspected p-hacking in a paper that collected…
[28] Confidence Intervals Don't Change How We Think about Data
Some journals are thinking of discouraging authors from reporting p-values and encouraging or even requiring them to report confidence intervals instead. Would our inferences be better, or even just different, if we reported confidence intervals instead of p-values? One possibility is that researchers become less obsessed with the arbitrary significant/not-significant dichotomy. We start paying more…
[24] P-curve vs. Excessive Significance Test
In this post I use data from the Many-Labs replication project to contrast the (pointless) inferences one arrives at using the Excessive Significance Test, with the (critically important) inferences one arrives at with p-curve. The Many-Labs project is a collaboration of 36 labs around the world, each running a replication of 13 published effects in…
[23] Ceiling Effects and Replications
A recent failure to replicate led to an attention-grabbing debate in psychology. As you may expect from university professors, some of it involved data. As you may not expect from university professors, much of it involved saying mean things that would get a child sent to the principal's office (.pdf). The hostility in the debate has obscured an interesting…