Are hidden moderators a thing? Do experiments intended to be identical lead to inexplicably different results? Back in 2014, the "Many Labs" project (.htm) reported an ambitious attempt to answer these questions. More than 30 different labs ran the same set of studies and the paper presented the results side-by-side. They did not find any…
Author: Uri Simonsohn
[62] Two-lines: The First Valid Test of U-Shaped Relationships
Can you have too many options in the menu, too many talented soccer players in a national team, or too many examples in an opening sentence? Social scientists often hypothesize u-shaped relationships like these, where the effect of x on y starts positive and becomes negative, or starts negative and becomes positive. Researchers rely almost…
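For intuition, here is a minimal sketch of the two-lines idea (assuming, for illustration only, a known breakpoint and a clean simulated u-shape; the actual test estimates the breakpoint and requires both slopes to be individually significant with opposite signs):

```python
import random

def ols_slope(xs, ys):
    """Ordinary least squares slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def two_lines(xs, ys, breakpoint):
    """Fit separate lines below and above the breakpoint; return both slopes."""
    low = [(x, y) for x, y in zip(xs, ys) if x <= breakpoint]
    high = [(x, y) for x, y in zip(xs, ys) if x > breakpoint]
    return ols_slope(*zip(*low)), ols_slope(*zip(*high))

random.seed(1)
xs = [random.random() for _ in range(500)]
ys = [-(x - 0.5) ** 2 + random.gauss(0, 0.02) for x in xs]  # inverted u + noise

s1, s2 = two_lines(xs, ys, 0.5)
print(s1 > 0, s2 < 0)   # opposite-signed slopes: evidence of a u-shape
```

The diagnosis is the sign pattern: a positive slope on one side of the breakpoint and a negative slope on the other.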
[59] PET-PEESE Is Not Like Homeopathy
PET-PEESE is a meta-analytical tool that seeks to correct for publication bias. In a footnote in my previous post (.htm), I referred to it as the homeopathy of meta-analysis. That was unfair and inaccurate. Unfair because, in the style of our President, I just called PET-PEESE a name instead of describing what I believed was…
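For readers unfamiliar with the tool: PET regresses observed effects on their standard errors (PEESE uses squared standard errors instead) and takes the intercept, the predicted effect at se = 0, as the bias-corrected estimate. A bare-bones sketch of the PET half, assuming simple 1/se² weights and simulated data with no publication bias:

```python
import random

def pet_intercept(effects, ses):
    """PET: regress observed effects on their standard errors, weighted
    by 1/se^2, and return the intercept (the predicted effect at se = 0)."""
    w = [1 / s ** 2 for s in ses]
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, ses))
    swy = sum(wi * y for wi, y in zip(w, effects))
    swxx = sum(wi * x * x for wi, x in zip(w, ses))
    swxy = sum(wi * x * y for wi, x, y in zip(w, ses, effects))
    return (swxx * swy - swx * swxy) / (sw * swxx - swx ** 2)

random.seed(4)
ses = [random.uniform(0.05, 0.30) for _ in range(500)]
effects = [random.gauss(0.3, s) for s in ses]   # true effect .3, no pub. bias
print(round(pet_intercept(effects, ses), 2))    # should land near .3
```

With unbiased data the intercept recovers the true effect; the debate is about how it behaves when publication bias (and heterogeneity) are present.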
[58] The Funnel Plot is Invalid Because of This Crazy Assumption: r(n,d)=0
The funnel plot is a beloved meta-analysis tool. It is typically used to answer the question of whether a set of studies exhibits publication bias. That’s a bad question because we always know the answer: it is “obviously yes.” Some researchers publish some null findings, but nobody publishes them all. It is also a bad…
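To see why that assumption matters, here is a hypothetical simulation in which labs expecting smaller effects run larger samples, there is no file-drawering at all, and yet r(n,d) = 0 fails badly:

```python
import random

def pearson(xs, ys):
    """Pearson correlation, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (sx * sy)

random.seed(2)
ns, ds = [], []
for _ in range(300):
    d_true = random.uniform(0.2, 0.8)   # each lab studies a different true effect
    n = int(32 / d_true ** 2)           # and powers accordingly: small d -> big n
    se = 2 / n ** 0.5                   # approximate SE of a standardized difference
    ns.append(n)
    ds.append(random.gauss(d_true, se)) # observed effect; nothing file-drawered

print(pearson(ns, ds) < 0)              # n and d strongly negatively correlated
```

Sample size and effect size end up correlated by design, producing the asymmetry a funnel plot would (mis)read as publication bias.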
[57] Interactions in Logit Regressions: Why Positive May Mean Negative
Of all economics papers published this century, the 10th most cited appeared in Economics Letters, a journal with an impact factor of 0.5. It makes an inconvenient and counterintuitive point: the sign of the estimate (b̂) of an interaction in a logit/probit regression need not correspond to the sign of its effect on the…
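A toy numeric illustration of that point (hypothetical coefficients, binary predictors): the interaction coefficient is positive, yet the interaction's effect on the predicted probability, the double difference, is negative:

```python
import math

def sigmoid(z):
    """Inverse logit."""
    return 1 / (1 + math.exp(-z))

# Hypothetical logit coefficients; note the POSITIVE interaction term b12
b0, b1, b2, b12 = 2.0, 2.0, 2.0, 0.5

def p(x1, x2):
    """Predicted probability from the logit model."""
    return sigmoid(b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2)

# The interaction's effect on the probability scale is the double difference
effect = (p(1, 1) - p(1, 0)) - (p(0, 1) - p(0, 0))
print(b12 > 0, effect < 0)   # prints: True True
```

The reversal happens because the logit curve flattens near 1: when the main effects already push predicted probabilities close to the ceiling, even a positive interaction coefficient can shrink the gap between groups.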
[56] TWARKing: Test-Weighting After Results are Known
In the last class of the semester I hold a "town-hall" meeting: an open discussion about how to improve the course (content, delivery, grading, etc.). I follow up with a required online poll to "vote" on proposed changes [1]. Grading in my class is old-school. Two tests, each 40%, homeworks 20% (graded mostly on a completion…
[55] The file-drawer problem is unfixable, and that’s OK
The “file-drawer problem” consists of researchers not publishing their p>.05 studies (Rosenthal 1979 .htm). P-hacking consists of researchers not reporting their p>.05 analyses for a given study. P-hacking is easy to stop; file-drawering is nearly impossible to stop. Fortunately, while p-hacking is a real problem, file-drawering is not. Consequences of p-hacking vs file-drawering: With p-hacking it’s easy to…
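One way to see the asymmetry, in a hypothetical simulation of purely null effects (p-values uniform under the null; analyses treated as independent for simplicity): a p-hacked study reporting the best of three analyses turns "significant" far more often than 5%, while a file-drawered literature of honest studies still produces false positives at only the nominal rate:

```python
import random

random.seed(3)
trials = 20000

# p-hacking: report only the best of k analyses of the SAME (null) study
k = 3
hacked = sum(min(random.random() for _ in range(k)) < .05
             for _ in range(trials)) / trials

# file-drawering: each published (null) study is still one honest .05 test
drawered = sum(random.random() < .05 for _ in range(trials)) / trials

print(round(hacked, 2), round(drawered, 2))
```

P-hacking inflates the per-study false-positive rate; file-drawering merely means that, under the null, labs must run many studies to obtain each (still 5%-rate) false positive.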
[54] The 90x75x50 heuristic: Noisy & Wasteful Sample Sizes In The “Social Science Replication Project”
An impressive team of researchers is engaging in an impressive task: Replicate 21 social science experiments published in Nature and Science in 2010-2015 (.htm). The task requires making many difficult decisions, including what sample sizes to use. The authors' current plan is a simple rule: Set n for the replication so that it would have 90%…
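For reference, here is the textbook normal-approximation arithmetic behind a 90%-power sample size (a sketch only; not the replication project's exact procedure, which works from the original studies' effect sizes):

```python
import math
from statistics import NormalDist

def n_per_group(d, power=0.90, alpha=0.05):
    """Per-group n for a two-sample test of standardized effect size d
    (normal approximation; hypothetical helper, not the project's rule)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 for alpha = .05
    z_b = NormalDist().inv_cdf(power)           # 1.28 for 90% power
    return math.ceil(2 * ((z_a + z_b) / d) ** 2)

print(n_per_group(0.5))   # 85 per group for a "medium" effect of d = 0.5
```

The catch the post's title hints at: plugging the original study's noisy effect-size estimate into d makes the resulting n noisy too.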
[52] Menschplaining: Three Ideas for Civil Criticism
As bloggers, commentators, reviewers, and editors, we often criticize the work of fellow academics. In this post I share three ideas for being more civil and persuasive when doing so. But first: should we comment publicly in the first place? One of the best-known social psychologists, Susan Fiske (.htm), last week circulated a draft of an invited opinion…
[51] Greg vs. Jamal: Why Didn’t Bertrand and Mullainathan (2004) Replicate?
Bertrand & Mullainathan (2004, .htm) is one of the best known and most cited American Economic Review (AER) papers [1]. It reports a field experiment in which resumes given typically Black names (e.g., Jamal and Lakisha) received fewer callbacks than those given typically White names (e.g., Greg and Emily). This finding is interpreted as evidence of racial discrimination…