Are hidden moderators a thing? Do experiments intended to be identical lead to inexplicably different results? Back in 2014, the "Many Labs" project (htm) reported an ambitious attempt to answer these questions. More than 30 different labs ran the same set of studies and the paper presented the results side-by-side. They did not find any…
[62] Two-lines: The First Valid Test of U-Shaped Relationships
Can you have too many options in the menu, too many talented soccer players in a national team, or too many examples in an opening sentence? Social scientists often hypothesize u-shaped relationships like these, where the effect of x on y starts positive and becomes negative, or starts negative and becomes positive. Researchers rely almost…
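The core of the two-lines idea can be sketched in a few lines (a minimal illustration on simulated data; the actual test selects the breakpoint with the "Robin Hood" algorithm and uses interrupted regression, which this sketch omits):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated inverted-U: y rises then falls in x (illustrative data only)
x = rng.uniform(0, 10, 500)
y = -(x - 5) ** 2 + rng.normal(0, 2, 500)

# Two-lines idea: split at a breakpoint and fit one regression line on
# each side; an inverted U requires opposite-signed slopes (up, then down).
# Here the breakpoint is simply set at the known peak of the simulated data.
cutoff = 5.0
lo, hi = x < cutoff, x >= cutoff
slope_lo = np.polyfit(x[lo], y[lo], 1)[0]
slope_hi = np.polyfit(x[hi], y[hi], 1)[0]

print(slope_lo > 0 and slope_hi < 0)  # True: up then down
```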
[61] Why p-curve excludes ps>.05
In a recent working paper, Carter et al (htm) proposed that one can better correct for publication bias by including not just p<.05 results, the way p-curve does, but also p>.05 results [1]. Their paper, currently under review, aimed to provide a comprehensive simulation study that compared a variety of bias-correction methods for meta-analysis. Although the…
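As background on the significant results p-curve keeps, a small simulation (illustrative, with a hypothetical true effect expressed in z units): when a real effect exists, the retained p<.05 values are right-skewed, with more ps below .025 than between .025 and .05.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(3)
nd = NormalDist()

# Simulate z-tests of a true effect (delta is illustrative) and keep
# only the significant results, as p-curve does.
delta = 1.0
z = rng.normal(delta, 1.0, 20_000)
p = 2 * np.array([1 - nd.cdf(abs(v)) for v in z])
sig = p[p < 0.05]

# Right skew: a majority of significant ps fall below .025.
print(np.mean(sig < 0.025) > 0.5)  # True
```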
[60] Forthcoming in JPSP: A Non-Diagnostic Audit of Psychological Research
A forthcoming article in the Journal of Personality and Social Psychology has made an effort to characterize changes in the behavior of social and personality researchers over the last decade (.htm). In this post, we refer to it as “the JPSP article” and to the authors as "the JPSP authors." The research team, led by…
[59] PET-PEESE Is Not Like Homeopathy
PET-PEESE is a meta-analytical tool that seeks to correct for publication bias. In a footnote in my previous post (.htm), I referred to it as the homeopathy of meta-analysis. That was unfair and inaccurate. Unfair because, in the style of our President, I just called PET-PEESE a name instead of describing what I believed was…
[58] The Funnel Plot is Invalid Because of This Crazy Assumption: r(n,d)=0
The funnel plot is a beloved meta-analysis tool. It is typically used to answer the question of whether a set of studies exhibits publication bias. That’s a bad question because we always know the answer: it is “obviously yes.” Some researchers publish some null findings, but nobody publishes them all. It is also a bad…
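The r(n,d)=0 assumption can be made concrete with a small simulation (illustrative numbers): if researchers run larger studies when they expect smaller effects, then n and d are correlated, and the funnel plot shows "asymmetry" even though every study is published.

```python
import numpy as np

rng = np.random.default_rng(0)

# Researchers (sensibly) run bigger studies when they expect smaller
# effects -- no publication bias anywhere in this simulation.
true_d = rng.uniform(0.1, 0.8, 200)        # true effect per study
n = np.round(50 / true_d).astype(int)      # larger n for smaller effects
se = np.sqrt(2 / n)                        # rough SE of a standardized effect
observed_d = true_d + rng.normal(0, se)    # every study is "published"

# Observed effects correlate with their standard errors: the exact
# asymmetry pattern a funnel plot reads as publication bias.
r = np.corrcoef(observed_d, se)[0, 1]
print(r > 0.5)  # True
```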
[57] Interactions in Logit Regressions: Why Positive May Mean Negative
Of all economics papers published this century, the 10th most cited appeared in Economics Letters, a journal with an impact factor of 0.5. It makes an inconvenient and counterintuitive point: the sign of the estimate (b̂) of an interaction in a logit/probit regression need not correspond to the sign of its effect on the…
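The point can be checked numerically (hypothetical coefficients, chosen for illustration): on the probability scale, the interaction effect is the cross-partial derivative of P(y=1), and its sign can flip across covariate values even when the interaction coefficient is positive everywhere.

```python
import numpy as np

def logistic(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical logit: P(y=1) = Lambda(b0 + b1*x1 + b2*x2 + b12*x1*x2),
# with a positive interaction coefficient b12 throughout.
b0, b1, b2, b12 = -4.0, 1.0, 1.0, 0.1

def cross_effect(x1, x2):
    # Probability-scale interaction effect = d^2 P / (dx1 dx2),
    # computed by central finite differences.
    h = 1e-3
    P = lambda a, b: logistic(b0 + b1 * a + b2 * b + b12 * a * b)
    return (P(x1 + h, x2 + h) - P(x1 + h, x2 - h)
            - P(x1 - h, x2 + h) + P(x1 - h, x2 - h)) / (4 * h * h)

# Same model, same b12 > 0, opposite-signed interaction effects:
print(cross_effect(0.0, 0.0) > 0)  # True
print(cross_effect(6.0, 6.0) < 0)  # True
```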
[56] TWARKing: Test-Weighting After Results are Known
In the last class of the semester I hold a “town-hall” meeting: an open discussion about how to improve the course (content, delivery, grading, etc.). I follow up with a required online poll to “vote” on proposed changes [1]. Grading in my class is old-school. Two tests, each 40%, homeworks 20% (graded mostly on a completion…
[55] The file-drawer problem is unfixable, and that’s OK
The “file-drawer problem” consists of researchers not publishing their p>.05 studies (Rosenthal 1979 .htm). P-hacking consists of researchers not reporting their p>.05 analyses for a given study. P-hacking is easy to stop. File-drawering is nearly impossible to stop. Fortunately, while p-hacking is a real problem, file-drawering is not. Consequences of p-hacking vs. file-drawering: With p-hacking it’s easy to…
[54] The 90x75x50 heuristic: Noisy & Wasteful Sample Sizes In The “Social Science Replication Project”
An impressive team of researchers is engaging in an impressive task: Replicate 21 social science experiments published in Nature and Science in 2010-2015 (.htm). The task requires making many difficult decisions, including what sample sizes to use. The authors' current plan is a simple rule: Set n for the replication so that it would have 90%…
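Power-based sample-size rules like this amount to inverting a power formula; a minimal normal-approximation version for a two-cell comparison of means (a generic sketch, not the replication project's exact procedure):

```python
from math import ceil
from statistics import NormalDist

def n_per_cell(d, power=0.90, alpha=0.05):
    # Normal-approximation per-cell n for a two-cell comparison of means
    # with standardized effect size d, two-sided test at level alpha.
    # (The exact t-based answer is slightly larger.)
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) / d) ** 2)

print(n_per_cell(0.5))  # 85 per cell for d = 0.5 at 90% power
```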