Uri Simonsohn, Author at Data Colada

[58] The Funnel Plot is Invalid Because of This Crazy Assumption: r(n,d)=0

Posted on March 21, 2017 (updated February 12, 2020) by Uri Simonsohn

The funnel plot is a beloved meta-analysis tool. It is typically used to answer the question of whether a set of studies exhibits publication bias. That’s a bad question because we always know the answer: it is “obviously yes.” Some researchers publish some null findings, but nobody publishes them all. It is also a bad…

[57] Interactions in Logit Regressions: Why Positive May Mean Negative

Posted on February 23, 2017 (updated July 28, 2022) by Uri Simonsohn

Of all economics papers published this century, the 10th most cited appeared in Economics Letters, a journal with an impact factor of 0.5. It makes an inconvenient and counterintuitive point: the sign of the estimate (b̂) of an interaction in a logit/probit regression need not correspond to the sign of its effect on the…
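
A minimal sketch of the point using made-up coefficients (no model is fitted here). In a logit with binary x1 and x2, the interaction measured on predicted probabilities, as a difference-in-differences, can have the opposite sign from the interaction coefficient b12. The coefficient values are hypothetical, chosen so that both main effects push probabilities close to 1.

```python
import math

def logistic(z):
    """Inverse logit: maps a linear index to a probability."""
    return 1 / (1 + math.exp(-z))

def interaction_on_probability(b0, b1, b2, b12):
    """Difference-in-differences of predicted probabilities for binary x1, x2
    under logit(P) = b0 + b1*x1 + b2*x2 + b12*x1*x2."""
    def p(x1, x2):
        return logistic(b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2)
    # Effect of x1 when x2 = 1, minus effect of x1 when x2 = 0.
    return (p(1, 1) - p(0, 1)) - (p(1, 0) - p(0, 0))

# Hypothetical coefficients: the interaction coefficient is positive.
b12 = 0.5
did = interaction_on_probability(b0=0.0, b1=2.0, b2=2.0, b12=b12)
print(f"b12 = {b12:+.2f}, interaction on probabilities = {did:+.3f}")
# prints: b12 = +0.50, interaction on probabilities = -0.273
```

Because a probability cannot exceed 1, x1 adds less to P when x2 is already pushing P toward the ceiling, so the difference-in-differences comes out negative despite the positive b12.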

[56] TWARKing: Test-Weighting After Results are Known

Posted on January 3, 2017 (updated December 17, 2021) by Uri Simonsohn

In the last class of the semester I hold a “town-hall” meeting: an open discussion about how to improve the course (content, delivery, grading, etc.). I follow up with a required online poll to “vote” on proposed changes [1]. Grading in my class is old-school: two tests, each worth 40%, and homework worth 20% (graded mostly on a completion…

[55] The file-drawer problem is unfixable, and that’s OK

Posted on December 17, 2016 (updated February 12, 2020) by Uri Simonsohn

The “file-drawer problem” consists of researchers not publishing their p>.05 studies (Rosenthal 1979 .htm). P-hacking consists of researchers not reporting their p>.05 analyses for a given study. P-hacking is easy to stop; file-drawering is nearly impossible. Fortunately, while p-hacking is a real problem, file-drawering is not. Consequences of p-hacking vs. file-drawering: with p-hacking it’s easy to…

[54] The 90x75x50 heuristic: Noisy & Wasteful Sample Sizes In The “Social Science Replication Project”

Posted on November 1, 2016 (updated February 12, 2020) by Uri Simonsohn

An impressive team of researchers is engaging in an impressive task: Replicate 21 social science experiments published in Nature and Science in 2010-2015 (.htm). The task requires making many difficult decisions, including what sample sizes to use. The authors' current plan is a simple rule: Set n for the replication so that it would have 90%…
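
The excerpt is cut off before it says which effect size the 90% power refers to, so the sketch below leaves the target effect d as a parameter. It is a minimal sketch of the textbook normal-approximation formula for per-cell n in a two-cell design (two-sided α = .05), not the project's own calculation; an exact t-test calculation returns a slightly larger n. The function names are hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal

def n_per_cell(d, power=0.90, alpha=0.05):
    """Per-cell n for a two-sided, two-cell test of standardized effect d (normal approximation)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_power = Z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

def power_at(n, d, alpha=0.05):
    """Approximate power with n per cell if the true effect is d (ignores the opposite tail)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(d * math.sqrt(n / 2) - z_alpha)

# Hypothetical target effect size; the excerpt does not say which d the plan uses.
n = n_per_cell(0.5)
print(n)                             # 85 per cell
print(round(power_at(n, 0.5), 2))    # about 0.90
print(round(power_at(n, 0.25), 2))   # much lower if the true effect is half as large
```

The last line shows why the choice of target matters: an n fixed for one effect size has far less than 90% power if the true effect is smaller.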

[52] Menschplaining: Three Ideas for Civil Criticism

Posted on September 26, 2016 (updated September 25, 2016) by Uri Simonsohn

As bloggers, commentators, reviewers, and editors, we often criticize the work of fellow academics. In this post I share three ideas for being more civil and persuasive when doing so. But first: should we comment publicly in the first place? One of the best-known social psychologists, Susan Fiske (.htm), last week circulated a draft of an invited opinion…

[51] Greg vs. Jamal: Why Didn’t Bertrand and Mullainathan (2004) Replicate?

Posted on September 6, 2016 (updated February 15, 2020) by Uri Simonsohn

Bertrand & Mullainathan (2004, .htm) is one of the best-known and most cited American Economic Review (AER) papers [1]. It reports a field experiment in which resumes given typically Black names (e.g., Jamal and Lakisha) received fewer callbacks than those given typically White names (e.g., Greg and Emily). This finding is interpreted as evidence of racial discrimination…

[50] Teenagers in Bikinis: Interpreting Police-Shooting Data

Posted on July 14, 2016 (updated February 15, 2020) by Uri Simonsohn

The New York Times, on Monday, showcased (.htm) an NBER working paper (.pdf) that proposed that “blacks are 23.8 percent less likely to be shot at by police relative to whites.” (p.22) The paper involved a monumental data collection effort to address an important societal question. The analyses are rigorous, clever and transparently reported. Nevertheless, I do…

[48] P-hacked Hypotheses Are Deceivingly Robust

Posted on April 28, 2016 (updated January 30, 2020) by Uri Simonsohn

Sometimes we selectively report the analyses we run to test a hypothesis. Other times we selectively report which hypotheses we tested. One popular way to p-hack hypotheses involves subgroups. Upon realizing that analyses of the entire sample do not produce a significant effect, we check whether analyses of various subsamples: women, or the young, or Republicans, or…
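
A minimal simulation sketch of the subgroup search described above, assuming a true null effect, a z-test that treats the outcome SD as known (= 1), and two hypothetical binary covariates (gender and age group). It shows only how much the false-positive rate grows when subsamples may be tested after the full sample fails; it does not reproduce the post's argument about robustness.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()

def z_test_p(treat, control):
    """Two-sided p-value for a difference in means, treating the SD as known (= 1)."""
    z = (fmean(treat) - fmean(control)) / (1 / len(treat) + 1 / len(control)) ** 0.5
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def one_study(rng, n=200):
    """One null study: returns p-values for the full sample and four subsamples."""
    # Each person: (treated?, woman?, young?, outcome); the outcome ignores condition.
    people = [(rng.random() < 0.5, rng.random() < 0.5, rng.random() < 0.5, rng.gauss(0, 1))
              for _ in range(n)]
    subsets = {
        "all": people,
        "women": [p for p in people if p[1]],
        "men": [p for p in people if not p[1]],
        "young": [p for p in people if p[2]],
        "old": [p for p in people if not p[2]],
    }
    pvals = {}
    for name, group in subsets.items():
        treat = [p[3] for p in group if p[0]]
        control = [p[3] for p in group if not p[0]]
        if treat and control:  # skip a subsample with an empty cell
            pvals[name] = z_test_p(treat, control)
    return pvals

rng = random.Random(2016)
sims = [one_study(rng) for _ in range(2000)]
honest = sum(s["all"] < .05 for s in sims) / len(sims)        # full sample only
hacked = sum(min(s.values()) < .05 for s in sims) / len(sims)  # any of the five tests
print(f"full sample only: {honest:.3f}; any subsample allowed: {hacked:.3f}")
```

The full-sample test stays near the nominal 5%, while allowing any of the four subsamples roughly triples the chance of finding some p < .05.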

[47] Evaluating Replications: 40% Full ≠ 60% Empty

Posted on March 3, 2016 (updated February 12, 2020) by Uri Simonsohn

Last October, Science published the paper “Estimating the Reproducibility of Psychological Science” (.htm), which reported the results of 100 replication attempts. Today it published a commentary by Gilbert et al. (.htm) as well as a response by the replicators (.htm). The commentary makes two main points. First, because of sampling error, we should not expect all of…