The paper titled "Channeling Fisher: Randomization Tests and the Statistical Insignificance of Seemingly Significant Experimental Results" (.htm) is currently the most cited 2019 article in the Quarterly Journal of Economics (372 Google cites). It delivers bad news to economists running experiments: their p-values are wrong. To get correct p-values, the article explains, they need to…
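Presumably the cut-off sentence points to the randomization tests named in the paper's title. As a quick, hedged illustration of what such a test involves, here is a minimal permutation-based p-value sketch with made-up data and entirely hypothetical numbers (not the paper's procedure or data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical experimental data: outcomes for treatment and control units
treat = rng.normal(0.5, 1.0, size=50)
control = rng.normal(0.0, 1.0, size=50)

observed_diff = treat.mean() - control.mean()

# Randomization (permutation) test: reassign treatment labels at random and
# recompute the difference in means under the sharp null of no effect.
pooled = np.concatenate([treat, control])
n_treat = len(treat)
n_perm = 10_000
perm_diffs = np.empty(n_perm)
for i in range(n_perm):
    shuffled = rng.permutation(pooled)
    perm_diffs[i] = shuffled[:n_treat].mean() - shuffled[n_treat:].mean()

# Two-sided permutation p-value
p_value = np.mean(np.abs(perm_diffs) >= abs(observed_diff))
print(f"observed diff = {observed_diff:.3f}, permutation p = {p_value:.4f}")
```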
[96] Madam Speaker: Are Female Presenters Treated Worse in Econ Seminars?
A recent NBER paper titled "Gender and the Dynamics of Economics Seminars" (.htm) reports analyses of audience questions asked during 462 economics seminars, concluding that “women are asked more questions . . . and the questions asked of women are more likely to be patronizing or hostile . . . suggest[ing] yet another potential explanation…
[95] Groundhog: Addressing The Threat That R Poses To Reproducible Research
R, the free and open source program for statistical computing, poses a substantial threat to the reproducibility of published research. This post explains the problem and introduces a solution. The problem: packages. R itself has some reproducibility problems (see an example in this footnote [1]), but the big problem is its packages: the add-on scripts that…
[91] p-hacking fast and slow: Evaluating a forthcoming AER paper deeming some econ literatures less trustworthy
The authors of a forthcoming AER article (.pdf), "Methods Matter: P-Hacking and Publication Bias in Causal Analysis in Economics", painstakingly harvested thousands of test results from 25 economics journals to answer an interesting question: Are studies that use some research designs more trustworthy than others? In this post I will explain why I think their…
[88] The Hot-Hand Artifact for Dummies & Behavioral Scientists
A friend recently asked for my take on Miller and Sanjurjo's (2018; .pdf) debunking of the hot hand fallacy. In that paper, the authors provide a brilliant and surprising observation missed by hundreds of people who had thought about the issue before, including the classic Gilovich, Vallone, & Tversky (1985 .htm). In this post:…
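For readers who want to see the artifact directly: the core of Miller and Sanjurjo's observation is that, in a finite sequence of independent 50/50 flips, the within-sequence proportion of hits that immediately follow a hit averages below 50%. A minimal simulation sketch (the sequence length and all numbers are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(1)

n_flips = 10       # short sequences make the bias easy to see
n_sims = 50_000    # number of simulated sequences

props = []
for _ in range(n_sims):
    flips = rng.integers(0, 2, size=n_flips)   # fair coin: 0 = miss, 1 = hit
    after_hit = flips[1:][flips[:-1] == 1]     # outcomes that follow a hit
    if after_hit.size > 0:                     # skip sequences with no usable hits
        props.append(after_hit.mean())

# Averaging the within-sequence proportions gives a number below 0.5,
# even though every flip is an independent fair coin.
print(f"mean P(hit | previous hit) across sequences: {np.mean(props):.3f}")
```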
[80] Interaction Effects Need Interaction Controls
In a recent referee report I argued something I have argued in several reports before: if the effect of interest in a regression is an interaction, the control variables addressing possible confounds should be interactions as well. In this post I explain that argument using as a working example a 2011 QJE paper (.htm) that…
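To make the referee-report argument concrete, here is a toy sketch (the variable names and data-generating process are hypothetical, not from the QJE paper): when the confound's effect also varies with the moderator, controlling for it only as a main effect can leave a spurious interaction, while interacting the control with the moderator removes it.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 1_000

# Hypothetical data: m is the moderator, x the predictor of interest,
# c a control variable correlated with x.
m = rng.integers(0, 2, size=n)
c = rng.normal(size=n)
x = 0.7 * c + rng.normal(size=n)
y = 0.5 * c * m + rng.normal(size=n)   # the confound's effect, not x's, varies with m

df = pd.DataFrame(dict(y=y, x=x, m=m, c=c))

# Controlling for c only as a main effect leaves a spurious x:m interaction
naive = smf.ols("y ~ x * m + c", data=df).fit()

# Interacting the control with the moderator addresses the confound
interacted = smf.ols("y ~ x * m + c * m", data=df).fit()

print(naive.params[["x:m"]])        # biased away from zero
print(interacted.params[["x:m"]])   # close to zero
```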
[78c] Bayes Factors in Ten Recent Psych Science Papers
For this post, the third in a series on Bayes factors (.htm), I wanted to get a sense for how Bayes factors were being used with real data from real papers, so I looked at the 10 most recent empirical papers in Psychological Science containing the phrase "Bayes factor" (.zip). After browsing them all, I…
[78b] Hyp-Chart, the Missing Link Between P-values and Bayes Factors
Just two steps are needed to go from computing p-values to computing Bayes factors. This post explains both steps and introduces Hyp-Chart, the missing link we arrive at if we take only the first step. Hyp-Chart is a graph that shows how well the data fit the null vs. every possible alternative hypothesis [1]. Hyp-Chart…
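As I read that description, the natural way to sketch such a graph is to plot how likely the observed result is under every candidate true effect, with the null marked. A minimal illustration with entirely hypothetical numbers, and not necessarily the post's exact construction:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical study result: observed effect estimate and its standard error
observed_effect = 0.30
se = 0.15

# How well do the data fit each candidate true effect (including the null, 0)?
candidate_effects = np.linspace(-0.5, 1.0, 301)
likelihood = stats.norm.pdf(observed_effect, loc=candidate_effects, scale=se)

plt.plot(candidate_effects, likelihood)
plt.axvline(0, linestyle="--", label="null hypothesis")
plt.axvline(observed_effect, linestyle=":", label="observed effect")
plt.xlabel("hypothesized true effect")
plt.ylabel("likelihood of the observed data")
plt.legend()
plt.show()
```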
[78a] If you think p-values are problematic, wait until you understand Bayes Factors
Would raising the minimum wage by $4 lead to greater unemployment? Milton, a Chicago economist, has a theory (supply and demand) that says so. Milton believes the causal effect is anywhere between 1% and 10%. After the minimum wage increase of $4, unemployment goes up 1%. Milton feels bad about the unemployed but good about…
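To preview where a Bayes-factor calculation would take Milton's example, one compares the average likelihood of the data over Milton's 1%–10% prior to its likelihood under the null of no effect. The standard error below is assumed purely for illustration and is not from the post:

```python
import numpy as np
from scipy import stats

observed = 1.0   # observed increase in unemployment, in percentage points
se = 1.5         # hypothetical standard error of that estimate (an assumption)

# Marginal likelihood under Milton's theory: effect uniform between 1 and 10
effects = np.linspace(1, 10, 1_000)
lik_theory = stats.norm.pdf(observed, loc=effects, scale=se).mean()

# Likelihood under the null of no effect
lik_null = stats.norm.pdf(observed, loc=0, scale=se)

bayes_factor = lik_theory / lik_null
print(f"Bayes factor (theory vs. null): {bayes_factor:.2f}")
```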
[77] Number-Bunching: A New Tool for Forensic Data Analysis
In this post I show how one can analyze the frequency with which values get repeated within a dataset – what I call “number-bunching” – to statistically identify whether the data were likely tampered with. Unlike Benford’s law (.htm), and its generalizations, this approach examines the entire number at once, not only the first or…
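A simplified sketch of the general idea, not the post's exact procedure (which derives its benchmark differently): measure how often exact values repeat in the observed data, then compare that to the repetition expected in data simulated under a benign benchmark. All data and distributional choices below are hypothetical.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(3)

def avg_frequency(values):
    """Average, across observations, of how many times each observation's
    exact value appears in the data (a bunching measure in the spirit of
    the post's description)."""
    counts = Counter(values)
    return np.mean([counts[v] for v in values])

# Hypothetical observed data (e.g., measurements rounded to one decimal)
observed = np.round(rng.normal(50, 10, size=500), 1)
obs_stat = avg_frequency(observed)

# Benchmark: simulate data with the same sample size, mean, sd, and rounding,
# and ask how much value repetition arises by chance alone.
n_sims = 2_000
sim_stats = np.empty(n_sims)
for i in range(n_sims):
    sim = np.round(rng.normal(observed.mean(), observed.std(), size=observed.size), 1)
    sim_stats[i] = avg_frequency(sim)

p_value = np.mean(sim_stats >= obs_stat)
print(f"observed bunching = {obs_stat:.2f}, simulation p = {p_value:.3f}")
```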