A few years ago our Journal Club discussed an interesting methods paper entitled, “Putting Psychology to the Test: Rethinking Model Evaluation Through Benchmarking and Prediction” (.htm). This post describes my attempt to understand what’s happening in Figure 1 of that paper, which shows that extremely simple experiments can generate extremely negative R2s. I learned a…
[133] Heterofriendly: The Intuition for Why You Always Need Robust Standard Errors
When I taught my first PhD-level methods course, I invited students to submit questions about any topic in statistics or methodology. Six out of 10 students asked about the same topic: robust & clustered standard errors. It's clearly a topic they found both important and confusing. Psychologists basically never use robust standard errors. But they…
[132] statuser: R in user-friendly mode
t.test(), the R function for running t-tests, is disconcertingly imperfect. A t-test involves computing the difference between two means. And yet, t.test() does not report… …said difference of means. It reports the p-value for the difference of means, it reports the confidence interval for the difference of means, but not the difference of means itself….
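The quirk the post describes is not unique to R. Python's scipy behaves the same way: `scipy.stats.ttest_ind` returns the test statistic and p-value, but the difference of means must be computed separately. A minimal sketch (assuming scipy is installed; the data here are made up for illustration):

```python
# scipy's ttest_ind, like R's t.test(), reports the statistic and
# p-value but not the difference of means itself.
import numpy as np
from scipy import stats

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 4.0, 6.0, 8.0])

result = stats.ttest_ind(a, b)
print(result.statistic, result.pvalue)  # what the test reports

# The quantity being tested has to be computed by hand:
diff = a.mean() - b.mean()
print(diff)  # -2.5
```

The point carries over directly: the result object answers "is the difference significant?" without ever stating what the difference is.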
[131] Bending Over Backwards: The Quadratic Puts the U in AI
For a recent journal club in Barcelona, we read a just-published article in the Journal of Experimental Psychology: General (JEP:G). The paper is on the impact of using gen-AI on creativity. The paper proposes an inverted U: people are most creative with moderate levels of AI use. The paper has three studies. Studies 1…
[130] ResearchBox: Even Easier to Use and More Transparently Permanent than Before
Over the past 10 years or so, posting data, code, and materials for published papers has gone from eccentric to mundane. There are a few platforms that enable sharing research files, including ResearchBox. ResearchBox is hosted by the Wharton Credibility Lab, which I co-direct. We also host the pre-registration platform AsPredicted, and a new platform…
[129] P-curve works in practice, but would it work if you dropped a piano on it?
P-curve is a statistical tool we developed about 15 years ago to help rule out selective reporting, be it p-hacking or file-drawering, as the sole explanation for a set of significant results. This post is about a forthcoming critique of p-curve in the statistics journal JASA (pdf). The authors identify four p-curve properties they object…
[128] LinkedOut: The Best Published Audit Study, And Its Interesting Shortcoming
There is a recent QJE paper reporting a LinkedIn audit study comparing responses to requests by Black vs White young males. I loved the paper. At every turn you come across a clever, effortful, and effective solution to a challenge posed by studying discrimination in a field experiment. But, no paper is perfect, and this…
[127] Meaningless Means #4: Correcting Scientific Misinformation
Before we got distracted by things like being sued, we had been working on a series called Meaningless Means, which exposed the fact that meta-analytic averaging is (really) bad. When a meta-analysis says something like, “The average effect of mindsets on academic performance is d = .32”, you should not take it at face value….
[126] Stimulus Plots
When we design experiments, we have to decide how to generate and select the stimuli that we use to test our hypotheses. In a forthcoming JPSP article, “Stimulus Sampling Reimagined” (.htm), we propose that for at least 60 years we have been thinking about stimulus selection in experiments in the wrong way [1]. Specifically, with…
[125] "Complexity" 2: Don't be mean to the median
In Colada [124] I summarized a co-authored critique (with Banki, Walatka and Wu) of a recent AER paper that proposed risk preferences reflect 'complexity' rather than preferences à la Prospect Theory. Ryan Oprea, the AER author, has written a rejoinder (.pdf). Its first main point (pages 5-12) is that our results with medians are 'knife edge' (p.8),…
