[29] Help! Someone Thinks I p-hacked
It has become more common to publicly speculate, upon noticing a paper with unusual analyses, that a reported finding was obtained via p-hacking. This post discusses how authors can persuasively respond to such speculations. Examples of public speculation of p-hacking: Example 1. A Slate.com post by Andrew Gelman suspected p-hacking in a paper that collected…
[28] Confidence Intervals Don't Change How We Think about Data
Some journals are thinking of discouraging authors from reporting p-values and encouraging or even requiring them to report confidence intervals instead. Would our inferences be better, or even just different, if we reported confidence intervals instead of p-values? One possibility is that researchers become less obsessed with the arbitrary significant/not-significant dichotomy. We start paying more…
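To make the contrast concrete, here is a minimal Python sketch (my own made-up data, nothing from the post) that computes both summaries for the same one-sample t-test; the 95% interval excludes zero exactly when p < .05, so the two reports carry the same yes/no information.

```python
# Hypothetical example: the same data summarized as a p-value and as a 95% CI.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=0.3, scale=1.0, size=40)  # made-up sample with true mean 0.3

# p-value for H0: mean = 0
t, p = stats.ttest_1samp(x, popmean=0.0)

# 95% confidence interval for the mean, built from the same t distribution
m, se = x.mean(), stats.sem(x)
lo, hi = stats.t.interval(0.95, df=len(x) - 1, loc=m, scale=se)

print(f"p = {p:.3f}; 95% CI = [{lo:.2f}, {hi:.2f}]")
# The interval excludes 0 exactly when p < .05: same information, different framing.
```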
[27] Thirty-somethings are Shrinking and Other U-Shaped Challenges
A recent Psych Science (.pdf) paper found that sports teams can perform worse when they have too much talent. For example, in Study 3 they found that NBA teams with a higher percentage of talented players win more games, but that teams with the highest levels of talented players win fewer games. The hypothesis is easy enough…
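The excerpt does not say how the reversal was tested; the textbook approach is a quadratic regression, sketched below on simulated data (not the paper's NBA data or its actual analysis), where a negative coefficient on the squared term and an interior turning point are read as evidence of an inverted U.

```python
# Hypothetical illustration of the standard quadratic test for an inverted U.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
talent = rng.uniform(0, 1, 200)                                    # share of top-talent players
wins = 40 + 30 * talent - 25 * talent**2 + rng.normal(0, 3, 200)   # simulated inverted U

X = sm.add_constant(np.column_stack([talent, talent**2]))
fit = sm.OLS(wins, X).fit()
b0, b1, b2 = fit.params

print(fit.summary())
print("fitted curve peaks at talent =", -b1 / (2 * b2))  # interior turning point
```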
[26] What If Games Were Shorter?
The smaller your sample, the less likely your evidence is to reveal the truth. You might already know this, but most people don’t (.html), or at least they don’t appropriately apply it (.html). (See, for example, nearly every inference ever made by anyone). My experience trying to teach this concept suggests that it’s best understood…
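As a quick illustration of that point (my own simulation, not one from the post), the sketch below runs many two-group studies with a real underlying difference and counts how often each sample size detects it.

```python
# Hypothetical simulation: smaller samples are less likely to detect a true effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect = 0.5          # true group difference, in standard-deviation units
n_sims = 2000

for n in (10, 50, 200):    # per-group sample sizes
    hits = 0
    for _ in range(n_sims):
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(true_effect, 1.0, n)
        if stats.ttest_ind(a, b).pvalue < .05:
            hits += 1
    print(f"n per group = {n:3d}: detected the effect in {hits / n_sims:.0%} of simulations")
```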
[25] Maybe people actually enjoy being alone with their thoughts
Recently Science published a paper concluding that people do not like sitting quietly by themselves (.html). The article received press coverage; that press coverage received blog coverage, which received Twitter coverage, which received meaningful head-nodding coverage around my department. The bulk of that coverage (e.g., 1, 2, and 3) focused on the tenth study in…
[24] P-curve vs. Excessive Significance Test
In this post I use data from the Many-Labs replication project to contrast the (pointless) inferences one arrives at using the Excessive Significance Test with the (critically important) inferences one arrives at with p-curve. The Many-Labs project is a collaboration of 36 labs around the world, each running a replication of 13 published effects in…
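For readers new to p-curve, here is a hedged sketch of its core logic (my own simulation, not the post's Many-Labs analysis): among statistically significant results, p-values pile up near zero when the studied effect is real, but are uniform, with only about 20% falling below .01, when it is not.

```python
# Hypothetical sketch of the logic behind p-curve: the distribution of
# *significant* p-values differs depending on whether an effect is real.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def significant_pvalues(effect, n=20, sims=5000):
    """Run many two-group studies and keep only the p-values below .05."""
    ps = []
    for _ in range(sims):
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(effect, 1.0, n)
        p = stats.ttest_ind(a, b).pvalue
        if p < .05:
            ps.append(p)
    return np.array(ps)

for label, effect in [("no true effect", 0.0), ("true effect d = 0.6", 0.6)]:
    ps = significant_pvalues(effect)
    print(f"{label}: {np.mean(ps < .01):.0%} of significant p-values are below .01")
# Under the null about 20% fall below .01 (uniform on [0, .05]);
# with a true effect, far more do (right skew).
```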
[23] Ceiling Effects and Replications
A recent failure to replicate led to an attention-grabbing debate in psychology. As you may expect from university professors, some of it involved data. As you may not expect from university professors, much of it involved saying mean things that would get a child sent to the principal's office (.pdf). The hostility in the debate has obscured an interesting…
[22] You know what's on our shopping list
As part of an ongoing project with Minah Jung, a nearly perfect doctoral student, we asked people to estimate the percentage of people who bought some common items in their last trip to the supermarket. For each of 18 items, we simply asked people (N = 397) to report whether they had bought it on…
[21] Fake-Data Colada: Excessive Linearity
Recently, a psychology paper (.html) was flagged as possibly fraudulent based on statistical analyses (.pdf). The author defended his paper (.html), but the university committee investigating the matter concluded that misconduct had occurred (.pdf). In this post we present new and more intuitive versions of the analyses that flagged the paper as possibly fraudulent. We then rule…
[20] We cannot afford to study effect size in the lab
Methods people often say – in textbooks, task forces, papers, editorials, over coffee, in their sleep – that we should focus more on estimating effect sizes rather than testing for significance. I am kind of a methods person, and I am kind of going to say the opposite. Only kind of the opposite because it…
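One way to see what is at stake (my illustration, not necessarily the post's argument) is to look at how wide the 95% confidence interval around an estimate of Cohen's d remains at typical lab sample sizes, using the common large-sample approximation for its standard error.

```python
# Hypothetical sketch: width of the 95% CI around Cohen's d at various sample sizes.
# SE(d) is approximated by sqrt(2/n + d^2 / (4n)) for two equal groups of size n.
import math

d = 0.5  # an assumed medium-sized effect
for n in (20, 50, 100, 500, 3000):           # participants per group
    se = math.sqrt(2 / n + d**2 / (4 * n))   # approximate standard error of d
    half_width = 1.96 * se
    print(f"n per group = {n:4d}: d = {d:.2f} +/- {half_width:.2f}")
# With lab-sized samples the interval spans much of the plausible range of effects;
# it narrows meaningfully only at sample sizes far beyond a typical lab study.
```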