Of all economics papers published this century, the 10th most cited appeared in Economics Letters, a journal with an impact factor of 0.5. It makes an inconvenient and counterintuitive point: the sign of the estimate (b̂) of an interaction in a logit/probit regression need not correspond to the sign of its effect on the…
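To make the point in the teaser above concrete, here is a minimal sketch. It assumes the "effect" in question is the effect on the predicted probability, and it uses made-up coefficients (B0, B1, B2, B3 are hypothetical, not taken from any paper) in a logit model with two binary predictors. The interaction coefficient is positive, yet the difference-in-differences of predicted probabilities comes out negative, because the logistic curve flattens as probabilities approach 1.

```python
import math


def logistic(z: float) -> float:
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predicted_probability(x1: int, x2: int,
                          b0: float, b1: float, b2: float, b3: float) -> float:
    """P(y=1) under a logit model with main effects and an x1*x2 interaction."""
    return logistic(b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2)


def interaction_on_probability_scale(b0: float, b1: float,
                                     b2: float, b3: float) -> float:
    """Difference-in-differences of predicted probabilities for binary x1, x2:
    (effect of x2 when x1=1) minus (effect of x2 when x1=0)."""
    def p(x1: int, x2: int) -> float:
        return predicted_probability(x1, x2, b0, b1, b2, b3)

    return (p(1, 1) - p(1, 0)) - (p(0, 1) - p(0, 0))


# Hypothetical coefficients, chosen only for illustration.
B0, B1, B2, B3 = 0.0, 2.0, 2.0, 0.5

effect = interaction_on_probability_scale(B0, B1, B2, B3)
print(f"interaction coefficient b3 = {B3:+.2f}")
print(f"interaction on probability scale = {effect:+.4f}")
```

With these numbers the baseline probability is already 0.5 and each main effect pushes it to about 0.88, so there is little room left for the combined condition to add as much again, even with a positive b3.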
Category: Unexpectedly Difficult Statistical Concepts
[50] Teenagers in Bikinis: Interpreting Police-Shooting Data
The New York Times, on Monday, showcased (.htm) an NBER working paper (.pdf) that proposed that “blacks are 23.8 percent less likely to be shot at by police relative to whites.” (p.22) The paper involved a monumental data collection effort to address an important societal question. The analyses are rigorous, clever and transparently reported. Nevertheless, I do…
[46] Controlling the Weather
Behavioral scientists have put forth evidence that the weather affects all sorts of things, including the stock market, restaurant tips, car purchases, product returns, art prices, and college admissions. It is not easy to properly study the effects of weather on human behavior. This is because weather is (obviously) seasonal, as is much of what…
[42] Accepting the Null: Where to Draw the Line?
We typically ask if an effect exists. But sometimes we want to ask if it does not. For example, how many of the “failed” replications in the recent reproducibility project published in Science (.pdf) suggest the absence of an effect? Data have noise, so we can never say ‘the effect is exactly zero.’ We can…
[41] Falsely Reassuring: Analyses of ALL p-values
It is a neat idea. Get a ton of papers. Extract all p-values. Examine the prevalence of p-hacking by assessing if there are too many p-values near p=.05. Economists have done it [SSRN], as have psychologists [.html], and biologists [.html]. These charts with distributions of p-values come from those papers: The dotted circles highlight the excess of…
[39] Power Naps: When do Within-Subject Comparisons Help vs Hurt (yes, hurt) Power?
A recent Science paper (.html) used a total sample size of N=40 to arrive at the conclusion that implicit racial and gender stereotypes can be reduced while napping. N=40 is a small sample for a between-subject experiment. One needs N=92 to reliably detect that men are heavier than women (SSRN). The study, however, was within-subject; for instance, its dependent…
[33] "The" Effect Size Does Not Exist
Consider the robust phenomenon of anchoring, where people’s numerical estimates are biased towards arbitrary starting points. What does it mean to say “the” effect size of anchoring? It surely depends on moderators like domain of the estimate, expertise, and perceived informativeness of the anchor. Alright, how about “the average” effect size of anchoring? That's simple enough…
[27] Thirty-somethings are Shrinking and Other U-Shaped Challenges
A recent Psych Science (.pdf) paper found that sports teams can perform worse when they have too much talent. For example, in Study 3 they found that NBA teams with a higher percentage of talented players win more games, but that teams with the highest levels of talented players win fewer games. The hypothesis is easy enough…
[20] We cannot afford to study effect size in the lab
Methods people often say – in textbooks, task forces, papers, editorials, over coffee, in their sleep – that we should focus more on estimating effect sizes rather than testing for significance. I am kind of a methods person, and I am kind of going to say the opposite. Only kind of the opposite because it…
[17] No-way Interactions
This post shares a shocking and counterintuitive fact about studies looking at interactions where effects are predicted to get smaller (attenuated interactions). I needed a working example and went with Fritz Strack et al.’s (1988, .html) famous paper [933 Google cites], in which participants rated cartoons as funnier if they saw them while holding a…