Data Colada

[33] "The" Effect Size Does Not Exist


Posted on February 9, 2015 by Uri Simonsohn

Consider the robust phenomenon of anchoring, where people’s numerical estimates are biased towards arbitrary starting points. What does it mean to say “the” effect size of anchoring?

It surely depends on moderators like the domain of the estimate, the expertise of the estimator, and the perceived informativeness of the anchor. Alright, how about “the average” effect size of anchoring? That's simple enough, right? Actually, that's where the problem of interest to this post arises. Computing the average requires answering the following unanswerable question: how much weight should each possible effect size get when computing “the average” effect size?

Should we weight by number of studies? Imagined, planned, or executed? Or perhaps weight by how clean (free-of-confounds) each study is? Or by sample size?

Say anchoring effects are larger when estimating river lengths than door heights. Does “the average” anchoring effect give all river studies combined 50% weight and all door studies the other 50%? If so, what do we do with canal-length studies: combine them with the rivers, or count them on their own?

If we weight by study rather than stimulus, “the average” effect gets larger as more river studies are conducted, and if we weight by sample size, “the average” gets smaller if we run more subjects in the door studies.
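To make that dependence concrete, here is a minimal Python sketch. The effect sizes, study counts, and sample sizes are invented for illustration; the point is only that defensible weighting schemes give different “averages”:

```python
# Hypothetical anchoring studies: (domain, effect size d, sample size n).
# All numbers are invented for illustration, not taken from any meta-analysis.
studies = [
    ("river", 0.9, 50), ("river", 0.8, 60), ("river", 1.0, 40),  # three small river studies
    ("door",  0.3, 200),                                          # one large door study
]

# Weight each study equally: rivers dominate because more river studies were run.
by_study = sum(d for _, d, _ in studies) / len(studies)

# Weight by sample size: the large door study pulls "the average" down.
by_subject = sum(d * n for _, d, n in studies) / sum(n for _, _, n in studies)

# Weight by domain (rivers 50%, doors 50%): yet another answer.
domains = {}
for dom, d, _ in studies:
    domains.setdefault(dom, []).append(d)
by_domain = sum(sum(ds) / len(ds) for ds in domains.values()) / len(domains)

print(f"per study: {by_study:.2f}, per subject: {by_subject:.2f}, per domain: {by_domain:.2f}")
# per study: 0.75, per subject: 0.55, per domain: 0.60
```

Same four studies, three different “average” effects, and nothing in the data says which weighting is the right one.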


What about the impact of anchoring on perceived strawberry-jam viscosity? Nobody has studied that yet, but someone could. Does “the average” anchoring effect size include this one?

What about all the zero estimates one would get if the experiment were run in a room without any lights, or with confusing instructions? What about all the large effects one would get via demand effects or confounds? Does the average include these?

Studies aren’t random
We can think of the problem using a sampling framework: the studies we run are a sample of the studies we could run. Just not a random sample.

Cheat-sheet. Random sample: every member of the population is equally likely to be selected.

First, we cannot run studies randomly, because we don’t know the relative frequency of every possible study in the population of studies. We don’t know how many “door” vs “river” studies exist in this platonic universe, so we don’t know with what probability to run a door vs a river study.

Second, we don’t want to run studies randomly; we want studies that will provide new information, that are similar to those we have seen elsewhere, that will have higher rhetorical value in a talk or paper, that we find intrinsically interesting, that are less confounded, etc. [1]

What can we estimate?
Given a set of studies, we can ask what the average effect of those studies is. We have to worry, of course, about publication bias; p-curve is just the tool for that. If we apply p-curve to a set of studies, it tells us what effect we would expect to get if we ran those same studies again.
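To see why publication bias matters for even this modest goal, here is a toy simulation. It is not p-curve itself, just an illustration under assumed parameters (true effect d = 0.3, n = 20 per cell) of how averaging only the significant, published studies overstates the effect:

```python
# Toy simulation of publication bias. The true effect, sample size, and
# significance cutoff below are assumptions made only for illustration.
import math
import random
import statistics

random.seed(1)
TRUE_D = 0.3                      # assumed true standardized effect
N = 20                            # assumed per-cell sample size
SE = math.sqrt(2 / N)             # approximate standard error of observed d
CUTOFF = 1.96 * SE                # observed d needed for p < .05 (two-sample, approx.)

observed = [random.gauss(TRUE_D, SE) for _ in range(10_000)]  # studies actually run
published = [d for d in observed if d > CUTOFF]               # only significant ones published

print(f"true effect:                  {TRUE_D:.2f}")
print(f"average of all studies run:   {statistics.mean(observed):.2f}")
print(f"average of published studies: {statistics.mean(published):.2f}")
# The published-only average is inflated; correcting for that selection is
# what tools like p-curve are designed to do.
```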

To generalize beyond the data requires judgment rather than statistics.
Judgment can account for non-randomly run studies in a way that statistics cannot.




  1. Running studies with a set instead of a single stimulus is nevertheless very important, but for construct rather than external validity. Running a set of stimuli reduces the risks of stumbling on the single confounded stimulus that works. Check out the excellent “Stimulus Sampling” paper by Wells and Windschitl (.pdf) [↩]



© 2021, Uri Simonsohn, Leif Nelson, and Joseph Simmons. For permission to reprint individual blog posts on DataColada please contact us via email.