Data Colada

[30] Trim-and-Fill is Full of It (bias)


Posted on December 3, 2014 by Uri, Joe, & Leif

Statistically significant findings are much more likely to be published than non-significant ones (no citation necessary). Because overestimated effects are more likely to be statistically significant than are underestimated effects, this means that most published effects are overestimates. Effects are smaller – often much smaller – than the published record suggests.

For meta-analysts, the gold-standard procedure for correcting this bias, with >1700 Google cites, is called Trim-and-Fill (Duval & Tweedie 2000, .pdf). In this post we show that Trim-and-Fill generally does not work.

What is Trim-and-Fill?
When you have effect size estimates from a set of studies, you can plot those estimates with effect size on the x-axis and a measure of precision (e.g., sample size or standard error) on the y-axis. In the absence of publication bias this chart is symmetric: noisy estimates are sometimes too big and sometimes too small. In the presence of publication bias the small estimates are missing. Trim-and-Fill deletes (i.e., trims) some of those large-effect studies and adds (i.e., fills) small-effect studies, so that the plot is symmetric. The average effect size in this synthetic set of studies is Trim-and-Fill’s “publication bias corrected” estimate.
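To make the mechanics concrete, here is a minimal sketch of what a Trim-and-Fill correction looks like in R using the metafor package. This is purely illustrative: the effect sizes and variances below are made up, and this is not the code behind any analysis in this post.

```r
# Minimal illustration of Trim-and-Fill with the 'metafor' package.
# The effect sizes (yi) and sampling variances (vi) below are invented.
library(metafor)

yi <- c(0.61, 0.45, 0.82, 0.30, 0.55, 0.72, 0.40, 0.66)  # standardized mean differences
vi <- c(0.10, 0.04, 0.15, 0.02, 0.08, 0.12, 0.03, 0.09)  # their sampling variances

res <- rma(yi = yi, vi = vi)  # random-effects meta-analysis
tf  <- trimfill(res)          # trim asymmetric studies, fill in mirror-image ones

summary(tf)  # the "publication bias corrected" average effect
funnel(tf)   # funnel plot; filled (imputed) studies appear as open circles
```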

What is Wrong With It?
A known limitation of Trim-and-Fill is that it can "correct" for publication bias that does not exist, yielding underestimated effect sizes (see, e.g., Terrin et al. 2003, .pdf). A lesser-known limitation is that it generally does not correct enough for the publication bias that does exist, leaving effect sizes overestimated.

The chart below shows the results of simulations we conducted for our just-published “P-Curve and Effect Size” paper (SSRN). We simulated large meta-analyses aggregating studies that compare two means, with sample sizes ranging from 10 to 70, for five different true effect sizes. The chart plots true effect sizes against estimated effect sizes in a context in which only statistically significant (publishable) findings are observed (R Code for this [Figure 2b] and all other results in our paper).
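For readers who want the gist without opening the full script, here is a simplified sketch of the logic of one cell of that simulation (written for this post; it is not the linked R Code, and the specific true effect size and number of studies are chosen just for illustration):

```r
# Simplified sketch of the simulation logic (not the linked R Code):
# two-group studies with per-cell n between 10 and 70, publication filter p < .05.
library(metafor)
set.seed(1)

true_d <- 0.2    # a small true effect, for illustration
k_keep <- 100    # number of significant studies to accumulate
yi <- vi <- numeric(0)

while (length(yi) < k_keep) {
  n  <- sample(10:70, 1)                      # per-cell sample size
  x1 <- rnorm(n, mean = 0)
  x2 <- rnorm(n, mean = true_d)
  tt <- t.test(x2, x1, var.equal = TRUE)
  if (tt$p.value < .05) {                     # only significant studies get "published"
    sp <- sqrt(((n - 1) * var(x1) + (n - 1) * var(x2)) / (2 * n - 2))
    d  <- (mean(x2) - mean(x1)) / sp          # Cohen's d
    yi <- c(yi, d)
    vi <- c(vi, 2 / n + d^2 / (4 * n))        # approximate sampling variance of d
  }
}

naive <- rma(yi, vi)       # "blue line": average only the significant studies
tf    <- trimfill(naive)   # "black line": Trim-and-Fill "correction"
round(c(naive = as.numeric(coef(naive)), trim_fill = as.numeric(coef(tf)), truth = true_d), 2)
```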

[Figure 2b: true effect size (x-axis) vs. average estimated effect size (y-axis), for the naive average of significant studies, the Trim-and-Fill correction, and p-curve]

Start with the blue line at the top. That line shows what happens when you simply average only the statistically significant findings–that is, only the findings that would typically be observed in the published literature. As we might expect, those effect size estimates are super biased.

The black line shows what happens when you “correct” for this bias using Trim-and-Fill. Effect size estimates are still super biased, especially when the effect is nonexistent or small.

Aside: p-curve nails it.

We were wrong
Trim-and-Fill assumes that studies with relatively smaller effects are not published (e.g., that out of 20 studies attempted, the 3 obtaining the smallest effect size are not publishable). In most fields, however, publication bias is governed by p-values rather than effect size (e.g., out of 20 studies only those with p<.05 are publishable).

Until a few weeks ago we thought that this incorrect assumption explained Trim-and-Fill’s poor performance. For instance, in our paper (SSRN) we wrote:

“when the publication process suppresses nonsignificant findings, Trim-and-Fill is woefully inadequate as a corrective technique.” (p. 667)

For this post we conducted additional analyses and learned that Trim-and-Fill performs poorly even when its assumptions are met, that is, even when only small-effect studies go unpublished (R Code). Trim-and-Fill seems to work well only when few studies are missing, which is precisely when there is little bias to correct. In the situations where a correction is most needed, Trim-and-Fill does not correct nearly enough.
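As a rough illustration of that last point, the sketch below (again a simplified version written for this post, with our own choices of true effect, number of studies, and censoring rate, not the linked R Code) suppresses studies based on their estimated effect size, exactly the kind of censoring Trim-and-Fill assumes, and then applies the correction:

```r
# Sketch: censor studies by (small) estimated effect size, the censoring
# mechanism Trim-and-Fill assumes, then apply the correction.
library(metafor)
set.seed(2)

true_d <- 0.2
k_run  <- 200                                   # studies attempted
n      <- sample(10:70, k_run, replace = TRUE)  # per-cell sample sizes
vi     <- 2 / n + true_d^2 / (4 * n)            # approximate sampling variance of d
yi     <- rnorm(k_run, mean = true_d, sd = sqrt(vi))  # simulated d estimates

published <- yi > quantile(yi, 0.5)             # drop the half with the smallest estimates

naive <- rma(yi[published], vi[published])      # average of the surviving studies
tf    <- trimfill(naive)                        # Trim-and-Fill "correction"
round(c(naive = as.numeric(coef(naive)), trim_fill = as.numeric(coef(tf)), truth = true_d), 2)
```

Exact numbers will vary from run to run, but the pattern matches the figure above: the more studies are censored, the further both the naive and the Trim-and-Fill estimates sit above the truth.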

Two Recommendations
1) Stop using Trim-and-Fill in meta-analyses.
2) Stop treating published meta-analyses with a Trim-and-Fill “correction” as if they have corrected for publication bias. They have not.

Author response:
Our policy at Data Colada is to contact authors whose work we cover, offering them an opportunity to provide feedback and to comment within our original post. Trim-and-Fill was originally created by Sue Duval and the late Richard Tweedie. We contacted Dr. Duval and exchanged a few emails, but she did not provide feedback or a response.

Data Colada - All Content Licensed: CC-BY [Creative Commons]