Researchers have increasingly been using internal meta-analysis to summarize the evidence from multiple studies within the same paper. Much of the time, this involves computing the average effect size across the studies, and assessing whether that effect size is significantly different from zero. At first glance, internal meta-analysis seems like a wonderful idea. It increases…
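The averaging step the excerpt describes can be sketched as a fixed-effect internal meta-analysis. This is a minimal illustration with hypothetical effect sizes and standard errors (none of these numbers come from the post): studies are pooled with inverse-variance weights and the average is tested against zero.

```python
import math

# Hypothetical per-study effect sizes (Cohen's d) and standard errors
# from three studies within one paper -- illustrative numbers only.
effects = [0.30, 0.12, 0.25]
ses = [0.15, 0.14, 0.16]

# Inverse-variance weights: more precise studies count more.
weights = [1 / se**2 for se in ses]
pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

# z-test of whether the average effect differs from zero.
z = pooled / pooled_se
print(round(pooled, 3), round(z, 2))
```

The pooled estimate can be "significant" even when no single study is, which is exactly what makes internal meta-analysis look attractive at first glance.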
Category: file-drawer
[72] Metacritic Has A (File-Drawer) Problem
Metacritic.com scores and aggregates critics’ reviews of movies, music, and video games. The website provides a summary assessment of the critics’ evaluations, using a scale ranging from 0 to 100. Higher numbers mean that critics were more favorable. In theory, this website is pretty awesome, seemingly leveraging the wisdom of crowds to give consumers the most reliable…
[71] The (Surprising?) Shape of the File Drawer
Let’s start with a question so familiar that you will have answered it before the sentence is even completed: How many studies will a researcher need to run before finding a significant (p<.05) result? (If she is studying a non-existent effect and if she is not p-hacking.) Depending on your sophistication, wariness about being asked…
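Under the excerpt's stated assumptions (a non-existent effect, no p-hacking), each study is "significant" with probability .05, so the number of studies until the first false positive follows a geometric distribution. A quick simulation sketch, with all numbers assumed rather than taken from the post:

```python
import random

random.seed(1)

def studies_until_significant(alpha=0.05):
    """Count studies of a true-null effect until one hits p < alpha."""
    n = 0
    while True:
        n += 1
        if random.random() < alpha:  # a false-positive "significant" study
            return n

draws = [studies_until_significant() for _ in range(100_000)]
mean_n = sum(draws) / len(draws)
print(round(mean_n, 1))  # close to 1/.05 = 20 on average
```

The mean is about 20, but the distribution is heavily right-skewed: most researchers get "lucky" well before study 20, while a few need far more.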
[59] PET-PEESE Is Not Like Homeopathy
PET-PEESE is a meta-analytical tool that seeks to correct for publication bias. In a footnote in my previous post (.htm), I referred to it as the homeopathy of meta-analysis. That was unfair and inaccurate. Unfair because, in the style of our President, I just called PET-PEESE a name instead of describing what I believed was…

[58] The Funnel Plot is Invalid Because of This Crazy Assumption: r(n,d)=0
The funnel plot is a beloved meta-analysis tool. It is typically used to answer the question of whether a set of studies exhibits publication bias. That’s a bad question because we always know the answer: it is “obviously yes.” Some researchers publish some null findings, but nobody publishes them all. It is also a bad…
[55] The file-drawer problem is unfixable, and that’s OK
The “file-drawer problem” consists of researchers not publishing their p>.05 studies (Rosenthal 1979 .htm). P-hacking consists of researchers not reporting their p>.05 analyses for a given study. P-hacking is easy to stop. File-drawering is nearly impossible to stop. Fortunately, while p-hacking is a real problem, file-drawering is not. Consequences of p-hacking vs. file-drawering: With p-hacking it’s easy to…
[24] P-curve vs. Excessive Significance Test
In this post I use data from the Many-Labs replication project to contrast the (pointless) inferences one arrives at using the Excessive Significance Test with the (critically important) inferences one arrives at with p-curve. The Many-Labs project is a collaboration of 36 labs around the world, each running a replication of 13 published effects in…