The authors of a forthcoming AER article (.pdf), "Methods Matter: P-Hacking and Publication Bias in Causal Analysis in Economics", painstakingly harvested thousands of test results from 25 economics journals to answer an interesting question: Are studies that use some research designs more trustworthy than others? In this post I will explain why I think their…
Category: p-curve
[67] P-curve Handles Heterogeneity Just Fine
A few years ago, we developed p-curve (see p-curve.com), a statistical tool that assesses whether a set of statistically significant findings contains evidential value or whether those results are instead attributable solely to the selective reporting of studies or analyses. It also estimates the true average power of a set of significant findings [1]….
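The core intuition behind p-curve can be illustrated with a small simulation (a sketch of the logic only, not the p-curve software itself; the sample size and effect size below are arbitrary assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def significant_pvals(effect, n=20, studies=10000):
    # run many two-sample t-tests and keep only the significant ps
    x = rng.normal(0.0, 1.0, (studies, n))
    y = rng.normal(effect, 1.0, (studies, n))
    p = stats.ttest_ind(x, y, axis=1).pvalue
    return p[p < .05]

null_ps = significant_pvals(0.0)  # selectively reported tests of a null effect
real_ps = significant_pvals(0.5)  # tests of a true effect

# Under the null, significant ps are uniform on (0, .05): the p-curve is flat.
# Under a true effect, it is right-skewed: very small ps dominate.
print((null_ps < .01).mean(), (real_ps < .01).mean())
```

A flat curve (about 20% of significant ps below .01, matching the uniform share .01/.05) signals no evidential value; a right-skewed curve signals that at least some findings reflect true effects.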
[66] Outliers: Evaluating A New P-Curve Of Power Poses
In a forthcoming Psych Science paper, Cuddy, Schultz, & Fosse, hereafter referred to as CSF, p-curved 55 power-posing studies (.pdf | SSRN), concluding that they contain evidential value [1]. Thirty-four of those studies were previously selected and described as “all published tests” (p. 657) by Carney, Cuddy, & Yap (2015; .htm). Joe and Uri p-curved…
[61] Why p-curve excludes ps>.05
In a recent working paper, Carter et al. (.htm) proposed that one can better correct for publication bias by including not just p<.05 results, the way p-curve does, but also p>.05 results [1]. Their paper, currently under review, aimed to provide a comprehensive simulation study that compared a variety of bias-correction methods for meta-analysis. Although the…
[60] Forthcoming in JPSP: A Non-Diagnostic Audit of Psychological Research
A forthcoming article in the Journal of Personality and Social Psychology has made an effort to characterize changes in the behavior of social and personality researchers over the last decade (.htm). In this post, we refer to it as “the JPSP article” and to the authors as "the JPSP authors." The research team, led by…
[59] PET-PEESE Is Not Like Homeopathy
PET-PEESE is a meta-analytical tool that seeks to correct for publication bias. In a footnote in my previous post (.htm), I referred to it as the homeopathy of meta-analysis. That was unfair and inaccurate. Unfair because, in the style of our President, I just called PET-PEESE a name instead of describing what I believed was…
[49] P-Curve Won’t Do Your Laundry, But Will Identify Replicable Findings
In a recent critique, Bruns and Ioannidis (PLOS ONE 2016 .htm) proposed that p-curve makes mistakes when analyzing studies that have collected field/observational data. They write that in such cases: “p-curves based on true effects and p-curves based on null-effects with p-hacking cannot be reliably distinguished” (abstract). In this post we show, with examples involving sex,…
[45] Ambitious P-Hacking and P-Curve 4.0
In this post, we first consider how plausible it is for researchers to engage in more ambitious p-hacking (i.e., past the nominal significance level of p<.05). Then, we describe how we have modified p-curve (see app 4.0) to deal with this possibility. Ambitious p-hacking is hard. In “False-Positive Psychology” (SSRN), we simulated the consequences of four…
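The asymmetry this post points to — getting a p-hacked result past .05 is easy, getting it past a stricter threshold is much harder — can be sketched with a cherry-picking simulation (the design, three independent dependent variables with n = 20 per cell, is an assumption for illustration, not the simulation from the paper):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def best_p(n=20, dvs=3):
    # two groups, no true effect; test `dvs` outcomes, report the smallest p
    a = rng.normal(size=(n, dvs))
    b = rng.normal(size=(n, dvs))
    return stats.ttest_ind(a, b, axis=0).pvalue.min()

ps = np.array([best_p() for _ in range(5000)])
frac05 = (ps < .05).mean()  # ordinary p-hacking: false positives well above 5%
frac01 = (ps < .01).mean()  # ambitious p-hacking: far fewer make it past .01
print(frac05, frac01)
```

With three shots at significance, roughly 1 − .95³ ≈ 14% of null studies clear p<.05, but only about 1 − .99³ ≈ 3% clear p<.01: the same amount of p-hacking buys far fewer "ambitious" false positives.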
[41] Falsely Reassuring: Analyses of ALL p-values
It is a neat idea. Get a ton of papers. Extract all p-values. Examine the prevalence of p-hacking by assessing whether there are too many p-values near p=.05. Economists have done it [SSRN], as have psychologists [.html] and biologists [.html]. These charts with distributions of p-values come from those papers: The dotted circles highlight the excess of…
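The check those papers run — bin the harvested p-values and look for a pile-up just below .05 — can be sketched as follows (the bin width and the illustrative mixture of p-values are assumptions, not data from the papers cited):

```python
import numpy as np

rng = np.random.default_rng(2)

def bump_near_05(pvals, width=0.01):
    """Count p-values in (.04, .05] versus the adjacent bin (.03, .04].

    A marked excess in the bin just below .05 is the signature
    these papers treat as evidence of p-hacking.
    """
    pvals = np.asarray(pvals)
    top = np.sum((pvals > .05 - width) & (pvals <= .05))
    below = np.sum((pvals > .05 - 2 * width) & (pvals <= .05 - width))
    return top, below

# Illustrative mixture: uniform (null) p-values plus a cluster
# of results p-hacked to land just under .05.
ps = np.concatenate([rng.uniform(0, 1, 1000), rng.uniform(.04, .05, 60)])
top, below = bump_near_05(ps)
print(top, below)
```

The uniform component contributes roughly equal counts to both bins (~10 each here), so the p-hacked cluster makes the (.04, .05] bin stand out — the "dotted circle" pattern in the charts.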
[37] Power Posing: Reassessing The Evidence Behind The Most Popular TED Talk
A recent paper in Psych Science (.pdf) reports a failure to replicate the study that inspired a TED Talk that has been seen 25 million times. [1] The talk invited viewers to do better in life by assuming high-power poses, just like Wonder Woman’s below, but the replication found that power-posing was inconsequential. If an…