Data Colada

[1] "Just Posting It" works, leads to new retraction in Psychology


Posted on September 17, 2013 (updated February 11, 2020) by Uri Simonsohn

The fortuitous discovery of new fake data.
For a project I worked on this past May, I needed data for variables as different from each other as possible. From the data-posting journal Judgment and Decision Making I downloaded data for ten such variables, including one from a now-retracted paper involving the estimation of coin sizes. I created a chart and inserted it into a paper that I sent to several colleagues, and into slides presented at an APS talk.

An anonymous colleague, “Larry,” saw the chart and, for not entirely obvious reasons, became interested in the coin-size study. After downloading the publicly available data he noticed something odd (something I had not noticed): while each participant had evaluated four coins, the data contained only one column of estimates. The average? No, for all entries were integers; averages of four numbers are rarely integers. Something was off.
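The "rarely integers" point can be made concrete with a little arithmetic. The sketch below is only an illustration under loud assumptions: it supposes each of the four estimates is an integer drawn uniformly and independently from a hypothetical 0-10 range (the post does not give the range of the coin-size estimates), and the function name `prob_integer_mean` is made up for this example.

```python
from fractions import Fraction
from itertools import product


def prob_integer_mean(low, high, k=4):
    """Exact probability that the mean of k integers, each drawn uniformly
    and independently from low..high (ASSUMED range), is itself an integer."""
    values = range(low, high + 1)
    total = 0
    integer_means = 0
    for combo in product(values, repeat=k):
        total += 1
        # The mean is an integer exactly when the sum is divisible by k.
        if sum(combo) % k == 0:
            integer_means += 1
    return Fraction(integer_means, total)


# One participant: chance that the average of four estimates is a whole number.
p_one = prob_integer_mean(0, 10)

# All 49 participants (the sample size mentioned in the post), assuming independence.
p_all = float(p_one) ** 49

print(float(p_one), p_all)
```

Under these assumptions roughly one participant in four would have an integer average, so a full column of integers is very hard to square with the "averages" explanation.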

Interest piqued, he ran more analyses, which turned up more anomalies. He shared them with the editor, who contacted the author. The author provided explanations. These were nearly as implausible as they were incapable of accounting for the anomalies. The retraction ensued.

Some of the anomalies
1. Contradiction with paper
The paper describes a 0-10 integer scale, but the dataset contains decimals and negative numbers.
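A check of this kind is easy to automate once data are posted. The following is a minimal sketch, not the analysis actually run on the retracted dataset: the helper `scale_violations` and the sample values are hypothetical, and only the 0-10 integer scale comes from the paper's description.

```python
def scale_violations(values, low=0, high=10):
    """Return the entries that cannot come from an integer scale low..high."""
    bad = []
    for v in values:
        is_whole = float(v).is_integer()
        if not is_whole or not (low <= v <= high):
            bad.append(v)
    return bad


# Made-up values for illustration only; NOT taken from the actual dataset.
sample = [3, 7.5, -2, 10, 0, 11]
print(scale_violations(sample))  # → [7.5, -2, 11]
```

Any non-empty result means the data file and the described method disagree, which is worth a question to the authors even before any statistics are run.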

2. Implausible correlations among emotion measures
Shame and embarrassment are intimately related emotions, and yet they are negatively correlated in the data (r = -.27). Fear and anxiety: r = -.01. Real emotion ratings don’t exhibit these correlations.
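For readers who want to run the same kind of check on a posted dataset, here is a minimal sketch of a Pearson correlation computed by hand. The function name `pearson_r` and the two rating lists are hypothetical and purely illustrative; they are not the emotion ratings from the retracted paper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up ratings in which two related emotions move together, as one would expect.
shame = [1, 2, 2, 4, 5, 7, 8, 9]
embarrassment = [2, 1, 3, 4, 6, 6, 9, 8]
print(round(pearson_r(shame, embarrassment), 2))
```

Closely related emotions should produce a clearly positive r; a value near zero or below it, as reported above, is the red flag.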

3. Impossibly similar results
Fabricated data often exhibit a pattern of excessive similarity (e.g., very similar means across conditions). This pattern led to uncovering Sanna and Smeesters as fabricateurs (see “Just Post It” paper). Diederik Stapel’s data also exhibit excessive similarity, going back to his dissertation at least.

The coin-size paper also has excessive similarity. For example, coin-size estimates supposedly obtained from 49 individuals across two different experiments are almost identical:
Experiment 1 (n=25): 2,3,3,3,3,4,4,4,4,4,5,5,5,5,5,5,5,6,6,6,6,6,6,6,7
Experiment 2 (n=24): 2,3,3,3,3,4,4,4,4,4,5,5,5,5,5,5,_,6,6,6,6,6,6,6,7

Simulations drawing random samples from the data themselves (bootstrapping) show that it is nearly impossible to obtain such similar results. The hypothesis that these data came from random samples is rejected, p<.000025 (see R code, detailed explanation).
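The general idea of such a bootstrap can be sketched as follows. This is not the R code referred to above: the similarity statistic (`frequency_gap`, the summed absolute difference in how often each value appears), the pooling of both experiments, the number of simulations, and the seed are all assumptions made for illustration, so the resulting proportion need not match the p-value reported in the post. The two samples are the ones listed above, with the blank in Experiment 2 treated as a missing observation.

```python
import random
from collections import Counter

# Estimates as listed in the post (Experiment 2 has one fewer 5).
exp1 = [2] + [3] * 4 + [4] * 5 + [5] * 7 + [6] * 7 + [7]
exp2 = [2] + [3] * 4 + [4] * 5 + [5] * 6 + [6] * 7 + [7]


def frequency_gap(a, b):
    """ASSUMED statistic: total absolute difference in value frequencies.
    It ignores the order of the values, so sorting cannot affect it."""
    ca, cb = Counter(a), Counter(b)
    return sum(abs(ca[v] - cb[v]) for v in set(ca) | set(cb))


def bootstrap_similarity(a, b, n_sims=20000, seed=1):
    """Share of resampled pairs that are at least as similar as the observed pair."""
    rng = random.Random(seed)
    pool = a + b  # treat both experiments as draws from one population
    observed = frequency_gap(a, b)
    hits = 0
    for _ in range(n_sims):
        sim_a = rng.choices(pool, k=len(a))  # sample with replacement
        sim_b = rng.choices(pool, k=len(b))
        if frequency_gap(sim_a, sim_b) <= observed:
            hits += 1
    return observed, hits / n_sims


observed_gap, share = bootstrap_similarity(exp1, exp2)
print(observed_gap, share)
```

The observed gap is the smallest one possible for samples of 25 and 24, and the sketch simply counts how rarely random resamples come that close.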

Who vs. which
These data are fake beyond reasonable doubt. We don’t know, however, who faked them.
That question is of obvious importance to the authors of the paper and perhaps their home and granting institutions, but arguably not so much to the research community more broadly. We should care, instead, about which data are fake.

If other journals followed the lead of Judgment and Decision Making and required data posting (its editor Jon Baron, by the way, started the data posting policy well before I wrote my “Just Post It” paper), we would have a much easier time identifying invalid data. Some of the coin-size authors have a paper in JESP, one in Psychological Science, and another with similar results in Appetite. If the data behind those papers were available, we would not need to speculate as to their validity.

Author's response
When discussing the work of others, our policy here at Data Colada is to contact them before posting. We ask for feedback to avoid inaccuracies and misunderstandings, and give authors space for commenting within our original blog post. The corresponding author of the retracted article, Dr. Wen-Bin Chiou, wrote to me via email:

Although the data collection and data coding was done by my research assistant, I must be responsible for the issue. Unfortunately, the RA had left my lab last year and studied abroad. At this time, I cannot get the truth from him and find out what was really going wrong […] as to the decimal points and negative numbers, I recoded the data myself and sent the editor with the new dataset. I guess the problem does not exist in the new dataset. With regard to the impossible similar results, the RA sorted the coin-size estimate variable, producing the similar results. […] Finally, I would like to thank Dr. Simonsohn for including my clarifications in this post.
[See unedited version]

Uri's note: the similarity of data is based on the frequency of values across samples, not their order, so sorting cannot explain why the data are incompatible with random sampling.

© 2021, Uri Simonsohn, Leif Nelson, and Joseph Simmons. For permission to reprint individual blog posts on DataColada please contact us via email.