Data Colada

[14] How To Win A Football Prediction Contest: Ignore Your Gut


Posted on January 30, 2014 by Joe Simmons

This is a boastful tale of how I used psychology to dominate a football prediction contest.

Back in September, I was asked to represent my department – Operations and Information Management – in a Wharton School contest to predict NFL football game outcomes. Having always wanted a realistic chance to outperform Adam Grant at something, I agreed.

The contest involved making the same predictions that sports gamblers make. For each game, we predicted whether the superior team (the favorite) was going to beat the inferior team (the underdog) by more or less than the Las Vegas point spread. For example, when the very good New England Patriots played the less good Pittsburgh Steelers, we had to predict whether or not the Patriots would win by more than the 6.5-point spread. We made 239 predictions across 16 weeks.

Contrary to popular belief, oddsmakers in Las Vegas don’t set point spreads in order to ensure that half of the money is wagered on the favorite and half on the underdog. Rather, their primary aim is to set accurate point spreads, ones that give the favorite (and the underdog) a 50% chance to beat the spread. [1] Because Vegas is good at setting accurate spreads, it is very hard to perform better than chance when making these predictions. The only way to do it is to predict NFL games better than Vegas does.

Enter Wharton professor Cade Massey and professional sports analyst Rufus Peabody. They’ve developed a statistical model that, for an identifiable subset of football games, outperforms Vegas. Their Massey-Peabody power rankings are featured in the Wall Street Journal, and from those rankings you can compute expected game outcomes. For example, their current rankings (shown below) say that the Broncos are 8.5 points better than the average team on a neutral field whereas the Seahawks are 8 points better. Thus, we can expect, on average, the Broncos to beat the Seahawks by 0.5 points if they were to play on a neutral field, as they will in Sunday’s Super Bowl. [2]

[Figure: current Massey-Peabody power rankings]
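
To make the mechanics concrete, here is a minimal sketch, in Python, of how power ratings like these can be turned into against-the-spread picks. It is not Massey-Peabody’s actual code; the only numbers taken from this post are the Broncos/Seahawks ratings and the 2.4-point home-field adjustment from footnote 2, and the Vegas line in the example is a hypothetical placeholder.

```python
# Minimal sketch: turn power ratings into against-the-spread picks.
# Ratings for the Broncos (8.5) and Seahawks (8.0) are from the post;
# the Vegas line used below is a hypothetical placeholder.

HOME_EDGE = 2.4  # home-field adjustment from footnote 2

def expected_margin(fav_rating, dog_rating, fav_site="neutral"):
    """Points by which the ratings say the favorite should win."""
    margin = fav_rating - dog_rating
    if fav_site == "home":
        margin += HOME_EDGE
    elif fav_site == "away":
        margin -= HOME_EDGE
    return margin

def pick_against_spread(fav_rating, dog_rating, vegas_spread, fav_site="neutral"):
    """Take the favorite only if the model expects it to win by more than the spread."""
    if expected_margin(fav_rating, dog_rating, fav_site) > vegas_spread:
        return "favorite"
    return "underdog"

# Super Bowl on a neutral field: Broncos by 8.5 - 8.0 = 0.5 points.
print(expected_margin(8.5, 8.0))                        # 0.5
print(pick_against_spread(8.5, 8.0, vegas_spread=2.5))  # 'underdog' (with a hypothetical 2.5-point line)
```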

My approach to the contest was informed by two pieces of information.

First, my work with Leif (.pdf) has shown that naïve gamblers are biased when making these predictions – they predict favorites to beat the spread much more often than they predict underdogs to beat the spread. This is because people’s first impression about which team to bet on ignores the point spread and is thus based on a simpler prediction as to which team will win the game. Since the favorite is usually more likely to win, people’s first impressions tend to favor favorites. And because people rarely talk themselves out of these first impressions, they tend to predict favorites against the spread. This is true even though favorites don’t win against the spread more often than underdogs (paper 1, .pdf), and even when you manipulate the point spreads to make favorites more likely to lose (paper 2, .pdf). Intuitions for these predictions are just not useful.

Second, knowing that evidence-based algorithms are better forecasters than humans (.pdf), I used the Massey-Peabody algorithm for all my predictions.

So how did the results shake out? (Notes on Analyses; Data)

First, did my Wharton colleagues also show the bias toward favorites, a bias that would indicate that they are no more sophisticated than the typical gambler?

Yes. All of them predicted significantly more favorites than underdogs.

[Figure: number of favorite vs. underdog predictions for each contestant]
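
If you want to run the same check on your own picks, a simple version of the test is a binomial (sign) test of the share of favorite picks against 50%. The sketch below is illustrative only: the favorite-pick count is made up, and this is not necessarily the exact analysis behind the figure above.

```python
# Illustrative binomial test: did a forecaster pick favorites more often than 50%?
# The favorite-pick count is hypothetical; only the 239-prediction total comes from the post.
from scipy.stats import binomtest

favorite_picks = 170   # hypothetical count of favorite picks
total_picks = 239      # predictions made across the 16 weeks

result = binomtest(favorite_picks, total_picks, p=0.5, alternative="greater")
print(f"share of favorites = {favorite_picks / total_picks:.2f}, p = {result.pvalue:.4f}")
```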

Second, how did I perform relative to the “competition”?

Since everyone loves a humble champion, let me just say that my victory is really a victory for Massey-Peabody. I don’t deserve all of the accolades. Really.

Yeah, for about the millionth time (see meta-analysis, .pdf), we see that statistical models outperform human forecasters. This is true even (especially?) when the humans are Wharton professors, students, and staff.
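
For completeness, here is one way against-the-spread performance can be scored, sketched under assumed inputs (this is not the post’s actual analysis code): a pick counts as a hit when it matches the side that actually beat the spread, and games that land exactly on the spread (“pushes”) are set aside.

```python
# Sketch of scoring against-the-spread accuracy; the data layout is assumed, not the post's.

def favorite_covered(fav_points, dog_points, spread):
    """True if the favorite won by more than the spread."""
    return (fav_points - dog_points) > spread

def accuracy(picks, games):
    """Fraction of non-push games where the pick matched the side that beat the spread."""
    hits = scored = 0
    for pick, (fav_pts, dog_pts, spread) in zip(picks, games):
        if fav_pts - dog_pts == spread:   # push: nobody beats the spread
            continue
        outcome = "favorite" if favorite_covered(fav_pts, dog_pts, spread) else "underdog"
        hits += (pick == outcome)
        scored += 1
    return hits / scored if scored else float("nan")

# Example: two games, one pick right and one wrong -> accuracy 0.5
print(accuracy(["favorite", "favorite"], [(27, 17, 6.5), (24, 20, 6.5)]))
```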

So, if you want to know who is going to win this Sunday’s Super Bowl, don’t ask me and don’t ask the bestselling author of Give and Take. Ask Massey-Peabody.

And they will tell you, unsatisfyingly, that the game is basically a coin flip.



  1. Vegas still makes money in the long run because gamblers have to pay a fee in order to bet.
  2. For any matchup involving home field advantage, give an additional 2.4 points to the home team.