Data Colada

[46] Controlling the Weather


Posted on February 2, 2016 (last updated January 24, 2019) by Joe & Uri

Behavioral scientists have put forth evidence that the weather affects all sorts of things, including the stock market, restaurant tips, car purchases, product returns, art prices, and college admissions.

It is not easy to properly study the effects of weather on human behavior. This is because weather is (obviously) seasonal, as is much of what people do. This means that any investigation of the relation between weather and behavior must properly control for seasonality.

For example, in the U.S., Google searches for "fireworks" correlate positively with temperature throughout the year, but only because July 4th is in the summer. This is a seasonal effect, not a weather effect.
Almost every weather paper tries to control for seasonality. This post shows they don't control enough.

How do they do it?
To answer this question, we gathered a sample of 10 articles that used weather as a predictor. [1]
[Table 1: the 10 articles and the seasonality controls they use]

In economics, business, statistics, and psychology, authors use monthly and occasionally weekly controls to account for seasonality. For instance, they ask, "Does how cold it was when a coat was bought predict whether it was returned, controlling for the month in which it was purchased?"

That's not enough.
The figures below show the average daily temperature in Philadelphia, along with the estimates provided by monthly (left panel) and weekly (right panel) fixed effects. These figures remind us that the weather does not jump discretely from month to month or week to week. Rather, weather, like earth, moves continuously. This means that seasonal confounds, which are continuous, will survive discrete (monthly or weekly) controls.

[Figures: average daily temperature in Philadelphia, with monthly (left panel) and weekly (right panel) fixed-effect estimates overlaid]

The vertical distance between the blue lines (the monthly/weekly dummies) captures the residual seasonality confound. For example, during March (just left of the '100 day' tick), the monthly dummy assigns 44 degrees to every March day, but temperature systematically fluctuates within March, from a long-term average of 39 degrees on March 1st to a long-term average of 50 degrees on March 31st. This is a seasonally confounded 11-degree difference that is entirely unaccounted for by monthly dummies.

The confounded effect of seasonality that survives weekly dummies is roughly 1/4 that size.
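To see roughly how large the surviving confound is, here is a minimal sketch (in Python rather than the Stata used for the post's analyses) that approximates Philadelphia's seasonal cycle with a sinusoid and measures how much of it survives after subtracting monthly vs. weekly averages. The sinusoid's mean, amplitude, and phase are rough assumptions for illustration, not the post's data.

```python
import numpy as np
import pandas as pd

# Stand-in for Philadelphia's long-run average temperature by calendar day:
# a sinusoid with an assumed mean of 55F, amplitude of 22F, coldest in late January.
days = np.arange(365)
seasonal_temp = 55 - 22 * np.cos(2 * np.pi * (days - 25) / 365)

dates = pd.date_range("2015-01-01", periods=365, freq="D")
df = pd.DataFrame({
    "temp": seasonal_temp,
    "month": dates.month,   # 1..12, what monthly dummies can "see"
    "week": days // 7,      # 7-day bins; the last bin holds a single day
})

# Residual seasonality = what remains of the seasonal cycle after subtracting the
# bin (month or week) average, i.e., the part that monthly/weekly dummies cannot absorb.
resid_month = df["temp"] - df.groupby("month")["temp"].transform("mean")
resid_week = df["temp"] - df.groupby("week")["temp"].transform("mean")

print(f"SD of the seasonal cycle:      {df['temp'].std():5.2f} F")
print(f"SD surviving monthly dummies:  {resid_month.std():5.2f} F")
print(f"SD surviving weekly dummies:   {resid_week.std():5.2f} F")
# Under this rough approximation, the weekly residual is about a quarter
# of the monthly one, in line with the comparison above.
```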

Fixing it.
The easy solution is to control for the historical average of the weather variable of interest for each calendar date.[2]

For example, when using how cold January 24, 2013 was to predict whether a coat bought that day was eventually returned, we include as a covariate the historical average temperature for January 24th (in that city).[3]
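Concretely, a minimal sketch of that fix in Python (the post's own analysis code is in Stata). The data frames and column names here — purchases with date, city, temp, and returned, and weather_history with daily temperatures — are hypothetical; the idea is simply to average temperature within each (city, calendar date) cell over many years and include that average as a covariate.

```python
import pandas as pd
import statsmodels.formula.api as smf

def add_historical_average(purchases, weather_history):
    """Attach the long-run average temperature for each (city, calendar date),
    e.g. the average temperature in Philadelphia over all January 24ths on record."""
    hist = weather_history.assign(month=weather_history["date"].dt.month,
                                  day=weather_history["date"].dt.day)
    clim = (hist.groupby(["city", "month", "day"], as_index=False)["temp"]
                .mean()
                .rename(columns={"temp": "hist_avg_temp"}))
    out = purchases.assign(month=purchases["date"].dt.month,
                           day=purchases["date"].dt.day)
    return out.merge(clim, on=["city", "month", "day"], how="left")

# Usage (hypothetical column names): same-day temperature predicts returns
# while controlling for the historical average temperature on that calendar date.
# df = add_historical_average(purchases, weather_history)
# model = smf.logit("returned ~ temp + hist_avg_temp", data=df).fit()
# print(model.summary())
```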

Demonstrating the easy fix
To demonstrate how well this works, we analyze a correlation that is entirely due to a seasonal confound: the number of daylight hours in Bangkok, Thailand (sunset – sunrise), and the temperature that same day in Philadelphia (data: .dta | .csv). Colder days in Philadelphia tend to be shorter days in Bangkok, not because coldness in one place shortens the day in the other (nor vice versa), but because seasonal patterns influence both variables. Properly controlling for seasonality should eliminate the association between these variables.

Using day duration in Bangkok as the dependent variable and temperature in Philly as the predictor, we threw in monthly and then weekly dummies to control for the seasonal confound. Neither technique fully succeeded: same-day temperature survived as a significant predictor. (Stata .do)

[Table 2: regressions of Bangkok day duration on Philadelphia temperature, with monthly dummies, weekly dummies, and the historical-average control]

Thus, using monthly and weekly dummy variables made it seem like, over and above the effects of seasonality, colder days are more likely to be shorter. However, controlling for the historical average daily temperature showed, correctly, that seasonality is the sole driver of this relationship.
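For readers who want to reproduce the comparison without Stata, here is a rough Python sketch of the three specifications. The file name bangkok_philly.csv and the column names date, day_duration, and philly_temp are placeholders rather than the names in the posted data, and the historical average is computed within the sample here rather than from a longer weather record.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Placeholder file and column names for the posted dataset.
df = pd.read_csv("bangkok_philly.csv", parse_dates=["date"])
df["month"] = df["date"].dt.month
df["week"] = df["date"].dt.dayofyear // 7   # 7-day bins within the calendar year

# Historical average Philadelphia temperature for each calendar date
# (computed within the sample here; ideally it would use many prior years of data).
df["hist_avg_temp"] = (df.groupby([df["date"].dt.month, df["date"].dt.day])["philly_temp"]
                         .transform("mean"))

# (1) monthly dummies, (2) weekly dummies, (3) the historical-average control
m_month = smf.ols("day_duration ~ philly_temp + C(month)", data=df).fit()
m_week = smf.ols("day_duration ~ philly_temp + C(week)", data=df).fit()
m_hist = smf.ols("day_duration ~ philly_temp + hist_avg_temp", data=df).fit()

# If the fix works, philly_temp should remain "significant" in (1) and (2),
# which under-control for seasonality, but not in (3).
for name, m in [("monthly dummies", m_month), ("weekly dummies", m_week), ("historical average", m_hist)]:
    print(f"{name:>18}: b = {m.params['philly_temp']:.4f}, p = {m.pvalues['philly_temp']:.3f}")
```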


Original author feedback:
We shared a draft of this post with authors from all 10 papers in Table 1 and heard back from 5 of them. Their feedback led to correcting errors in Table 1, changing the title of the post, and fixing the day-duration example (Table 2). Devin Pope, moreover, conducted our suggested analysis on his convertible-purchases (QJE) paper and shared the results with us. The finding is robust to our suggested additional control. Devin thought it was valuable to highlight that while the historical temperature average is a better control for weather-based seasonality (reducing bias), weekly/monthly dummies help with noise from other seasonal factors, such as holidays. We agreed. Best practice, in our view, would be to include time dummies at the granularity the data permit, to reduce noise, and to include the daily historical average, to reduce the seasonal confound of weather variation.



Footnotes.

  1. Uri created the list by starting with the most well-cited observational weather paper he knew – Hirshleifer & Shumway – and then selecting papers that cite it in Web of Science and were published in journals he recognized.
  2. Another solution is to use daily dummies. This option can easily be worse, because it can lower statistical power by throwing away data. First, daily fixed effects can only be applied to data with at least two observations per calendar date. Second, this approach ignores historical weather data that precede the dependent variable; for example, if the analyses use sales data from 2013-2015, daily fixed effects force us to ignore weather data from any prior year. Lastly, it 'costs' 365 degrees of freedom (don't forget leap years) instead of 1.
  3. Uri has two weather papers; both use this approach to account for seasonality.
