Thursday, April 10, 2008

How Well Do Early Polls Predict Election Outcomes?

SurveyUSA created quite a stir last month when they released the results of their fifty-state general election trial-heat survey. Charles Franklin and Mark Blumenthal provided a nice breakdown of the results, categorizing states as "strong" or "leaning" for one candidate or the other, or as a "toss-up," and concluded that these early results gave a slight advantage to both Obama and Clinton in their matchups with McCain. Bob Erikson and Karl Sigman added to the mix with their teched-up, simulation-based analysis of the same data, reaching a very similar conclusion.

This is all great fun, and poll junkies (I include myself) love to see this type of analysis. But all of this assumes that early polls are good indicators of what will eventually happen on election day. Are they?

Jay DeSart and I have done some work on using state polls to predict presidential election outcomes, and it's quite clear that polls taken in September are good predictors of the state outcomes in November. But how well do statewide polls from spring of the election year predict the eventual outcomes? I thought it might be useful to look at data from 2004 to provide some sense of how much stock we should put in early polls.

In the figures below, I use statewide presidential trial-heat polls from March through June (polls averaged by state and month) to calculate John Kerry's expected percent of the two-party vote and then plot it against the actual two-party vote for Kerry in the November election. Note that there were no spring polls in many states in 2004 (no SurveyUSA fifty-state poll, for instance), so none of the plots include all 50 states.
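For readers who want to try this themselves, the calculation described above can be sketched in a few lines of Python. This is only an illustration: the function names, the record layout, and the poll numbers are made up for the example and are not the 2004 data behind the figures. It also assumes each poll is converted to a two-party share before averaging, which may not match exactly how the averages in the figures were built.

```python
from collections import defaultdict


def two_party_share(dem_pct, rep_pct):
    """Democratic share of the two-party vote, in percent."""
    return 100.0 * dem_pct / (dem_pct + rep_pct)


def average_by_state_month(polls):
    """Average the two-party share over all polls in each (state, month)."""
    grouped = defaultdict(list)
    for state, month, dem_pct, rep_pct in polls:
        grouped[(state, month)].append(two_party_share(dem_pct, rep_pct))
    return {key: sum(vals) / len(vals) for key, vals in grouped.items()}


# Hypothetical polls: (state, month, Kerry %, Bush %). Not real data.
example_polls = [
    ("State A", "March", 45, 45),
    ("State A", "March", 48, 42),
    ("State B", "March", 40, 50),
]
averages = average_by_state_month(example_polls)
```

Dropping undecided and third-party respondents this way puts the poll number on the same scale as the two-party vote in November, which is what makes the two comparable on a single plot.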

Let's start with March, since this is closest to the timing of the 2008 SurveyUSA poll (late February).

As expected, there is a strong, positive relationship between March polls and November votes in the states that had polling results. But I wouldn't exactly describe the data points as tightly clustered, and the point estimates called the wrong winner in 4 of the 11 states in which one candidate held a polling advantage (Michigan, Wisconsin, New Hampshire, and Pennsylvania); two other states, Ohio and West Virginia, were tied. It is worth noting that in each of these misfires both the polling margin and the eventual margin of victory were fairly narrow.

Results for April (N=21), May (N=23), and June (N=21) are posted below.

Two take-away points here. First, the correlation between statewide polls and the eventual election outcome grew stronger as the 2004 campaign progressed. Obvious enough, I suppose.

Second, when the polling margin was fairly narrow, the outcome was truly up in the air. In fact, across all four months the poll result called the wrong winner in 17 of the 36 cases in which Kerry's share of the two-party vote in trial-heat polls was between 47% and 53% (this excludes two cases in which the poll result was tied). These results suggest that we should take the term "toss-up" very seriously. At the same time, the poll result was wrong in only 3 of the 44 cases in which Kerry's poll margin was outside this range.
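The tally behind this comparison is simple enough to sketch. The code below is a rough illustration, not the exact procedure used for the numbers above: the function name and the example cases are hypothetical, and treating the 47% and 53% boundaries as inclusive is an assumption.

```python
def tally_calls(cases, low=47.0, high=53.0):
    """Count wrong-winner calls for close and not-close poll results.

    Each case is (poll_share, vote_share): the candidate's two-party
    share in the polls and in the election, in percent. Tied polls
    (exactly 50) are skipped, as in the post.
    """
    counts = {"close": [0, 0], "not_close": [0, 0]}  # [wrong, total]
    for poll_share, vote_share in cases:
        if poll_share == 50.0:
            continue  # a tied poll calls no winner
        bucket = "close" if low <= poll_share <= high else "not_close"
        # The call is wrong when poll and vote fall on opposite sides of 50.
        wrong = (poll_share > 50.0) != (vote_share > 50.0)
        counts[bucket][0] += int(wrong)
        counts[bucket][1] += 1
    return counts


# Hypothetical (poll share, vote share) pairs. Not real data.
example_cases = [
    (51.0, 49.2), (48.5, 47.0), (52.0, 51.0),
    (50.0, 48.9), (58.0, 56.5), (44.0, 50.8),
]
result = tally_calls(example_cases)
```

Put as rates, the 2004 counts above work out to roughly 47 percent wrong calls in the close cases (17 of 36) against roughly 7 percent in the rest (3 of 44).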

So what are the implications of this for how we should view the early SurveyUSA results for 2008? Assuming the data from 2004 provide a reasonable basis for speculating, I expect most of the "strong" (as categorized by Franklin and Blumenthal) states to stay in their candidate's camp. But I would not assign "toss-up" or "leaning" states to either candidate with much confidence. One exception to this would be states such as South Carolina (lean McCain) and Massachusetts (lean Obama), whose partisan histories argue in favor of greater confidence.

Update: Per Mark Blumenthal's suggestion, here are the graphs again, except with the same y-axis. Visually, this seems to make the most difference in the impression given by the March data.


Yazi said...

If you're aware of how do you not mention it in this post? If you're not aware of it, how is that possible?

Tom said...

Wasn't aware; am now.

Anonymous said...

I heard someone comment on Charlie Rose the other night (can't remember his name) that the polls are not picking up millions of young, enthusiastic, newly registered voters for the Dems because they call historically "likely voters." As a poll junkie, do you think it is possible that this factor could surprise everyone (now that McCain appears to be leading :( in the all-important "September polls")?