Tuesday, May 10, 2016

2016 Exit Polls and Early Voting

In recent months, controversy has erupted over exit poll discrepancies in the Democratic primaries. Hillary Clinton consistently performs better in the official results than in the exit polls. This trend is visible in almost every primary state, with discrepancies ranging from minor to significant. Clinton’s consistent overperformance of her exit poll results has stoked theories that the Democratic primaries are rigged to her advantage.

To many political observers, the theory seems ridiculous. Joshua Holland wasted no time in calling it out, as did the Washington Post. The media consensus is that exit polls are a crude tool, helpful for forecasting elections, but nothing to take too seriously. On the other hand, those in the election integrity field view exit poll discrepancies as indicators of a fraudulent election. A well-done exit poll is an accurate count of how people voted, one that can be checked against official vote counts to test for fraud.

Richard Charnin is the source of the fraud allegations for the 2016 Democratic primaries. He’s been statistically analyzing exit polls for over a decade. By his reckoning, exit polls represent a much purer picture of how people actually voted than the official counts do. On May 5, after the Indiana primaries, he published a summary of purported fraud up to that point. By Charnin's calculations, it is nearly impossible for so many exit polls to miss the way they have by chance.

His calculations are sound under the assumption that exit polls are valid. Of course, this isn’t necessarily true, and as mentioned, few in the media believe it. Nate Silver of FiveThirtyEight wrote an article way back in 2008 about why exit polls should be ignored. But no pundit, not even Nate Silver, should simply be believed without looking at their claims.

If Silver's reasons explain why the discrepancies consistently benefit Clinton, the fraud theories can easily be debunked. One of the more interesting points he brings up is that early voting may be counted incorrectly. Clinton has a known advantage with early voting, and exit polls mainly reflect the results of election day. Perhaps exit polls are correct for election day, but not everyone voted that day. If so, failing to account for the early vote is what produces those discrepancies.

The reasoning

Richard Charnin assumes that exit polls represent the population of those who voted. Since exit polls are done on election day, early and absentee votes aren't counted. This might explain the discrepancies. Let's suppose, for the sake of argument, that the exit polls do match official results on election day. What share of the early vote would each candidate need for the totals to match the complete official results? If those shares are reasonable, the early voting explanation is plausible.

In reality, assuming no early voting is too simplistic a model. As Nate Silver points out, exit pollsters do sample early voters over the telephone, and merge that into their election day results. But they may misjudge how much weight the early vote should get in that merge. So we’ll also need to look at the possibility that the exit pollsters misestimated how much of the total vote was early voting.

Ultimately, we’ll be making three assumptions for our calculations:
  1. Official election results correctly reflect the total votes that candidates received
  2. Polling place interviews done on election day accurately reflect the election day vote
  3. Telephone surveys of early/absentee voters accurately reflect that population
If we end up with unreasonable early/absentee margins, one or more of these assumptions is wrong.

Getting the data

(Note: This post is still a work in progress, since I lack the data for many states)

We need to know the official vote shares, exit poll results, and the amount of early voting. Since primary results get reported live online, through news outlets, Secretary of State offices, and other aggregate sources, we know the total vote counts. The exit poll results come from Richard Charnin's site. We just need the early/absentee counts and we're set. All states have some form of early and/or absentee voting, and we're focusing only on the primary states. So where can we get this data?

0% of the early vote counted

Let’s start by assuming no early/absentee voting is counted in the exit polls. The basic idea to verify this is simple:
  • Calculate the number of early votes and election day votes
  • Proportion the election day votes by exit poll results
  • Determine what percentage of early votes would be needed to meet official results
  • See if that's reasonable, using known early vote counts and common sense
Once we have all the data, we can do a bit of math. Here are the variables we're working with:
  • EPC: Clinton's share in exit polls
  • EPS: Sanders' share in exit polls
  • OC: Clinton's share in official results
  • OS: Sanders' share in official results
  • VT: Total number of votes cast
  • VE: Number of early/absentee votes cast
We assume the election day votes match the exit polls, and see what the early vote needs to be for the total count to match official results. Here are the equations we get:
  • OC * VT = (EPC * (VT - VE)) + (x * VE)
  • OS * VT = (EPS * (VT - VE)) + (y * VE)
We can solve for x and y to get x% Clinton early votes and y% Sanders early votes. Here are the results of those numbers, calculated from my spreadsheet:
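| State | Calculated Clinton EV share | Calculated Sanders EV share | Official Clinton EV share | Official Sanders EV share |
|-------|-----------------------------|-----------------------------|---------------------------|---------------------------|
| AR | 68.56% | 23.98% | 69.31% | 27.17% |
| CT | 96.58% | 21.47% | | |
| FL | 65.12% | 30.86% | | |
| GA | 95.73% | 7.15% | 77.91% | 21.55% |
| IN | 61.55% | 38.45% | | |
| MD | 59.07% | 34.86% | 66.37% | 30.52% |
| MI | 57.79% | 39.90% | | |
| NC | 56.52% | 38.47% | | |
| OH | 84.09% | 16.32% | | |
| OK | -25.56% | 64.79% | 49.69% | 44.39% |
| SC | 887.75% | -881.05% | | |
| TN | 75.89% | 26.67% | | |
| TX | 66.56% | 31.81% | | |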
When it comes to how plausible they are, it's a mixed bag. Clinton is known to excel at early voting, so margins of 60%-40% or 70%-30% aren't unreasonable. However, some more questionable things pop up as well. For Clinton to make up exit poll margins with early voting:
  • She'd need over 96% of the CT early vote, which is quite high
  • She'd need about 96% of the GA early vote; quite high, and the official results say she only got 78%
  • She'd need about 84% of the OH early vote, which is quite high
  • She'd need 887.75% of the SC early vote (this is obviously ludicrous)
Maryland and Oklahoma are two strange cases, since my model underestimates the early vote she'd get. In Oklahoma, she would need negative absentee votes to meet the margins, meaning that EPC * (VT - VE) is greater than OC * VT. In Maryland, her calculated early vote is lower than the actual one by about 7 points. I'm not entirely sure what that means, but it may indicate that exit polls did correctly count early votes (as they try to), and adding more would be unnecessary.
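
For anyone who wants to check the arithmetic without the spreadsheet, here's a minimal Python sketch of the 0% model. It is not the spreadsheet itself: the function names are my own, and the example inputs are hypothetical numbers rather than any state's real data. Rearranging OC * VT = (EPC * (VT - VE)) + (x * VE) gives x = EPC + (OC - EPC) * (VT / VE), so a small early vote multiplies even a modest exit poll gap into a huge required share, which is one way results above 100% or below 0% can arise.

```python
def required_early_share(official_share, exit_poll_share, total_votes, early_votes):
    """Early-vote share x that makes the totals match the official result.

    Solves  OC * VT = EPC * (VT - VE) + x * VE  for x, assuming the exit poll
    describes election day voters only (the 0% model).
    """
    if early_votes <= 0 or early_votes > total_votes:
        raise ValueError("early_votes must be positive and at most total_votes")
    election_day_votes = total_votes - early_votes
    return (official_share * total_votes
            - exit_poll_share * election_day_votes) / early_votes


def is_possible(share):
    """A real vote share has to fall between 0% and 100%."""
    return 0.0 <= share <= 1.0


# Hypothetical inputs (not from any state above): a 5-point gap between the
# official result and the exit poll, with two different early-vote sizes.
for early_votes in (200_000, 50_000):
    x = required_early_share(official_share=0.55, exit_poll_share=0.50,
                             total_votes=1_000_000, early_votes=early_votes)
    print(f"VE = {early_votes:>7,}: needs {x:.2%} of the early vote, "
          f"possible: {is_possible(x)}")
```

Running the same function with Sanders' shares gives y. Since each candidate is solved separately, the two results only sum to 100% if the exit poll and official shares each do.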

Some of the early vote counted

(Added 2016/6/28)

Assuming the exit polls count 0% of the early vote produces realistic results in some states, and strange results in others. Now let’s look at the more likely possibility that the early vote was counted, but incorrectly. The exit pollsters do telephone interviews to get the early vote share, and then merge it into the election day results by estimating how much of the total vote is early voting. What happens if they overestimate or underestimate it?

Recall our variables from before:
  • EPC: Clinton's share in exit polls
  • EPS: Sanders' share in exit polls
  • OC: Clinton's share in official results
  • OS: Sanders' share in official results
  • VT: Total number of votes cast
  • VE: Number of early/absentee votes cast
Let’s make some new variables to add to our old ones:
  • EVP: The pollsters' estimate of the early vote, as a percentage of the actual early vote (below 100% is an underestimate, above 100% an overestimate)
  • EVEP: The percentage of exit poll results that come from early voting
  • EDC: Clinton’s exit poll share on election day
  • EDS: Sanders’ exit poll share on election day
Letting x be Clinton’s early vote share and y be Sanders’, we can get their election day shares:
  • EVEP = EVP * (VE / VT)
  • EDC = (EPC - (x * EVEP)) / (1 - EVEP)
  • EDS = (EPS - (y * EVEP)) / (1 - EVEP)
Then we can plug them into the same equations as before:
  • OC * VT = (EDC * (VT - VE)) + (x * VE)
  • OS * VT = (EDS * (VT - VE)) + (y * VE)
The equation is less nice, but we can still solve for x and y. Running the results again, for various values of EVP, we get the below, calculated from my spreadsheet:

[Table coming soon; for now, look at the spreadsheet]
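
Until the table is up, here's a rough Python sketch of how the second model can be solved. Treat it as my reading of the equations above, not a copy of the spreadsheet: the function name, the hypothetical inputs, and the particular EVP values in the loop are assumptions for illustration. Writing f = VE / VT and substituting EDC into the first equation gives x = (OC * (1 - EVEP) - EPC * (1 - f)) / (f - EVEP). Two things fall out of that form: at EVP = 0 it reduces to the 0% model, and at EVP = 100% the denominator is zero, because a correctly weighted exit poll can't be reconciled with a different official result by any early-vote share.

```python
def required_early_share_merged(official_share, exit_poll_share,
                                total_votes, early_votes, evp):
    """Early-vote share x under the 'some of the early vote counted' model.

    evp is EVP as a ratio: 0.25 means the pollsters weighted the early vote
    at 25% of its true size, 2.0 means they weighted it at 200%.
    """
    f = early_votes / total_votes   # true early fraction, VE / VT
    e = evp * f                     # EVEP, the early weight in the exit poll
    if not 0.0 < f <= 1.0:
        raise ValueError("early_votes must be positive and at most total_votes")
    if e >= 1.0:
        raise ValueError("EVEP must stay below 100% of the exit poll")
    if abs(f - e) < 1e-12:
        raise ValueError("EVP = 100% leaves x undetermined")
    return (official_share * (1 - e) - exit_poll_share * (1 - f)) / (f - e)


# Hypothetical inputs (not from any state above), swept over a few EVP values.
for evp in (0.0, 0.25, 0.5, 1.5, 2.0, 3.0, 4.0):
    x = required_early_share_merged(official_share=0.55, exit_poll_share=0.50,
                                    total_votes=1_000_000, early_votes=200_000,
                                    evp=evp)
    print(f"EVP = {evp:>4.0%}: required early-vote share = {x:.2%}")
```

Calling it once with Clinton's shares and once with Sanders' gives x and y for a given EVP.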

Again, how reasonable the margins are is a mixed bag. Reasonability is subjectively assessed by whether Clinton wins, as is expected, and whether the margins are extreme or impossible:
  • AR margins are close to the official ones if the early vote is underestimated around 25%, or overestimated above 200%
  • CT’s margins are not reasonable at all, either drastically overstating Clinton’s share or making Sanders the early/absentee winner by a landslide
  • FL’s margins are always reasonable
  • GA’s margins are only reasonable at 300% or 400% overestimation, and they still fall short of Clinton’s official share of the early vote
  • IN’s margins are reasonable for underestimations at 50% or lower
  • MD’s official early/absentee margins are met at around 150% overestimation, which is not too unlikely
  • MI’s margins are reasonable for underestimations at 50% or lower
  • NC’s margins are almost always within reason
  • OH’s margins are never reasonable
  • OK’s margins are only reasonable at 300% or 400% overestimation, and they still overstate Clinton’s share by a lot
  • SC’s margins are never reasonable
  • TN’s margins are reasonable at underestimations at or below 25%, and overestimations at 200%
  • TX’s margins are always reasonable
So FL, NC, and TX are totally consistent with respect to exit polls and early voting. CT, GA, OH, and SC are completely inconsistent. AR, IN, MD, MI, and TN are sometimes reasonable. OK, one of the only states where Clinton did worse than the exit polls predicted, still fails to make sense in this model.

Conclusions

Recall our three assumptions that we used in calculating early/absentee margins:
  1. Official election results correctly reflect the total votes that candidates received
  2. Polling place interviews done on election day accurately reflect the election day vote
  3. Telephone surveys of early/absentee voters accurately reflect that population
FL, NC, and TX had reasonable margins for a wide array of misestimations. CT, GA, OH, and SC completely flunk this test, never providing reasonable margins, as does OK. AR, IN, MD, MI, and TN could go either way; they’re reasonable for a smaller subset of misestimations.

Since the early voting hypothesis does work for some states, perhaps some of these exit poll discrepancies have a legitimate justification that isn’t fraud. Clinton’s success with the early vote can make up for her doing worse on election day. On the other hand, several states (CT, GA, OH, SC, and OK) require impossible early/absentee margins. And other states’ margins may or may not be reasonable.

In states whose margins don’t make sense, one or more of the above assumptions was wrong. If #1 and #2 are correct, but #3 isn’t, the calculations done here simply uncovered bad phone interviews. But these margins are often so unreasonable (exceeding 100% or being negative) that they couldn’t have shown up in the telephone polls. So #3 can’t be the only failed assumption: #1 and/or #2 must be invalid.

That leaves us with two possibilities: exit polls done on election day are wrong, or the official results are. Both possibilities are plausible, and the rest of this series will examine them.

What this shows is that exit poll discrepancies can't simply be dismissed based on early voting. Sometimes early voting can explain them, but not always. Regardless, the discrepancies do deserve to be looked at more critically. Not every single discrepancy has to point to fraud.


8 Comments:

At May 12, 2016 at 10:44 AM , Blogger Unknown said...

This is a fascinating deep dive into these poll and vote results. I am impressed by your application of intellectual rigour to this problem. It does seem plausible that some of the discrepancy in results may be attributable to early and absentee voting. As you note in your conclusion, this analysis is predicated on an assumption of exit poll reliability, which seems to be a particularly shaky foundation to build upon.

Richard Charnin's analysis that you linked to points to the "adjustment" of exit poll data as the key evidence of a conspiracy to mislead the public and mask widespread voter fraud. Adjusting exit poll data, more often called "weighting," is statistically valid if done correctly and a method used virtually universally to reduce error in polling (http://www.applied-survey-methods.com/weight.html). I have not dug into the methodology used in the polls referred to in this post, but it is quite likely that their methods are published and made publicly available somewhere.

I'm curious to know if you have read this piece from the NYTimes (http://www.nytimes.com/2014/11/05/upshot/exit-polls-why-they-so-often-mislead.html). If so, do the findings from your mathematical analysis or other research still lead you to suspect widespread voter fraud in these primaries?

 
At May 12, 2016 at 9:55 PM , Blogger Marionumber1 said...

I still suspect election fraud for several reasons, which a future post will address:
1) Exit poll discrepancies have historically been linked to strange election circumstances (see 2004 Ukraine, 2004 here, the 2008 NH Dem primary, etc.)
2) Even if the polls are unreliable, the irregularities should be random, not biased, and they're often not
3) Our voting machines are scarily vulnerable to hacking
4) The failure to count early votes can't explain every election, based on the analysis in this post

Based on an interview with Joe Lenski (https://www.washingtonpost.com/news/the-fix/wp/2016/04/22/how-exit-polls-work-explained/), head of the company doing exit polling, there are two types of adjustments made to exit polls. One is done as data comes in throughout the day, accounting for non-responses and known turnout/demographic info. The other is forcing the polls to meet the election results once they come out. So it's wrong for the NYTimes article to say they're completely unweighted.

Richard Charnin doesn't criticize the first type of adjustment, only the second. Forcing the exit polls to match the official results is only valid if the official results are right, and this isn't an infallible assumption. While I wouldn't say there's a media conspiracy going on, it seems strange that they wouldn't even consider that official results aren't necessarily correct.

Whether exit polls are valid is, of course, disputable. But more important than validity is statistical consistency. When the exit polls fail to match the official results in a way that almost always benefits a certain candidate/party, that indicates some other bias in the results. This bias could be election fraud, or it could be something like early voting.

In the 2016 Dem primary, that bias usually favors Clinton. This article shows that while early voting can sometimes account for that bias, it doesn't always work. Granted, assuming the exit polls are correct was needed to figure this out, but it's still useful in showing how the Clinton bias could occur. Some future work would be sensitivity analysis: shifting the exit polls to see how much it changes these results. I also noticed that your NYTimes article said exit polls tried to sample early voters. That also makes this simplistic analysis a little less valid.

 
At June 2, 2016 at 2:16 AM , Anonymous Anonymous said...

The "unadjusted" exit polls actually already have been weighted. Edison Research, the polling firm that conducts exit polls, has trained poll workers who track non-response rates and record demographic information that they can guess visually about the non-respondents. That demographic information is used to weight the exit poll results as they are being reported. The final adjustment is done once the outcome they are trying to estimate, the final vote tallies, is known. They basically assume the demographic relationships are the same but that the proportions that went for the actual victor are increased until they match the outcome.

 
At June 2, 2016 at 2:21 AM , Anonymous Anonymous said...

Also, this is a great post. I wanted to suggest a follow-up idea. Instead of asking what % of early votes Clinton must win to make up for the full discrepancy, ask how much is needed to bring it within the margin of error.

 
At June 2, 2016 at 2:26 AM , Anonymous Anonymous said...

(Sorry for spamming so many comments in a row)

I'm certain that bringing it within the margin of error will end up seeming reasonable. But that doesn't fully fix the problem, it would still be strange for so many of the discrepancies to be close to the MOE in Clinton's favor instead of equally distributed around 0. I think there would still be strong statistical evidence of a bias.

 
At June 2, 2016 at 6:50 AM , Blogger Marionumber1 said...

Well, the purpose of this was to assume that the exit polls were right for election day, and find the early vote percentages needed to match the official results. There is no margin of error for the official results; they're just a single value being compared to, so bringing a value within the margin of error of those results doesn't make sense.

 
At June 2, 2016 at 11:54 AM , Blogger Marionumber1 said...

Yeah, I know how the process works. When most people mention unadjusted exit polls, they refer to ones with the demographic adjustments done, but from before they're forced to meet official results. Forcing them to meet official results simply masks any fraud that occurred, and the method to do so (which you also bring up) seems pretty non-rigorous.

 
