Tuesday, May 10, 2016

2016 Exit Polls and Early Voting

In recent months, controversy has erupted over exit poll discrepancies in the Democratic primaries. Hillary Clinton consistently performs better in the official results than in the exit polls. This trend is visible in almost every primary state, with discrepancies ranging from minor to significant. Clinton’s consistent overperformance of her exit poll results has stoked theories that the Democratic primaries are rigged to her advantage.

To many political observers, the theory seems ridiculous. Joshua Holland wasted no time in calling it out, as did the Washington Post. The media consensus is that exit polls are a crude tool, helpful for forecasting elections, but nothing to take too seriously. On the other hand, those in the election integrity field view exit poll discrepancies as indicators of a fraudulent election. In their view, a well-done exit poll is an accurate count of how people voted, one that can be checked against official vote counts to test for fraud.

Richard Charnin is the source of allegations for the 2016 Democratic primaries. He’s been statistically analyzing exit polls for over a decade. By his reckoning, exit polls represent a much purer picture of how people actually voted than the official counts do. On May 5, after the Indiana primaries, he published a summary of purported fraud up to that point. Charnin's calculations find a near-impossibility that so many exit polls would miss as they have.

His calculations are sound under the assumption that exit polls are valid. Of course, this isn’t necessarily true, and as mentioned, few in the media believe it. Nate Silver of FiveThirtyEight wrote an article way back in 2008 about why exit polls should be ignored. But no pundit, not even Nate Silver, should simply be believed without looking at their claims.

If his reasons explain why the discrepancies consistently benefit Clinton, the theories about exit polls can easily be debunked. One of the more interesting points Nate Silver brings up is that exit polls can count early voting incorrectly. Clinton has a known advantage with early voting, and exit polls mainly reflect the results of election day. Perhaps exit polls are correct for election day, but not everyone voted that day. If so, failing to account for the early vote is what produces those discrepancies.

The reasoning

Richard Charnin assumes that exit polls represent the population of those who voted. If exit polls are done only on election day, early and absentee voting isn't counted. This might explain the discrepancies. Let's suppose, for the sake of argument, that the exit polls do match official results on election day; what share of the early vote would each candidate need for the totals to match the complete official results? If those shares are reasonable, the early voting explanation is plausible.

In reality, assuming that no early voting is counted is too simplistic a model. As Nate Silver points out, exit pollsters do sample early voters over the telephone, and merge that into their election day results. But they may incorrectly assume how much needs to be merged. So we’ll also need to look at the possibility that the exit pollsters misestimated how much of the total vote was early voting.

Ultimately, we’ll be making three assumptions for our calculations:
  1. Official election results correctly reflect the total votes that candidates received
  2. Polling place interviews done on election day accurately reflect the election day vote
  3. Telephone surveys of early/absentee voters accurately reflect that population
If we end up with unreasonable early/absentee margins, one or more of these assumptions is wrong.

Getting the data

(Note: This post is still a work in progress, since I lack the data for many states)

We need to know the official vote shares, exit poll results, and the amount of early voting. Since primary results get reported live online, through news outlets, Secretary of State offices, and other aggregate sources, we know the total vote counts. The exit poll results come from Richard Charnin's site. We just need the early/absentee counts and we're set. All states have some form of early and/or absentee voting, and we're focusing only on the primary states. So where can we get this data?

0% of the early vote counted

Let’s start by assuming no early/absentee voting is counted in the exit polls. The basic idea to verify this is simple:
  • Calculate the number of early votes and election day votes
  • Proportion the election day votes by exit poll results
  • Determine what percentage of early votes would be needed to meet official results
  • See if that's reasonable, using known early vote counts and common sense
Once we have all the data, we can do a bit of math. Here are the variables we're working with:
  • EPC: Clinton's share in exit polls
  • EPS: Sanders' share in exit polls
  • OC: Clinton's share in official results
  • OS: Sanders' share in official results
  • VT: Total number of votes cast
  • VE: Number of early/absentee votes cast
We assume the election day votes match the exit polls, and see what the early vote needs to be for the total count to match official results. Here are the equations we get:
  • OC * VT = (EPC * (VT - VE)) + (x * VE)
  • OS * VT = (EPS * (VT - VE)) + (y * VE)
We can solve for x and y to get x% Clinton early votes and y% Sanders early votes. Here are the results of those numbers, calculated from my spreadsheet:
| State | Calculated Clinton EV share | Calculated Sanders EV share | Official Clinton EV share | Official Sanders EV share |
|---|---|---|---|---|
| AR | 68.56% | 23.98% | 69.31% | 27.17% |
| CT | 96.58% | 21.47% | | |
| FL | 65.12% | 30.86% | | |
| GA | 95.73% | 7.15% | 77.91% | 21.55% |
| IN | 61.55% | 38.45% | | |
| MD | 59.07% | 34.86% | 66.37% | 30.52% |
| MI | 57.79% | 39.90% | | |
| NC | 56.52% | 38.47% | | |
| OH | 84.09% | 16.32% | | |
| OK | -25.56% | 64.79% | 49.69% | 44.39% |
| SC | 887.75% | -881.05% | | |
| TN | 75.89% | 26.67% | | |
| TX | 66.56% | 31.81% | | |
When it comes to how plausible they are, it's a mixed bag. Clinton is known to excel at early voting, so margins of 60%-40% or 70%-30% aren't unreasonable. However, some more questionable things pop up as well. For Clinton to make up exit poll margins with early voting:
  • She'd need over 96% of the CT early vote, which is quite high
  • She'd need about 96% of the GA early vote; quite high, and the official results say she only got 78%
  • She'd need about 84% of the OH early vote, which is quite high
  • She'd need 887.75% of the SC early vote (this is obviously ludicrous)
Maryland and Oklahoma are two strange cases, since my model underestimates the early vote she'd get. In Oklahoma, she would need negative absentee votes to meet the margins, meaning that EPC * (VT - VE) is greater than OC * VT. In Maryland, her calculated early vote is lower than the actual one by about 7 points. I'm not entirely sure what that means, but it may indicate that exit polls did correctly count early votes (as they try to), and adding more would be unnecessary.
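
The two equations in this section can be rearranged to x = (OC * VT - EPC * (VT - VE)) / VE, and likewise for y. Here is a minimal Python sketch of that rearrangement. It is not the spreadsheet itself: the function names are my own, and the inputs are hypothetical round numbers rather than any state's real data. It also flags shares outside 0% to 100%, which is what happens when the early vote is too small to absorb the gap between the exit poll and the official count.

```python
def required_early_share(official_share, exit_poll_share, total_votes, early_votes):
    """Solve OC * VT = EPC * (VT - VE) + x * VE for x.

    Shares are fractions (0.55 means 55%). Assumes the exit poll
    describes election day voters only.
    """
    if not 0 < early_votes <= total_votes:
        raise ValueError("early_votes must be in (0, total_votes]")
    election_day_votes = total_votes - early_votes
    return (official_share * total_votes
            - exit_poll_share * election_day_votes) / early_votes


def is_possible(share):
    """A real vote share has to fall between 0% and 100%."""
    return 0.0 <= share <= 1.0


# Hypothetical inputs, NOT taken from any real primary.
VT, VE = 1_000_000, 200_000
x = required_early_share(0.55, 0.52, VT, VE)  # Clinton: OC = 55%, EPC = 52%
y = required_early_share(0.45, 0.48, VT, VE)  # Sanders: OS = 45%, EPS = 48%
print(f"{x:.2%} {y:.2%}")  # 67.00% 33.00%

# Same 3-point gap, but only 2% of the vote is early: no valid share exists.
tiny = required_early_share(0.55, 0.52, VT, 20_000)
print(f"{tiny:.2%}", is_possible(tiny))  # 202.00% False
```

The second call shows how a result like South Carolina's can arise: when VE is small relative to VT, even a modest gap between the exit poll and the official result demands an early vote share above 100%.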

Some of the early vote counted

(Added 2016/6/28)

Assuming the exit polls count 0% of the early vote produces realistic results in some states, and strange results in others. Now let’s look at the more likely possibility that the early vote was counted, but incorrectly. The exit pollsters do telephone interviews to get the early vote share, and then merge it into the election day results by estimating how much of the total vote is early voting. What happens if they overestimate or underestimate it?

Recall our variables from before:
  • EPC: Clinton's share in exit polls
  • EPS: Sanders' share in exit polls
  • OC: Clinton's share in official results
  • OS: Sanders' share in official results
  • VT: Total number of votes cast
  • VE: Number of early/absentee votes cast
Let’s make some new variables to add to our old ones:
  • EVP: How much we over/underestimated (percent of the actual early vote)
  • EVEP: The percentage of exit poll results that come from early voting
  • EDC: Clinton’s exit poll share on election day
  • EDS: Sanders’ exit poll share on election day
Letting x be Clinton’s early vote share and y be Sanders’, we can get their election day shares:
  • EVEP = EVP * (VE / VT)
  • EDC = (EPC - (x * EVEP)) / (1 - EVEP)
  • EDS = (EPS - (y * EVEP)) / (1 - EVEP)
Then we can plug them into the same equations as before:
  • OC * VT = (EDC * (VT - VE)) + (x * VE)
  • OS * VT = (EDS * (VT - VE)) + (y * VE)
The equation is less nice, but we can still solve for x and y. Running the results again, for various values of EVP, we get the below, calculated from my spreadsheet:

[Table coming soon; for now, look at the spreadsheet]

Again, how reasonable the margins are is a mixed bag. Reasonableness is subjectively assessed by whether Clinton wins, as is expected, and whether the margins are extreme or impossible:
  • AR margins are close to the official ones if the early vote is underestimated around 25%, or overestimated above 200%
  • CT’s margins are not reasonable at all, either drastically overstating Clinton’s share or making Sanders the early/absentee winner by a landslide
  • FL’s margins are always reasonable
  • GA’s margins are only reasonable at 300% or 400% overestimation, and they still fall short of Clinton’s official share of the early vote
  • IN’s margins are reasonable for underestimations at 50% or lower
  • MD’s official early/absentee margins are met at around 150% overestimation, which is not too unlikely
  • MI’s margins are reasonable for underestimations at 50% or lower
  • NC’s margins are almost always within reason
  • OH’s margins are never reasonable
  • OK’s margins are only reasonable at 300% or 400% overestimation, and they still overstate Clinton’s share by a lot
  • SC’s margins are never reasonable
  • TN’s margins are reasonable at underestimations at or below 25%, and overestimations at 200%
  • TX’s margins are always reasonable
So FL, NC, and TX are totally consistent with respect to exit polls and early voting. CT, GA, OH, and SC are completely inconsistent. AR, IN, MD, MI, and TN are sometimes reasonable. OK, one of the few states where Clinton did worse than the exit polls predicted, still fails to make sense in this model.
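
For anyone who wants to reproduce this without the spreadsheet, the equations in this section can be solved by hand. Writing f = VE / VT and substituting EDC into the official-results equation gives x = (OC * (1 - EVEP) - EPC * (1 - f)) / (f - EVEP). The Python sketch below implements that closed form. Treat it as my own rearrangement under the three assumptions above, not the spreadsheet's exact formula; the function name and the inputs are hypothetical.

```python
def early_share_with_misestimate(official_share, exit_poll_share,
                                 total_votes, early_votes, evp):
    """Early vote share x when pollsters weighted the early vote by EVP.

    evp is a ratio of the actual early vote: 0.5 means the pollsters
    assumed half of the real early vote, 2.0 means double. Shares are
    fractions (0.55 means 55%).
    """
    f = early_votes / total_votes  # actual early fraction, VE / VT
    evep = evp * f                 # early fraction assumed by the exit poll
    if not 0 <= evep < 1:
        raise ValueError("EVEP must be in [0, 1)")
    if f == evep:
        # EVP = 100%: the poll was weighted correctly, so under our
        # assumptions it should already match the official result and
        # x cannot be pinned down.
        raise ValueError("no unique solution when EVP is 100%")
    return (official_share * (1 - evep)
            - exit_poll_share * (1 - f)) / (f - evep)


# Hypothetical inputs, NOT taken from any real primary.
VT, VE = 1_000_000, 200_000
for evp in (0.25, 0.5, 2.0, 4.0):
    x = early_share_with_misestimate(0.55, 0.52, VT, VE, evp)
    print(f"EVP {evp:.0%}: Clinton needs {x:.2%} of the early vote")
```

Setting evp to 0 reduces this to the first model, where none of the early vote is counted. Note that EVP = 100% has no solution: if the pollsters weighted the early vote correctly and all three assumptions hold, there would be no discrepancy left to explain.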

Conclusions

Recall our three assumptions that we used in calculating early/absentee margins:
  1. Official election results correctly reflect the total votes that candidates received
  2. Polling place interviews done on election day accurately reflect the election day vote
  3. Telephone surveys of early/absentee voters accurately reflect that population
FL, NC, and TX had reasonable margins for a wide array of misestimations. CT, GA, OH, and SC completely flunk this test, never providing reasonable margins, as does OK. AR, IN, MD, MI, and TN could go either way; they’re reasonable for a smaller subset of misestimations.

Since the early voting hypothesis does work for some states, perhaps some of these exit poll discrepancies have a legitimate justification that isn’t fraud. Clinton’s success with the early vote can make up for her doing worse on election day. On the other hand, several states (CT, GA, OH, SC, and OK) require impossible early/absentee margins. And other states’ margins may or may not be reasonable.

In states whose margins don’t make sense, one or more of the above assumptions was wrong. If #1 and #2 are correct, but #3 isn’t, the calculations done here simply uncovered bad phone interviews. But these margins are often so unreasonable (exceeding 100% or being negative) that they couldn’t have shown up in the telephone polls. So #3 can’t be the only failed assumption: #1 and/or #2 must be invalid.

That leaves us with two possibilities: exit polls done on election day are wrong, or the official results are. Both possibilities are plausible, and the rest of this series will examine them.

What this shows is that exit poll discrepancies can't simply be dismissed based on early voting. Sometimes, that can explain it, but it doesn't always. Regardless, the discrepancies do deserve to be looked at more critically. Not every single discrepancy has to point to fraud.
