Sunday, September 18, 2016

The Clinton Shift

We started this blog series with the controversy over Democratic primary exit poll discrepancies. The exit polls have consistently failed to match the official results, and Hillary Clinton is nearly always the beneficiary of this shift.

Mathematical analyses, most notably Richard Charnin's, find this trend statistically impossible. But those analyses rest on the assumption that the exit polls are accurate, something most seasoned political analysts dispute. Either way, we end up with two possibilities: a bias against Clinton in the exit polls, or an odd shift towards Clinton in the official results.

Let's recap what we've covered in the last three parts. Part 1 looked at a possible reason that the exit polls would understate Clinton's margin: miscounting early/absentee voting. For some states, this hypothesis worked, but in others, it didn't. Part 2 explained how exit polling worked and analyzed past discrepancies. There's a pattern of exit poll discrepancies pointing to suspicious elections. Part 3 looked more into our electronic voting landscape, explaining why it's inherently untrustworthy.

Based on this, two things are clear. First, there are two possible causes of the "Clinton shift": an anti-Clinton polling bias that isn't early/absentee voting, or official results that are weirdly skewed towards her. Second, the history of exit polling misses and the inherent flaws in our election system make incorrect official results a legitimate possibility. Both potential causes are worth considering, without dismissing one of them as "conspiracy theory" territory.

A review of exit polls

It's worth recapping how exit polls are done in the US. The National Election Pool (NEP), a consortium of major media outlets, hires Edison Research to conduct the poll. A set of sample precincts, chosen to represent the state, is selected, and pollsters go out to interview voters. Interviews are conducted in the morning, in the afternoon, and right before poll closing.

As the data comes in, Edison weights it by demographics and turnout. Early/absentee voters are sampled separately by telephone, and likely weighted similarly (the process is unclear). The phone sample is merged with the election-day poll by estimating how much of the total vote is early/absentee voting.
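Edison's exact blending procedure isn't public, so the sketch below is an assumption: a simple weighted average of the two samples by the estimated early/absentee share of the vote.

```python
def merge_polls(eday_share, phone_share, early_fraction):
    """Blend an election-day exit poll share with a telephone
    early/absentee share, weighted by the estimated early/absentee
    fraction of the total vote (a simplifying assumption)."""
    return (1 - early_fraction) * eday_share + early_fraction * phone_share

# Hypothetical numbers: a candidate polls 48% on election day and
# 56% among early/absentee voters; early votes are ~25% of the total.
print(round(merge_polls(0.48, 0.56, 0.25), 3))  # 0.5
```

Note that an error in `early_fraction` shifts the merged number towards the wrong sample, which is exactly the failure mode examined in part 1.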

By the time polls close, most or all of the data should be in. This exit poll, called the unadjusted exit poll, is made available to the NEP media outlets. The media uses it to supplement their election night coverage, letting them forecast winners and discuss voting trends. Some outlets may make it public: CNN, for instance, posts it on their website. As votes come in, Edison forces the exit poll to match the official results, shuffling around respondents until it does. This is called the adjusted exit poll.

(Note: The word "unadjusted" might imply that the poll is just raw survey data, without any weighting. But an unadjusted exit poll has been weighted by demographics and voter turnout. It only lacks the final adjustment to force it to the official results.)

The unadjusted exit poll is what Charnin and others have compared to the official results. Since they're using it to assess the integrity of the results, this makes sense. Unadjusted exit polls are an alternate count of voter preferences, something to check those official results against. Forcibly adjusting the poll ruins that quality. So let's take a look at the unadjusted exit poll data for the Democratic primaries.

Democratic primary exit polls

CNN posts the unadjusted exit poll on their website right around poll closing. But as Edison adjusts it, the poll on CNN's site also changes. In New York, for example, CNN's exit poll showed Bernie Sanders behind by 4% at 9:00 PM, but the margin grew to 12% by 9:45 PM. The best way to get the unadjusted exit polls is to capture them from CNN's website as soon as the polls close.

Theodore de Macedo Soares, an election integrity researcher, has done this for the 2016 primaries. His compiled data is what other analysts have used to compare the exit polls and official results. It covers all the primaries from New Hampshire to West Virginia. After that point, the media cancelled the exit polls in the remaining states. His table, comparing exit polls and official results, is below:

[Table by Theodore de Macedo Soares: unadjusted exit polls vs. official results, New Hampshire through West Virginia]
This data shows a strong shift towards Clinton. Clinton's official lead over Sanders has nearly always been higher than what the exit polls projected. The few times the exit polls missed in Sanders' favor, the discrepancy was small, while many misses favoring Clinton were extreme (as high as 14% in Alabama). We can analyze the probability of these trends just as we analyzed the Republican red shift.

Exit polls are not perfect, and they're subject to some sampling error. But the error should be random, affecting all candidates evenly. In the case of the Democratic primary, Clinton and Sanders should each benefit from those errors about 50% of the time. But this is decidedly not the case.

Out of the 25 states above, the polls missed in Clinton's favor 21 times and Sanders' favor 4 times. This is like flipping a coin 25 times and getting heads 21 times. The probability of that is BINOMDIST(21, 25, 0.5, FALSE) = 0.00038, less than 1 in 2500.
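That figure can be reproduced with the binomial point probability (the quantity BINOMDIST computes when its last argument is FALSE); a minimal sketch in Python:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent
    trials, each with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# 21 of 25 misses favoring one candidate, if each miss were
# a fair coin flip:
print(round(binom_pmf(21, 25, 0.5), 5))  # 0.00038
```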

We can also look at how extreme the misses are. Statistical samples, including exit polls, have a margin of error (MoE) specifying how much the true value can deviate from the reported value. There is a 95% chance that the true value is within MoE of the reported value. Conversely, there's a 5% chance that the true result is outside the MoE.

Theodore de Macedo Soares has specified an MoE on the margin between Clinton and Sanders, called the TSE (total survey error) in the above table. He calculated the MoE for the difference between two candidates using an accepted method, and then increased it by 32%. The 32% is a design effect, meant to account for the extra error inherent to exit polls. Exit polls aren't a pure random sample - they select a set of precincts and sample within those - so extra error can be introduced.

Nate Silver also mentions the need for a design effect in his article criticizing exit polls, but he puts it higher, at 50-80%. Soares derived the 32% from the GOP exit polls, which have been much more accurate than the Democratic ones. The margin discrepancies are very low, and evenly distributed between Trump and the other candidates. Increasing the standard polling MoE by 32% keeps the GOP results within the TSE 95% of the time. Since Edison did both polls, the same design effect is used for the Democrats. 32% is close to the 30% that Richard Charnin uses, which was suggested in a 1996 review of media exit polls.
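As a sketch of how such a TSE might be computed (my reconstruction, not necessarily Soares' exact method): in a two-way race the margin is m = 2p - 1, so its MoE is twice the MoE on a single candidate's share, and the design effect then inflates it.

```python
from math import sqrt

def tse(n, p=0.5, design_effect=1.32, z=1.96):
    """Total survey error on the candidate margin for a sample of
    n respondents. In a two-way race the margin m = 2p - 1, so its
    MoE is twice the MoE of a single share; the design effect
    accounts for the clustered (by-precinct) sampling."""
    moe_share = z * sqrt(p * (1 - p) / n)   # 95% MoE on one share
    return 2 * moe_share * design_effect

# A hypothetical state exit poll of ~1,400 respondents:
print(round(100 * tse(1400), 1))  # 6.9 points on the margin
```

At Nate Silver's suggested 50-80% design effect, the same sample would carry a TSE of roughly 7.9 to 9.4 points instead.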

In the Democratic primaries, 10 out of 25 states have had discrepancies that exceeded the TSE. This is 40% of the time, far more than 95% confidence in the poll results would imply. And every single shift outside the TSE was in favor of Clinton, when they should have been evenly distributed between Clinton and Sanders. A single discrepancy outside the TSE favoring Clinton has a 2.5% probability. The probability of this occurring 10 times is BINOMDIST(10, 25, 0.025, FALSE) = 2.13 * 10^-10, less than 1 in 4 billion.
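The 1-in-4-billion figure follows from the binomial point probability, with each state having a 2.5% chance of a Clinton-favoring miss beyond the TSE:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# 10 of 25 states outside the TSE in Clinton's favor, when any
# single state should land there only 2.5% of the time:
p = binom_pmf(10, 25, 0.025)
print(f"{p:.2e}")  # 2.13e-10
```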

It's one thing to have the polls miss - it's quite another to have nearly all of them miss (often outside the MoE) for one candidate. The Clinton shift cannot be caused by random polling error: it's statistically impossible as per the calculations above. Sanders should benefit from exit polling misses about as often as Clinton does, but that's not happening. Either the exit polls are systematically underestimating Clinton's support, or Clinton's official vote count is being inflated for some reason.

Polling error explanations

Media election analysts are quick to point to flaws in the exit polls. Nate Silver has a list of reasons to ignore exit polling, which I've already brought up a couple times. Nate Cohn of the New York Times has written several articles criticizing exit polls, including a recent one about the Democratic primaries. But we're not just looking for reasons why exit polls are bad - we're looking for reasons why they consistently understate Clinton's support. These are their suggested reasons for that:
  • Sampling error (due to precinct clustering or bad randomness)
  • Early/absentee voting may be accounted for badly, or not at all
  • One candidate's voters are more likely to take the poll than another's
  • Young voters are more likely to respond to exit polls
The explanation involving sampling error can be dismissed right away. Sampling error is certainly possible, especially if the precincts polled fail to represent the state. But there's little reason to believe that this would have created a consistent bias against Clinton across 25 states. This is a random statistical error that should have affected both candidates evenly, and the math above shows that the Clinton shift can't be caused by that kind of random error. Furthermore, the TSE accounts for sampling error, but discrepancies in 10 out of 25 states exceeded it.

Other explanations are more compelling. Early/absentee voting is known to favor Clinton, there's been talk of Sanders supporters being more enthusiastic, and Sanders is overwhelmingly supported by young people. It seems possible that one of these three reasons explains the Clinton shift.

Someone at a local New York newspaper also suggested, perhaps jokingly, that Clinton voters were lying about supporting Sanders to seem cooler. There is a known phenomenon, the Bradley effect, regarding people lying to pollsters. But while it may have mattered in 2008 with Obama, a prominent black candidate, there's no reason to think it affects Clinton and Sanders. If anything, one would expect people to be reluctant to say they voted against the first woman president. Any theory about lying to pollsters, though, is virtually impossible to prove or disprove.

Early/absentee voting

The theory that early/absentee voting wasn't counted correctly was looked at in part 1. We assumed the official results were correct, and that the exit polls misestimated the prevalence of early/absentee voting, but were otherwise fine. Then we calculated what early/absentee margins would be required by Clinton and Sanders to make everything match. We did this for various underestimations and overestimations of the early/absentee vote in proportion to the total vote.

Afterwards, we looked at whether the margins made sense. In a few states, we knew the official early/absentee results, so we could compare to those. In most states, we didn't know, but could still check whether the margins were realistic. That gave us a less decisive answer on whether a state did work with this hypothesis, but allowed us to tell if a state didn't.

Here are the states (an incomplete list), broken down by whether the margins made sense. If they did, the early/absentee hypothesis is considered to work:
  • Yes: Florida, North Carolina, Texas
  • No: Connecticut, Georgia, Ohio, Oklahoma, South Carolina
  • Maybe: Arkansas, Indiana, Maryland, Michigan, Tennessee
For states where the hypothesis worked, the average polling miss in Clinton's favor is 4.8%. Much of this comes from Texas, and without Texas, it's only 2.55%. In states where the hypothesis failed, the average is 5.72%, but excluding Oklahoma (the only miss for Sanders), it's 8.7%. Where the hypothesis may have worked, the average miss is 4.64%.
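Those averages can be reproduced from the per-state discrepancy figures cited in this series (Oklahoma is omitted here, since its Sanders-favoring figure isn't quoted above):

```python
# Exit poll misses in Clinton's favor, in points, from the
# state-by-state figures cited elsewhere in this series.
yes_states   = {"Florida": 3.4, "North Carolina": 1.7, "Texas": 9.3}
no_states    = {"Connecticut": 2.2, "Georgia": 12.2, "Ohio": 10.0,
                "South Carolina": 10.3}          # Oklahoma omitted
maybe_states = {"Arkansas": 5.2, "Indiana": 5.7, "Maryland": -0.6,
                "Michigan": 4.6, "Tennessee": 8.3}

avg = lambda d: sum(d.values()) / len(d)
print(round(avg(yes_states), 2))                 # 4.8
print(round(avg({k: v for k, v in yes_states.items()
                 if k != "Texas"}), 2))          # 2.55
print(round(avg(no_states), 1))                  # 8.7 (excl. Oklahoma)
print(round(avg(maybe_states), 2))               # 4.64
```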

Excluding a couple of outliers, the early/absentee hypothesis is useful mainly for explaining the small exit polling misses. It could explain Florida and North Carolina well, but those states had very small misses. Georgia, Ohio, and South Carolina had some of the largest discrepancies, which it can't explain. And it's unclear whether the hypothesis explains some moderate misses, like Arkansas, Indiana, Michigan, and Tennessee.

Many states weren't tested in this model, due to a lack of available data. But the states that were tested give a clear conclusion: early/absentee misestimation can only explain minimal discrepancies.

Enthusiasm gap

Maybe Sanders appears to do better in the exit polls because his supporters are more enthusiastic, and thus more willing to talk to exit pollsters. Edison proposed a similar theory in 2004 to explain why Kerry did so much better in the exit polls than officially. This theory was disproven by looking at Edison's precinct-level data, comparing the response rates and Kerry support by precinct. We don't have that data for the 2016 primaries, but we can still try to see if the enthusiasm gap idea works.

To start with, is it even true that Sanders supporters are more enthusiastic? Sanders drew larger crowds and outraised Clinton, so it seems like that'd be the case. But several pre-election and post-election telephone polls have implied the opposite is true.

A Gallup poll from late March showed Clinton supporters as far more enthusiastic: 54% of Clinton supporters were extremely or very enthusiastic about her, compared to 44% of Sanders supporters about him. These numbers suggest that the enthusiasm gap is actually to Sanders' detriment, not Clinton's.

The same Gallup poll shows a massive enthusiasm gulf on the GOP side: 65% of Trump supporters, compared to 39% for Cruz and 33% for Kasich. Yet Republican primary exit polls have been far more accurate. Only 2 exit polls missed outside the margin of error, and the discrepancies divide evenly between favoring Trump and favoring others. Even supposing Sanders supporters are more enthusiastic, that doesn't explain why enthusiasm would only affect the Democratic exit polls.

Pre-election polls for Super Tuesday and March 8 gave Clinton an enthusiasm advantage in every state but Vermont. Vermont, where Sanders led slightly in enthusiasm, had one of the smallest exit polling misses, just 1.1%. Meanwhile, Alabama had Clinton up in enthusiasm by 30%, and Georgia had her up by 20%. If an enthusiasm gap mattered, the exit polling misses in those two states would benefit Sanders. But they ended up being the two largest exit polling misses for Clinton, at 14% and 12.2%, respectively.

(Admittedly, these pre-election polls aren't the best to compare to, since they're 2-3 weeks in advance and don't capture who actually turned out to vote. Still, since Vermont, Alabama, and Georgia are very strongly in the camp of either Sanders or Clinton, these numbers are likely to be in the right ballpark.)

An enthusiasm gap skewing the exit polls towards Sanders is a convenient idea, but ends up being completely counterintuitive. It's inconsistent with what we see on the Republican side, where Trump's massive lead in enthusiasm has no effect on whether the exit polls miss. And Clinton has been ahead in enthusiasm by almost every count, yet the exit polls still seem to understate her support.

When Sanders has slight enthusiasm leads, the exit polls barely miss in his favor; when Clinton has massive enthusiasm leads, her vote share is wildly understated. If an enthusiasm gap does affect exit poll response rates, Sanders' support in the exit polls should be significantly lower. The exact opposite has turned out to be true.

Youth overrepresentation

Another promising theory is that young voters are overrepresented in the exit polls. The obvious beneficiary would be Bernie Sanders, who led decisively among the youth, even in states he lost badly. Edison has long observed that young voters are more likely to respond to exit polls. They account for that in their weighting process, by tracking which age groups refuse the exit poll and giving certain ones more weight. But perhaps their process isn't sufficient.

Nate Cohn's article has exit poll response rates by age for 2004. Voters aged 18-29 and 30-59 have similar response rates, with a drop for voters 60 and older. So the youth overrepresentation theory is really about underrepresenting older voters. We can test the theory by adjusting the weight of older voters (65 and over, since that's what we have data for). How high do they need to be weighted for the exit polls to match the official results, and does that weighting make sense?

Spencer Gundert, who wrote an article about election fraud in April, has saved pictures of the unadjusted exit polls from CNN. Some of them contain age breakdowns, so we can analyze those states. I've done so in this spreadsheet. Let's see what percentage of voters have to be 65 or older for the polls to match, and how much greater that is than what the unadjusted exit polls say:
  • Alabama (14.0% discrepancy): Impossible to make match, per Doug Johnson Hatlem
  • Georgia (12.2% discrepancy): 69% of voters (53% change)
  • New York (11.6% discrepancy): 47% of voters (27% change)
  • Ohio (10.0% discrepancy): 40% of voters (18% change)
  • Mississippi (9.9% discrepancy): 67% of voters (47% change)
  • Michigan (4.6% discrepancy): 29% of voters (9% change)
  • Virginia (4.3% discrepancy): 29% of voters (7% change)
  • Illinois (4.1% discrepancy): 30% of voters (8% change)
  • Missouri (3.9% discrepancy): 34% of voters (9% change)
  • Florida (3.4% discrepancy): 33% of voters (9% change)
  • North Carolina (1.7% discrepancy): 24% of voters (4% change)
  • Vermont (1.1% discrepancy): 30% of voters (7% change)
Obviously, this is only a subset of states. But it shows that the youth overrepresentation theory can only work for smaller discrepancies. In the states with discrepancies outside the TSE, the percentage of voters 65+ would have to be off by a ludicrous 18% or more. The theory can't explain large discrepancies, and at best might explain those of 5% or lower.
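The required 65+ share in each state above was found by solving a weighted-average equation. This sketch uses hypothetical margins (not the actual CNN age breakdowns): with Clinton's margin m_old among 65+ voters and m_young among everyone else, find the 65+ weight w such that the blended margin equals the official result.

```python
def required_senior_share(m_old, m_young, official_margin):
    """Solve w*m_old + (1 - w)*m_young = official_margin for w,
    the fraction of the electorate aged 65+ needed to reconcile
    the exit poll's age breakdown with the official result."""
    if m_old == m_young:
        raise ValueError("age groups vote identically; no solution")
    return (official_margin - m_young) / (m_old - m_young)

# Hypothetical numbers: Clinton +40 points among 65+ voters,
# -5 points among everyone else, official margin +12:
w = required_senior_share(0.40, -0.05, 0.12)
print(round(w, 2))  # 0.38
```

When the official margin exceeds even Clinton's margin among seniors, no weight w in [0, 1] solves the equation, which is the "impossible to make match" situation noted for Alabama.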

Ted Soares, who compiled all the unadjusted exit polls, did regression analyses to test the youth overrepresentation theory. One thing he looked at was the correlation between exit poll discrepancies and Sanders' performance among young voters. If overrepresenting the youth vote is inflating Sanders' share in the exit polls, the effect should be magnified when Sanders does better among the young. In fact, the opposite happens:

Figure 1: Exit poll discrepancies vs. Sanders' performance among 18-29 year-olds. There's a slight negative correlation: the exit polls match more closely when more young voters support Sanders. Graph by Ted Soares.

This graph has the limitation of only including the 18-29 demographic, not the slightly older voters with similar response rates. But since age is the best predictor of candidate support, Sanders' 18-29 performance is still meaningful: if his support is low among the young, it's even lower among older demographics.

Soares' graph completely contradicts the logic of the youth overrepresentation theory. If younger voters are counted too much, this overcounting should help Sanders more (increase the discrepancy) when he has higher support among those voters. The exact opposite occurred.

The youth overrepresentation theory fails both state-by-state and overall. It can't work for individual states unless the exit poll discrepancies are minor. And the relationship between discrepancies and Sanders' youth support is the opposite of what the theory predicts.

Combined theories

None of the three polling error theories above can explain the Clinton shift on their own. But perhaps when considered together, they do. Two or more individually-weak factors might work in tandem to skew the exit polls against Hillary Clinton. Compelling as this may seem, it's quite unlikely.

The enthusiasm gap and youth overrepresentation theories aren't just weak: they run completely contrary to what the polls say. More enthusiasm for Clinton is correlated with a significant exit poll miss in her favor, and higher support for Bernie among the youth is correlated with lower exit poll discrepancies. Rather than just being unable to fully explain a Clinton shift, they don't work at all.

That leaves the early/absentee theory on its own. Based on the data above, it was quite unsuccessful, especially for most of the larger discrepancies. Furthermore, this occurs even when supposing (unrealistically) that the exit polls underestimated the early/absentee vote by 300%. Massively driving up one of these factors in Hillary's favor still can't justify the exit polling shift.

Our two theories about response bias (enthusiasm gap and youth overrepresentation) are totally contradicted by the data. And early/absentee voting, which was already shown to be weak on its own, can't explain the Clinton shift even when pushed beyond the realm of plausibility. Combining the polling theories doesn't give us a better explanation.

Another reason?

None of the three polling error theories above, nor their combination, is able to explain the persistent Clinton shift. Of course, these are just the most common explanations, not an exhaustive list. There may well be another theory based on polling error that successfully explains the discrepancies. If somebody has one, feel free to comment below, and I'll take a look at it, perhaps updating this post.

Still, when the most common polling error theories can't explain the recurring discrepancies favoring Clinton, that's a good sign that the exit polls aren't the problem. If the Clinton shift can't be explained by random error or systematic bias, then it's likely the exit polls are actually quite good.

Until a successful polling error explanation comes along, it's logical to believe that the exit polls successfully captured voter intent. In that case, why do the official results fail to reflect it?

Uncounted ballots explanation

In Florida's 2000 election, the official results clearly failed to capture voter intent. Nearly 200,000 ballots were left uncounted, and thousands more were cast for unintended candidates due to ballot confusion. Al Gore may well have won had these errors not occurred. This disconnect between voter intent and vote counting could have led exit pollsters to overstate Gore's lead by 6.6%. Gore voters who marked their ballots incorrectly would believe they had voted for Gore, and tell pollsters so.

Maybe that happened in the Democratic primaries: exit polls included people whose ballots weren't counted. If these ballots were mainly for Bernie Sanders, it could explain why the exit polls consistently had him doing better. Of course, if that's the case, we'd have to ask why there were enough uncounted ballots to affect the exit polls, and why most of them are for a particular candidate.

The most common way for ballots to be uncounted is when people are forced to vote on provisional ballots. Provisional ballots are meant to solve issues with voter eligibility. If a voter is denied at the polls (due to registration confusion, missing voter ID, or other problems), but believes they should be allowed to vote, they can cast a provisional ballot. Election officials will later review the provisionals and decide whether or not to count them.

The uncertainty of provisional ballots is a potential pitfall in exit polling. Edison does not ask voters whether they voted provisionally, so provisional voters cannot be screened out. If large numbers of people voted provisionally, told exit pollsters who they supported, and had their ballots rejected, the exit polls could be skewed.

In the primaries before May, over 220,000 provisional ballots were uncounted. About half of the uncounted provisionals were from New York and Arizona: 91,000 from New York City alone, and 20,000 more from Maricopa County. Most of those ballots were cast in the Democratic primaries.

Maricopa County officials suggested that large numbers of independents erroneously tried to vote in the primary. Arizona's Democratic primary is closed, meaning only registered Democrats can participate. If independents tried to do so anyway, they could vote provisionally, but their ballot would be rejected.

While Arizona had no exit polling, New York did and was also a closed primary. Could large numbers of independents voting provisionally, and answering exit pollsters, help explain New York's 12% polling miss? Independents usually favor Bernie Sanders, so that would make sense.

But looking further shows there's more going on than just voter mistakes. There are numerous accounts of people in closed primary states (Arizona, New York, and others) having their voter registration mysteriously changed to make them ineligible. Most were registered Democrats, who had their party affiliation switched or got purged from the rolls. Either one would have barred them from the closed primaries, leaving them no choice but to vote provisionally.

This first occurred in Arizona, with reports in Maricopa, Pima, Yavapai, and other counties. Not long after that, New York voters fell victim to the same issues. In Brooklyn alone, 126,000 registered voters were purged from the rolls, many of whom had never been inactive or moved. The same issues carried over to several other Democratic primaries, including Pennsylvania, Maryland, Connecticut, New Jersey, and California. All of these were closed or semi-open primaries.

Interestingly, the vast majority of victims seem to be Sanders supporters. Nearly every report online comes from the Sanders subreddit, or supporters of his on social media. This anecdotal pattern was corroborated by Anonymous for Arizona, and by Counterpunch for New York. It's likely that the same holds for the other states where this issue occurred.

Throughout multiple closed primary states, the pattern is clear: supporters of Bernie Sanders are being disenfranchised through voter registration irregularities. Unexplained party switches and roll purges have occurred since Arizona, and are only being reported by Sanders supporters. The one-sided nature and prevalence of the issues makes it unlikely that it's a simple glitch - it appears to be a targeted effort to disenfranchise Sanders supporters.

What would happen to the exit polling misses if all the uncounted provisionals were included for Sanders? (It's unlikely that all of them were for Sanders, but the vast majority probably were.) In New York, with its 91,000 rejected NYC provisionals, the results would be 55% to 44% for Clinton. Still a major exit polling miss, but only by 7% instead of 12%, just within the TSE. And adding in provisional ballots from elsewhere in the state may narrow the miss further.
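Using approximate statewide totals (roughly 1,054,000 for Clinton and 763,000 for Sanders; my rounding, not exact certified counts), the New York arithmetic works out like this:

```python
def margin_with_provisionals(clinton, sanders, provisionals_for_sanders):
    """Recompute Clinton's margin (in points) if uncounted
    provisional ballots were all credited to Sanders - an upper
    bound, since not every provisional was a Sanders vote."""
    total = clinton + sanders + provisionals_for_sanders
    return 100 * (clinton - (sanders + provisionals_for_sanders)) / total

# Approximate NY statewide totals, plus 91,000 rejected NYC provisionals:
print(round(margin_with_provisionals(1_054_000, 763_000, 0), 1))       # 16.0
print(round(margin_with_provisionals(1_054_000, 763_000, 91_000), 1))  # 10.5
```

With the exit poll showing roughly a 4-point Clinton margin, the miss shrinks from about 12 points to about 7.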

Aside from New York, though, uncounted provisional ballots don't explain the discrepancies. The other closed primaries that were affected had much smaller exit polling misses: Pennsylvania, Maryland, and Connecticut missed by just 2.6%, -0.6%, and 2.2%, respectively. Apart from New York and West Virginia, all of the largest exit polling misses were in open primaries less susceptible to these registration issues.

Registration fraud targeting Sanders supporters deserves to be looked at in more depth, and I plan to do that in a future post. But outside of Arizona and New York (as well as California), it's unlikely to have had much effect on the results. And that means it had little effect on the exit poll discrepancies.

Vote rigging explanation

Polling error can't explain the Clinton shift, meaning that the exit polls reflected voter intent. And the failure of the uncounted ballots explanation leaves no legitimate reason for the official results not to. We're left only with an illegitimate reason - vote rigging in favor of Hillary Clinton.

Vote rigging is often dismissed as an implausible conspiracy theory: it would require too many people, and someone would blow the whistle. But as explained in part 3, a single person is often all it takes to compromise an election. And there are multiple past examples of vote rigging without a perpetrator being found. People can and do get away with election fraud.

Many people studying the exit polls suspected vote rigging in Clinton's favor from the beginning. But there wasn't any proof until April, when fraudulent vote counts in Chicago were revealed. Illinois mandates a 5% hand-count audit to verify the machines' accuracy. The paper audit trails from the DRE machines are tallied by hand and compared against the machine counts. Multiple citizen observers who watched Chicago's audit gave disturbing testimony:

[Observer testimony from Chicago's election audit]
The auditors tried to hide the count from observers, clearly not wanting it to be overseen. Multiple times, the hand count differed from the machine count, so the auditors changed their hand tally to match. In one precinct, 21 Sanders votes were erased and 49 Clinton votes were added in order to match the machine count. Several observers came forward with matching testimony and affidavits.

Chicago's audit provided the first concrete evidence that machines were miscounting votes. These miscounts occurred in many precincts, with the machines nearly always favoring Clinton. While it could simply be random machine error, the fact that one candidate consistently benefits makes it unlikely. Voting machines in Chicago were most definitely rigged in Clinton's favor. And if that happened in one city, we should be suspicious of it happening elsewhere.

Poll discrepancies and vulnerable machines

If exit poll discrepancies indicate vote rigging, greater discrepancies should have occurred where the machines are more vulnerable to tampering. Looking at the voting systems and audit procedures in each state mostly confirms that idea.

Alabama had the largest discrepancy favoring Clinton, 14.0%. Three of its four highest-population counties (Jefferson, Mobile, and Montgomery) use Model 100 optical scanners, whose firmware can easily be replaced by a malicious memory card. No hand audits of the paper ballots are conducted.

Georgia had one of the largest discrepancies, 12.2%. They use paperless AccuVote TSx machines statewide, which are severely vulnerable to voting machine viruses and leave no way to verify the results. And the central tabulator, GEMS, has easy backdoors to allow vote manipulation.

South Carolina had a 10.3% discrepancy in Clinton's favor. Just like Georgia, they use a touchscreen susceptible to viruses, the iVotronic, statewide with no paper trail.

Ohio had a fairly large exit polling shift to Clinton, 10.0%. Franklin, the second most populous county, uses iVotronic touchscreens. Summit, the fourth most populous county, uses the Model 100 optical scanner. The next five most populous counties (Montgomery, Lucas, Stark, Butler, Lorain) all use the AccuVote TSx. While the touchscreens have paper trails, Ohio doesn't audit them in the presidential primaries.

Mississippi had a 9.9% exit poll discrepancy. Three of its four highest-population counties - Harrison, DeSoto, and Rankin - use the AccuVote TSx, Model 100, and iVotronic, respectively. Six of its next seven most populous counties (Jackson, Madison, Lauderdale, Forrest, Jones, Lowndes) use AccuVote TSx, and one (Lee) uses the Model 100. Even where paper trails exist, no audits are done.

Texas had a 9.3% discrepancy for Clinton. Its eight highest-population counties (Harris, Dallas, Tarrant, Bexar, Travis, El Paso, Collin, Hidalgo) use paperless DRE machines that are all hackable: the AccuVote TSx, iVotronic, and eSlate. The lack of a paper trail makes Texas's minimal 1% audit useless.

Tennessee had an 8.3% exit polling miss. Shelby, its highest-population county, uses paperless AccuVote TSx machines, and is infamous for tampering with its GEMS central tabulator. The next two highest-population counties - Davidson and Knox - use paperless iVotronic and eSlate machines, respectively. All of these counties are susceptible to rigging without the 3% audit catching it.

Massachusetts had a Clinton shift of 8.0%, right at the margin of error. Each town has its own election system, with various optical scanners and hand counts being used. 219 towns, the vast majority, use the AccuVote OS, whose rigging was demonstrated in HBO's Hacking Democracy. A private contractor, LHS Associates, programs all of these machines with no oversight. 18 other towns use the Optech IIIP-Eagle, which can be rigged in the same way. No hand-count audits are done.

Indiana's exit poll discrepancy was 5.7%. The top four counties by population use the ES&S DS200 or Microvote Infinity, neither of which is known to be vulnerable to tampering. The next five most populous counties (Saint Joseph, Vanderburgh, Elkhart, Tippecanoe, Porter) use the Model 100, the paperless AccuVote TSx, or the paperless iVotronic. No hand-count audits are done.

Arkansas's discrepancy was 5.2%. Of its top six most populous counties, five (Pulaski, Benton, Washington, Faulkner, Saline) use a combination of iVotronic and Model 100/650 machines, while one (Sebastian) uses the DS200. The next two most populous counties (Craighead and Garland) also use the DS200. No hand-count audits are done. This leaves the state reasonably susceptible to rigging.

Michigan's discrepancy was 4.6%. Its eight most populous counties (Wayne, Oakland, Macomb, Kent, Genesee, Washtenaw, Ingham, Ottawa) use a mix of Model 100, AccuVote OS, and Optech Insight machines, all of which have known vulnerabilities. No hand-count audits are done.

Virginia's discrepancy was 4.3%. Many of the lower-population counties use vulnerable optical scanners (such as the AccuVote OS) and DRE machines (such as the AccuVote TSx and iVotronic). However, the 10 highest-population counties use the DS200 or OpenElect, neither of which is known to be hackable. No hand-count audits are done. This makes some minor shift in the outcome possible.

Illinois's discrepancy was 4.1%. The most populous county, Cook, uses AVC Edge touchscreens that can easily be rigged, and Chicago's audit discovered that many of them were. Several other counties use vulnerable machines, but all have paper trails, and are subject to Illinois's 5% audits. While Cook County ran a corrupt audit that obscured machine rigging, other counties likely did them properly. Cook County alone, though, provided nearly triple Clinton's margin of victory.

Missouri's discrepancy was 3.9%. The most populous county, Saint Louis, uses the iVotronic and Model 100, both of which have known vulnerabilities. The next two most populous counties (Jackson and Saint Charles), however, use the OpenElect, which is not known to be vulnerable. After that, there are several counties using the AccuVote OS and Optech Insight. But all machines have a paper record subject to a 1% audit, making only minor machine rigging possible.

Florida's discrepancy was 3.4%. Its two highest-population counties, Miami-Dade and Broward, use the Model 650 for absentee ballots, which is vulnerable to firmware replacement. But aside from Palm Beach (using the Optech), the top eight counties by population all use the DS200 at the polling place, which has no known vulnerabilities. And Florida mandates a 1% hand-count audit. This would allow only some minor rigging.

Pennsylvania's discrepancy was 2.6%. While the top five counties by population all use paperless DREs, only Allegheny and Montgomery use ones that are known to be hackable: the iVotronic and AVC Advantage, respectively. The next two most populous counties (Lancaster and Chester) use vulnerable optical scanners, but they're subject to 2% hand-count audits. It would be difficult for much significant rigging to occur.

Connecticut's discrepancy was 2.2%. It uses the AccuVote OS statewide, which is quite susceptible to rigging, but a 10% hand-count audit is required. This rigorous audit makes it likely that any vote rigging would be caught, explaining the very small exit polling miss.

North Carolina's discrepancy was 1.7%. The top eight counties by population use a mix of iVotronic machines with a paper trail and Model 100/650 scanners. While all of these machines can be rigged, North Carolina has each county conduct random hand audits, making it less likely to occur.

Vermont's discrepancy was 1.1%. Each town conducts its elections with either the AccuVote OS or hand counts. No audits are required, so rigging the AccuVote optical scanners is possible. But Bernie Sanders' massive support in Vermont makes it nearly pointless, explaining the minimal discrepancy.

Maryland's discrepancy was -0.6%, finally in Sanders' favor, but just barely. The DS200 optical scanner, which has no known vulnerabilities, is used statewide. Since rigging the scanners is unlikely, Maryland's near-perfect match with the exit polls makes sense.

Wisconsin's discrepancy was -2.0%. The four most populous counties (Milwaukee, Dane, Waukesha, Brown) all use the relatively-secure DS200. Some slightly less populous counties use the AccuVote and Optech machines, which can be rigged, but Wisconsin mandates 5% hand audits. These factors make rigging unlikely to occur, which is consistent with a minimal -2.0% discrepancy.

That leaves us with two discrepancies favoring Sanders: New Hampshire at -4.2%, and Oklahoma at -6.1%. Both states use hackable machines statewide that conceivably could be rigged. But since these are the only significant discrepancies favoring Sanders, and still within the margin of error, we can likely chalk them up to polling error.

Looking state-by-state shows a very strong correlation between exit poll discrepancies and vulnerability to vote rigging. When hackable machines are in wide use, and audits aren't conducted (or can't be), the exit polls miss by a lot. When hackable machines are less prevalent, or audits are done, the exit polls match more closely. The easier it is to rig a state, the more the exit polls miss.
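That correlation can be sanity-checked numerically. The sketch below ranks each state by a hand-assigned 0-3 "riggability" score (my own rough, illustrative coding of the machine/audit descriptions above, not figures from any report) and computes the Spearman rank correlation with its exit poll discrepancy:

```python
# Illustrative check: correlate exit poll discrepancies with a rough 0-3
# vulnerability score (3 = hackable machines, no effective audit; 0 = secure
# machines or strong audits). The scores are assumptions, not official data.

def avg_ranks(xs):
    """1-based ranks, with tied values assigned the average of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rho = Pearson correlation of the two rank vectors."""
    rx, ry = avg_ranks(x), avg_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# (state, discrepancy toward Clinton in %, assumed 0-3 vulnerability score)
states = [
    ("SC", 10.3, 3), ("OH", 10.0, 3), ("MS", 9.9, 3), ("TX", 9.3, 3),
    ("TN", 8.3, 3), ("MA", 8.0, 2), ("IN", 5.7, 2), ("AR", 5.2, 2),
    ("MI", 4.6, 2), ("VA", 4.3, 1), ("IL", 4.1, 1), ("MO", 3.9, 1),
    ("FL", 3.4, 1), ("PA", 2.6, 1), ("CT", 2.2, 0), ("NC", 1.7, 0),
    ("MD", -0.6, 0), ("WI", -2.0, 0),
]
rho = spearman([s[1] for s in states], [s[2] for s in states])
print(f"Spearman rho = {rho:.2f}")   # strongly positive under this coding
```

Any reasonable coding of the descriptions above gives a similar result; the point is only that the ordering of discrepancies tracks the ordering of vulnerability.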

This correlation confirms the theory that exit poll discrepancies indicate vote rigging. Still, there are two states that fall outside the correlation: West Virginia and New York.

West Virginia's discrepancy was 12.0%, close to South Carolina's 12.2%. The states do share some similar machines: the iVotronic and Model 100. But West Virginia also uses the more-secure DS200, and requires a 5% audit of its machines, while South Carolina does not. Based on this, West Virginia's discrepancy should be much smaller. Its large discrepancy might be caused by a third candidate pulling significant votes from Bernie Sanders, which would explain the deviation from the pattern.

New York's discrepancy was 11.6%. It uses the DS200 scanners (not provably hackable) downstate and the ImageCast scanners (maybe hackable) upstate, with a 3% hand audit statewide. Uncounted ballots may account for part of the discrepancy, reducing it to 6.6%. But that's still a larger discrepancy than many states with more vulnerable machines and worse audit procedures. Maybe polling error is to blame, or New York's vote rigging was more sophisticated. It deserves a closer look.

Aside from a couple outliers, exit poll discrepancies are well-correlated with how susceptible states are to vote rigging. Correlation doesn't prove causation, but it makes a strong case that the exit poll discrepancies reflect vote rigging.

Statistical irregularities

The conclusion that exit poll discrepancies show vote rigging is reinforced by irregularities in the precinct results. Numerous states saw Clinton's vote share significantly increase with the precinct size. Demographics fail to explain the trend, indicating that a vote rigging algorithm is inflating Clinton's share in larger precincts. And almost all of the states with substantial exit poll discrepancies show this irregular pattern.

Election Justice USA analyzed the pattern in their 100-page report, Democracy Lost. It appeared in well over a dozen states, even when controlling for wealth, age, and race. They used the highest population counties in each state to confirm the significance of the pattern.

One possible demographic explanation is a rural vs. urban divide. Smaller precincts tend to be more rural and white (favoring Sanders); larger precincts are more densely-populated and urban, with more people of color (favoring Clinton). But the high-population counties tend to be more diverse and have few rural precincts. Furthermore, it was often possible to isolate the urban precincts, and the pattern still appeared.

Even within urban areas, however, wealth might be a factor. People of color, who tend to be less wealthy, often live in concentrated areas such as apartments. Wealthier, often white, people are more likely to live in less-dense areas such as suburbs. So it's still logical that larger precincts would have more people of color, to Clinton's benefit. But Clinton also did better among wealthier Democrats, who are more likely to be in the smaller precincts. Both small and large precincts have demographic factors favoring Clinton, so there shouldn't be a major change in her vote share between them.

Age, the most significant predictor of candidate support, might also differ between precincts. EJUSA controlled for this by isolating early/absentee voting, which skews towards older voters. Sometimes, this control made the vote share increase vanish, like in Cuyahoga County OH. Other places, such as Fulton County GA, major CA counties (LA, San Diego, Orange), and Chicago, continued to show the trend. Often, it was stronger for early/absentee voting than on election day.

Racial differences between precincts are still a potential explanation. This was controlled for by isolating counties and (where possible) precincts with high percentages of a particular race. The Bronx (majority Latino) showed the pattern, even looking only at the majority Latino precincts. Louisiana showed the pattern with precincts in which 95% of voters were white Democrats. Across the country, the pattern appears in counties regardless of racial makeup. While precinct size has some correlation with race, partially explaining Clinton's rising vote share, it's also an independent factor.

Despite numerous demographic controls, Clinton's vote share increases steadily with precinct size. Without a demographic explanation, the only possible conclusion is that the increase is intentional. A vote rigging algorithm is raising Clinton's share as the precinct gets larger. This increases the impact on the results (targeting larger precincts has a greater effect), while disguising the rigging (it's easier to hide vote flipping in precincts with more total votes).
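The pattern itself is simple to compute: sort a county's precincts from smallest to largest and track the candidate's cumulative vote share. A minimal sketch, using made-up precinct data (the sizes and vote totals below are purely illustrative, not real returns):

```python
# Sketch of the precinct-size analysis: under neutral counting, the cumulative
# share should level off once enough precincts accumulate; a share that keeps
# climbing with precinct size is the flagged pattern.

def cumulative_share(precincts):
    """Sort precincts smallest-first and return the running Clinton share.

    Each precinct is a (total_votes, clinton_votes) pair.
    """
    clinton = total = 0
    shares = []
    for size, c in sorted(precincts):
        total += size
        clinton += c
        shares.append(clinton / total)
    return shares

# Hypothetical county where Clinton's share drifts upward with precinct size.
precincts = [(100 + 20 * i, int((100 + 20 * i) * (0.45 + 0.005 * i)))
             for i in range(40)]
curve = cumulative_share(precincts)
print(f"share after smallest 10 precincts: {curve[9]:.3f}")
print(f"final cumulative share:            {curve[-1]:.3f}")
```

In real data the same computation is run on official precinct returns; the question is whether the curve flattens (as sampling logic predicts) or keeps rising.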

For the most part, significant exit polling misses match up with the vote rigging pattern. The only states that don't are New Hampshire (4.2% miss for Sanders), Massachusetts (8.0% miss for Clinton), Texas (9.3% miss for Clinton), and Maryland (0.6% miss for Sanders). Massachusetts fits the pattern when only college towns are analyzed, implying that college towns in particular were targeted. The other three states deserve further analysis. Overall, though, the correlation is quite strong, reinforcing the idea that the exit poll discrepancies indicate vote rigging.

Alabama, with a 14.0% exit polling miss, is a particularly good example. Its largest counties with the easily-hackable Model 100 scanners (Jefferson, Mobile, Montgomery) show the vote rigging pattern, while its largest counties with the more-secure DS200 scanners (Madison, Tuscaloosa) don't:


Figure 2: Alabama's highest-population DS200 counties. Aside from the last couple precincts, precinct size has little effect on Clinton's performance. Credit to EJUSA.

Figure 3: Alabama's two highest-population Model 100 counties. Clinton's vote share steadily increases with precinct size here. Credit to EJUSA.


Tennessee, with an 8.3% exit polling miss, shows the pattern in Shelby County. Shelby County uses paperless hackable AccuVote touchscreens, and election fraud on its GEMS central tabulator has been caught in the past. The pattern remains when controlling for the urban vs. rural divide, by only looking at the diverse city of Memphis:

Figure 4: The vote rigging pattern appears in Shelby County TN as a whole (left), as well as Memphis (right), showing that urban vs. rural is not the deciding factor. Credit to EJUSA.


Louisiana didn't have exit polls, but Clinton overperformed pre-election polls by 8.9%. They use paperless AVC Edge and Advantage DREs, both of which have similar vulnerabilities. EJUSA controlled for race by isolating precincts with specific percentages of white Democrats. In every racial grouping, Clinton's vote share continued to increase with precinct size.

Other states with exit poll misses and hackable machines show similar signs of vote rigging, as do several late-May and June states with no exit polls (Kentucky, California, New Jersey, New Mexico).

Conclusion

Election fraud benefiting Clinton is the best explanation for what we've seen in the Democratic primaries. Nothing else can explain the exit poll discrepancies, altered registrations, and precinct vote irregularities.

It's mathematically impossible for the Clinton shift to be caused by random sampling error; simple probability calculations confirm that. And the three proposed explanations for a polling bias (misestimating early/absentee voting, enthusiasm gap, youth overrepresentation) all fail to hold up. Barring another polling error explanation, the exit polls clearly reflect voter intent.
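To sketch that kind of probability calculation: for an exit poll of roughly 1,400 respondents, a normal approximation gives the one-sided chance that sampling error alone produces a margin miss of a given size. The sample size and the 1.8 cluster design effect below are assumed values for illustration, not Edison's actual parameters:

```python
import math

def miss_probability(n, margin_miss_pct, design_effect=1.8, p=0.5):
    """One-sided probability that sampling error alone yields a margin miss
    at least this large (normal approximation).

    The standard error of the two-candidate margin is roughly
    2*sqrt(deff * p*(1-p)/n); design_effect inflates the variance to account
    for cluster sampling (the 1.8 here is an assumption).
    """
    se = 2 * math.sqrt(design_effect * p * (1 - p) / n)
    z = (margin_miss_pct / 100) / se
    return 0.5 * math.erfc(z / math.sqrt(2))   # upper-tail normal probability

# E.g., a 10-point margin miss in a ~1,400-respondent poll:
print(f"P(miss >= 10 pts) ~= {miss_probability(1400, 10.0):.4g}")
```

A single miss of that size is merely unlikely; the argument in the analyses cited above is that many large misses, nearly all in the same direction, multiply into vanishingly small odds.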

Some of the exit polling misses are attributable to large numbers of uncounted provisional ballots. This is not necessarily fraudulent, but in the case of the Democratic primaries, it was. Most of the provisional ballots were cast due to registration issues: voters who had their party switched or were purged. Nearly all of the voters affected were Bernie Sanders supporters, indicating targeted voter registration tampering.

Most exit polling misses, though, cannot be explained by uncounted ballots either. By deduction, the only possible explanation is vote rigging. Far from being an implausible conspiracy, vote rigging can happen with minimal human involvement, and has in the past. It was also confirmed by audit observers in Chicago, raising the likelihood that other jurisdictions had incorrect results too.

The conclusion that exit poll discrepancies show vote rigging is further confirmed with a state-by-state analysis. States that are more vulnerable to vote rigging - more hackable machines, and less effective audit procedures - have higher exit poll discrepancies. And in almost every state with major exit poll discrepancies, Clinton's vote share increases with precinct size independently of demographics, indicating that precinct results were altered to increase her performance.

Might there be an explanation for all of this that doesn't involve election fraud? Perhaps, but at this point, the burden of proof is on the skeptics to show that. The evidence for election fraud in the Democratic primaries is now overwhelming.

What next?

At this point, Clinton is the Democratic nominee. Even if the election fraud was enough to reverse the winner (as it very well might have been), her nomination is still a fact. But that doesn't make continuing to focus on it a waste of time. What's most important is knowing the truth, no matter how long after 2016 it comes. Every instance of fraud we document now gives us more cause to reform the system later.

In terms of researching and documenting election fraud, there's plenty more to be done. Of all the states that were tampered with, California's story is the most complex and still ongoing. Chronicling everything that took place there is important. Another essential task is investigating exactly how the registration tampering took place. That question may never be answered, but there are still some clues to follow.

Of course, research and writing can only do so much. Various lawsuits related to the primaries are in progress. Cliff Arnebeck has been preparing a RICO lawsuit to get recounts across the country. Bob Fitrakis has filed suit to get raw, precinct-level exit polls from Edison. Chicago's board of elections was recently sued to redo the voting machine audit properly. A judge in San Diego just ruled that election officials improperly excluded certain ballots from a random audit. It remains to be seen what'll happen with them.

Meanwhile, all of us who care about election integrity should also turn our attention to the future. We have a lot of evidence for fraud in the primaries, but it's all circumstantial: statistical analyses and deductions. This evidence remains unconvincing to many, especially election officials and judges. What we need to start doing is collecting hard, irrefutable evidence of election fraud: poll tape photos, central tabulator logs, videos of suspicious things, etc. That way, if election fraud happens in the future, we can fight back with solid proof.

All the while, it's easy to forget that we, the people, shouldn't have to do any of this. For years, it's been the job of activists to show that elections were fraudulent or suspicious. That's completely backwards. In reality, the states running the elections should prove that they weren't unfair. Currently, our system of electronic voting makes that impossible. What we need is a new system.

Many people believe it should be hand-counted paper ballots. Others think we should use electronic machines, but find a way to make them secure. Having that debate is fine. But first, we've got to realize that we need to have it. It's time for a mass movement calling for election reform.

Other posts on election fraud