What the USCV Report Doesn’t Say (Part I)

Mea Culpa (4/14):  This post originally argued under heading “3)” below that sampling error alone will create an artifact in the data tabulation provided in the Edison-Mitofsky report. That assertion was flatly incorrect.   The artifact that had been discussed on the AAPORnet listserv in January would theoretically result, not from sampling error, but from randomly distributed clerical errors in the count data obtained by Edison-Mitofsky.  Although my subsequent discussion of the implications of the artifact (wrongly defined) may prove similar to the implications of the theoretical artifact I should have described, it was certainly premature to conclude that such an error “undermines” the USCV thesis.  I apologize for that error.
However, the underlying issue – the appropriate interpretation and analysis of the table of within-precinct-error by partisanship that appears on p. 36 of the Edison-Mitofsky report – remains of central importance to the USCV thesis.  Appropriately chastened by my error, I am digging much deeper into this issue (as are other bloggers and academics) and I will have more to say on it in a few days.
Please note:  I stand by the arguments made in points #1 and #2 below.  In the ethical spirit of the blogosphere, I will leave the original section #3 intact, albeit with strikeout or clarification to correct the most glaring problems.

A few weeks ago, I asked if readers were “tired of exit polls yet.”  The replies, though less than voluminous, were utterly one-sided.  Twenty-five of you emailed to say yes, please continue to discuss the exit poll controversy as warranted.  Only one emailer dissented, and then only in hope that I not let the issue “overwhelm the site.”  Of course, those responses are a small portion of those who visit MP daily or weekly and who, for whatever reason, saw no pressing need to email a reply.  To those who responded, thank you for your input.  I will continue to pass along items of interest when warranted.

One such item came up late last week.  A group called US Count Votes (USCV) released a new report and executive summary — a follow-up to an earlier effort — that collectively take issue with the lengthy evaluation of the exit polls prepared for the consortium of news organizations known as the National Election Pool (NEP) by Edison Research and Mitofsky International.  The USCV report concludes that “the [exit poll] data appear to be more consistent with the hypothesis of bias in the official count, rather than bias in the exit poll sampling.”   MP has always been skeptical of that line of argument, but while there is much in the USCV report to chew over, I am mostly troubled by what the report does not say.

First some background:  The national exit polls showed a consistent discrepancy in the vote that favored John Kerry.  The so-called “national” exit poll, for example, had Kerry ahead nationally by 3% but George Bush ultimately won the national vote by a 2.5% margin.  Although they warned “it is difficult to pinpoint precisely the reasons,” Edison-Mitofsky advanced the theory that “Kerry voters [were] less likely than Bush voters to refuse to take the survey” as the underlying reason for much of the discrepancy.  They also suggested that “interactions between respondents and interviewers” may have exacerbated the problem (p.4).  The USCV report takes dead aim at this theory, arguing that “no data in the report supports the E/M hypothesis” (USCV Executive Summary, p. 2). 

The USCV report puts much effort into an analysis of data from two tables provided by Edison-Mitofsky on pp. 36-37 of their report.  As this discussion gets into murky statistical details quickly, let me first try to explain those tables and what the USCV report says about them.  Both tables show average values from 1,250 precincts in which NEP conducted at least 20 interviews and was able to obtain precinct-level election returns.  Each table divides the precincts into five categories based on the “partisanship” of the precinct.  In this case, partisanship really means preference for either Bush or Kerry.  “High Dem” precincts are those where John Kerry received 80% or more of the vote, “Mod Dem” precincts are those where Kerry received 60-80%, and so on.

The first table shows a measure of the average precinct level discrepancy between the exit poll and the vote count that Edison-Mitofsky label “within precinct error” (or WPE – Note: I reproduced screen shots of the original tables below, but given the width of this column of text, you will probably need to click on each to see a readable full-size version).

To calculate the error for any given precinct, they subtract the margin by which Bush led (or trailed) Kerry in the count from the margin by which Bush led (or trailed) Kerry in the unadjusted poll sample.  So (using the national results as an example) if Kerry had led the exit poll by 3% (51% to 48%), but Bush won the precinct count by 2.5% (50.7% to 48.3%), the WPE for that precinct would be -5.5  [(48-51) – (50.7-48.3) = -3 – 2.5 = -5.5].  A negative value for WPE means Kerry did better in the poll than in the count; positive values mean a bias toward Bush.  Edison-Mitofsky did this calculation for 1,250 precincts with at least 20 interviews per precinct and where non-absentee Election Day vote counts were available.  The table shows the average (“mean”) WPE, as well as the median and the average “absolute value” of WPE.  In the far right column, the “N” size indicates the number of precincts in each category.
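For readers who prefer code to prose, the calculation can be sketched in a few lines of Python (a sketch of my own; the function name is mine, not Edison-Mitofsky's):

```python
def wpe(poll_margin, count_margin):
    """Within-precinct error.

    Both margins are Bush minus Kerry, in percentage points:
    poll_margin from the unadjusted exit poll sample, count_margin
    from the official precinct count.  A negative WPE means Kerry
    did better in the poll than in the count.
    """
    return poll_margin - count_margin

# The national-numbers example from the text: Kerry +3 in the poll
# (a Bush-minus-Kerry margin of -3.0), Bush +2.5 in the count:
print(wpe(-3.0, 2.5))  # -5.5
```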

The average WPE was -6.5 – meaning an error in the Bush-Kerry margin of 6.5 points favoring Kerry.  The USCV report places great importance on the fact that the average WPE (the “mean”) appears to be much bigger in the “High Rep” precincts (-10.0) than in the “High Dem” precincts (+0.3).

The second table shows the average completion rates across the same partisanship categories.

MP discussed this particular table at length back in January.  A bit of review on how to read the table:  Each number is a percentage and you read across. Each row shows completion, refusal and miss rates for various categories of precincts, categorized by their level of partisanship. The first row shows that in precincts that gave 80% or more of their vote to John Kerry, 53% of voters approached by interviewers agreed to be interviewed, 35% refused and another 12% should have been approached but were missed by the interviewers.

The USCV report argues that this table contradicts the “central thesis” of the E-M report, that Bush voters were less likely to participate in the survey than Kerry voters:

The reluctant Bush responder hypothesis would lead one to expect a higher non-response rate where there are many more Bush voters, yet Edison/Mitofsky’s data shows that, in fact, the response rate is slightly higher in precincts where Bush drew ≥80% of the vote (High Rep) than in those where Kerry drew ≥80% of the vote (High Dem). (p. 9)

As I noted back in January, this pattern is a puzzle but does not automatically “refute” the E-M theory: 

If completion rates were uniformly higher for Kerry voters than Bush across all precincts, the completion rates should be higher in precincts voting heavily for Kerry than in those voting heavily for Bush….However, the difference in completion rates need not be uniform across all types of precincts. Mathematically, an overall difference in completion rates will be consistent with the pattern in the table above if you assume that Bush voter completion rates tended to be higher where the percentage of Kerry voters in the precincts was lower, or that Kerry voter completion rates tended to be higher where the percentage of Bush voters in the precincts was lower, or both.  I am not arguing that this is likely, only that it is possible.
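The possibility described in that passage is a version of Simpson's paradox, and it is easy to build toy numbers that produce it.  In the sketch below, the voter shares and completion rates are invented purely for illustration (they are not from the NEP data): Kerry voters out-respond Bush voters within both precinct types, yet the heavily Republican precinct ends up with the higher overall completion rate.

```python
def precinct_completion(kerry_share, kerry_rate, bush_rate):
    """Overall completion rate for a precinct: a voter-share-weighted
    average of the two voter groups' completion rates."""
    return kerry_share * kerry_rate + (1 - kerry_share) * bush_rate

# (Kerry share of voters, Kerry completion rate, Bush completion rate)
high_dem = (0.90, 0.54, 0.45)   # Kerry voters respond at 54% vs. 45%
high_rep = (0.10, 0.60, 0.53)   # Kerry voters respond at 60% vs. 53%

print(round(precinct_completion(*high_dem), 3))  # 0.531
print(round(precinct_completion(*high_rep), 3))  # 0.537
```

Kerry voters respond at the higher rate in every precinct, yet the High Rep precinct's overall completion rate (53.7%) edges out the High Dem precinct's (53.1%) — the same direction as the pattern in the Edison-Mitofsky table.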

The USCV report essentially concedes this point but then goes through a series of complex calculations in an effort to find hypothetical values that will reconcile the completion rates, the WPE rates and the notion that Kerry voters participated at a higher rate than Bush voters.  They conclude:

[While] it is mathematically possible to construct a set of response patterns for Bush and Kerry voters while faithfully reproducing all of Edison/Mitofsky’s “Partisanship Precinct Data”… The required pattern of exit poll participation by Kerry and Bush voters to satisfy the E/M exit poll data defies empirical experience and common sense under any assumed scenario. [emphasis in original]

I will not quarrel with the mechanics of their mathematical proofs.  While there are conclusions I differ with, what troubles me most is what USCV does not say:

1)  As reviewed here back in January, the Edison-Mitofsky report includes overwhelming evidence that the error rates were worse when interviewers were younger, relatively less experienced, less well educated or faced bigger challenges in selecting voters at random.  The USCV report makes no mention of the effects of interviewer experience or characteristics and blithely dismisses the other factors as “irrelevant” because any one, considered alone, fails to explain all of WPE.  Collectively, these various interviewer characteristics are an indicator of an underlying factor that we cannot measure:  How truly “random” was the selection of voters at each precinct?  It is a bit odd – to say the least – that the USCV did not consider the possibility that the cumulative impact of these factors might explain more error than any one individually.

[Clarification 4/10 :  As Rick Brady points out, errors in either direction (abs[WPE]) were highest among the least well educated.  However, WPE in Kerry’s favor was greatest among interviewers with a post-graduate education].

2)  More specifically, the USCV report fails to mention that poor performance by interviewers also calls into question the accuracy of the hand-tallied response rates that they dissect at great length.  Edison-Mitofsky instructed its interviewers to keep a running hand count of those they missed or who refused to participate.  If WPE was greatest in precincts with poor interviewers, it does not take a great leap to consider that non-response tallies might be similarly inaccurate in those same precincts.

MP can also suggest a few theories for why those errors might artificially depress the refusal or miss rates in those problem precincts.  If an interviewer is supposed to interview every 5th voter, but occasionally takes the 4th or the 6th because they seem more willing or approachable, that improper substitution will not show up at all in the refusal tally.  Some interviewers having a hard time completing interviews may have fudged their response tallies out of fear of being disciplined for poor performance.

3)  Most important, the USCV report fails to point out a critical artifact in the data that explains why WPE is so much higher than average in the “High Rep” precincts (-10.0)  and virtually non-existent in the “High Dem” precincts (+0.3).  It requires a bit of explanation, which may get even more murky, but if you can follow this you will understand the fundamental flaw in the USCV analysis.   

[4/13 – CLARIFICATION: On April 11 and 12, 2005, after this message was posted, the USCV authors twice revised their paper to include and then expand a new Appendix B that discusses the implications of this artifact.  4/14 – As I am in error below in describing the artifact, my conclusion that it represents a “fundamental flaw” is premature at best.]

Given the small number of interviews, individual precinct samples will always show a lot of “error” due to ordinary random sampling variation.  I did my own tabulations on the raw data released by NEP.  After excluding those precincts with fewer than 20 interviews, the average number of unweighted interviews per precinct is 49.3 with a standard deviation of 15.9.  That’s quite a range — 10% of these precincts have an n-size of just 20 to 30 interviews — and it makes for a lot of random error.  For a simple random sample of 50 interviews the ordinary “margin of error” (assuming a 95% confidence level) is +/- 14%.  With 30 interviews that error rate is +/- 18%.  The error on the difference between two percentages (the basis of WPE) gets bigger still.
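Those margins of error come from the standard simple-random-sample formula, 1.96·√(p(1−p)/n) at the worst case p = 0.5.  A quick check (my own sketch):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error, in percentage points, for a simple
    random sample of size n, evaluated at the worst case p = 0.5."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

print(round(margin_of_error(50)))  # 14
print(round(margin_of_error(30)))  # 18
```

And because WPE rests on the difference between two percentages, its sampling error is larger still — roughly double for a candidate-margin estimate.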

Edison-Mitofsky did not provide the standard errors for their tabulations of WPE, but they did tell us (on p. 34 of their report) that the overall standard deviation for WPE in 2004 was 18.2.  If we apply the basic principles of the standard deviation to an average WPE of -6.5, we know that roughly 68% of precincts had WPEs that ranged from -24.7 in Kerry’s favor to 11.7 in Bush’s favor.  The remaining third of the precincts (32%) had even greater errors, and a smaller percentage — roughly 5% — had errors greater than -42.9 in Kerry’s favor or 29.9 in Bush’s favor.  That’s a very wide spread.
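Those ranges follow from applying the empirical rule (roughly 68% of a roughly normal distribution falls within one standard deviation of the mean, about 95% within two) to the reported figures:

```python
# Mean and standard deviation of WPE from p. 34 of the E-M report
mean_wpe, sd_wpe = -6.5, 18.2

for k in (1, 2):
    low = mean_wpe - k * sd_wpe
    high = mean_wpe + k * sd_wpe
    print(f"within {k} sd: {low:.1f} to {high:.1f}")
# within 1 sd: -24.7 to 11.7
# within 2 sd: -42.9 to 29.9
```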

First consider what would happen if the vote count and exit poll had been perfectly executed, with zero bias in any direction.  Even though the random error would create a very wide spread of values for WPE (with a large standard deviation), the extreme values would cancel each other out leaving an average overall WPE of 0.0.

Now consider the tabulation of WPE by precinct partisanship under this perfect scenario.  In the three middle partisanship categories where each candidate’s vote falls between 20% and 80% (Weak Rep, Even and Weak Dem) the same canceling would occur, and WPE would be close to zero.

But in the extreme categories, there would be a problem.  The range of potential error is wider than the range of potential results.  Obviously, candidates cannot receive more than 100% or less than 0% of the vote.  So in the “Strong Rep” and “Strong Dem” categories, where candidate totals are more than 80% or less than 20%, the ordinarily wide spread of values would be impossible.  Extreme values would get “clipped” in one direction but remain in place in the other.  Thus, even with perfect sampling, perfect survey execution and a perfect count, we would expect to see a negative average WPE (an error in Kerry’s favor) in the “Strong Rep” precincts and a positive average WPE (an error in Bush’s favor) in the “Strong Dem” category.  In each “Strong” category, the extreme values on the tail of the distribution closest to 50% would pull the average up or down, so the median WPE would be closer to zero than the mean WPE.  Further, the absolute value of WPE would be a bit larger in the middle categories than in the “Strong” categories.  The table would show differences in WPE between Strong Dem and Strong Rep precincts, but none would be real.

[4/14 – My argument that random error alone causes such an artifact is not correct]. 
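One way to see why that correction is needed: under pure binomial sampling, the expected value of the poll's Bush-Kerry margin equals the true margin in every precinct, no matter how lopsided, so sampling error alone spreads WPE out but does not shift its mean.  A quick simulation (my own sketch, with illustrative precinct counts and sample sizes):

```python
import random

random.seed(0)

def mean_wpe_from_sampling(true_kerry_share, n_interviews, n_precincts):
    """Mean WPE across simulated precincts when the only source of
    error is binomial sampling of n_interviews voters per precinct."""
    true_margin = 100 * (1 - 2 * true_kerry_share)  # Bush minus Kerry
    total = 0.0
    for _ in range(n_precincts):
        # Each interviewed voter is a Kerry voter with the true probability
        kerry = sum(random.random() < true_kerry_share
                    for _ in range(n_interviews))
        poll_margin = 100 * (n_interviews - 2 * kerry) / n_interviews
        total += poll_margin - true_margin
    return total / n_precincts

# Even where Kerry's true share is 90% (heavily Democratic territory),
# the mean WPE hovers near zero, within simulation noise:
print(mean_wpe_from_sampling(0.90, 50, 20_000))
```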

Now take it one step further.  Assume a consistent bias in Kerry’s favor — an average WPE of -6.5 — across all precincts regardless of partisanship.  Every one of the averages will get more negative, so the negative average WPE in the Strong Rep precincts will get bigger (more negative) and the positive average WPE will get smaller (closer to zero).  Look back at the table from the Edison-Mitofsky report.  That’s exactly what happens.  Although the average WPE is -6.5, it is ~~-10.00~~ -10.04 in the Strong Rep precincts and ~~0.3~~ 0.5 in the Strong Dem.

Finally, consider the impact on the median WPE (the median is the middle point of the distribution).  For the Strong Rep category, pulling the distribution of values farther away from 50% increases the odds that extreme values on the negative tail of the distribution will not be canceled out by extreme values on the positive end.  So for Strong Rep, the median will be smaller in magnitude (closer to zero) than the mean.  The opposite will occur on the other end of the scale.  For the Strong Dem category, the distribution of the values will be pulled closer to 50%, so the odds are greater that the extreme errors on the negative end of the distribution (in Kerry’s favor) will be present to cancel out some of the extreme errors favoring Bush.  Here the mean and median WPE will be closer.  Again, this is exactly the pattern in the Edison-Mitofsky table.

Thus, the differences in WPE by precinct partisanship appear to be mostly an artifact of the way the data were tabulated.  They are not real. 

[4/14 – Given the error above, this conclusion is obviously premature.  A similar artifact based on randomly distributed clerical errors in the count data gathered by Edison-Mitofsky may produce such a pattern.  I’ll have more to say on this in a future post.]

Yet the USCV report plows ahead using those differences in a long series of mathematical proofs in an attempt to refute the Edison-Mitofsky theory.  As far as I can tell, they make no reference to the possibility of an artifact that the Edison-Mitofsky report plainly suggests in the discussion of the table (p. 36):

The analysis is more meaningful if the precincts where Kerry and Bush received more than 80% of the vote are ignored. In the highest Kerry precincts there is little room for overstatement of his vote. Similarly the highest Bush precincts have more freedom to only overstate Kerry rather than Bush. The three middle groups of precincts show a relatively consistent overstatement of Kerry.

[4/13 CLARIFICATION & CORRECTION:  As noted above, the new Appendix B of the USCV report now discusses the implications of this artifact and implies that the authors took it into account in designing the derivations and formulas detailed in Appendix A.  A statistician I consulted explains that their formulas make assumptions about the data that minimize or control for the artifact.  I apologize for that oversight.]

Read the USCV report closely and you will discover that their most damning words – “dramatically higher,” “very large spread,” “implausible patterns,” “defies empirical experience and common sense” – all apply to comparisons of data derived from the phantom patterns in the Strong Dem or Strong Rep precincts.  Next, follow the advice from Edison-Mitofsky and examine their charts and tables (like the one copied below), but ignore the end points.  The patterns seem entirely plausible and tend to support the Edison-Mitofsky theory rather than contradict it.

But USCV goes further, arguing that the difference between the mean and median values of WPE in Strong Bush precincts supports their theory that “vote-counting anomalies occurred disproportionately in ‘high-Bush’ precincts” (p. 15).  To put it nicely, given what we know about why the mean and median values are so different in those precincts, that argument falls flat. 

The statistical artifact that undermines the USCV argument was not exactly a secret.  The Edison-Mitofsky report alluded to it.  It was also discussed back in January on the AAPOR member listserv, a forum widely read by the nation’s foremost genuine experts in survey methodology.  You can reach your own conclusions as to why the USCV report made no mention of it.

UPDATE:  A DailyKos diarist named “Febble” has done ~~a very similar~~ an intriguing post that points up a slightly different artifact in the NEP data.  It’s definitely worth reading if you care about this subject.

[4/13 & 4/14 – CLARIFICATION & CORRECTION:  As noted above, the USCV report now discusses a similar artifact and its implications.  I overlooked that the assumptions of the derivations made by the USCV authors are relevant to the artifact I should have described.  In retrospect, I was in error to imply that they failed to take the artifact into account, and my conclusions about the implications of their derived values were premature.  As such, I struck the two sentences above.  I apologize for the error, which was entirely mine.

In clarifying these issues in Appendix B, the USCV authors indicate that the derived response rates they consider most implausible occur in the 40 precincts that were most supportive of George Bush (pp. 26-27):

WPE should be greatest for more balanced precincts and fall as precincts become more partisan. The data presented on p. 36, 37 of the E/M report and displayed in Table 1 of our report above, show that this is the case for all except the most highly partisan Republican precincts for which WPE dramatically increases to -10.0%. Our calculations above show the differential partisan response necessary to generate this level of WPE in these precincts ranges from 40% (Table 2) to an absolute minimum of 20.5% (Table 4)…

The dramatic and unexpected increase in (signed) mean WPE in highly Republican precincts of -10.0%, noted above, is also unexpectedly close to mean absolute value WPE (12.4%) in these precincts. This suggests that the jump in (signed) WPE in highly partisan Republican precincts occurred primarily because (signed) WPE discrepancies in these precincts were, unlike in [the highly Democratic precincts] and much more so than in [the middle precincts], overwhelmingly one-sided negative overstatements of Kerry’s vote share.

To assess the plausibility of the derived response rates in the Republican precincts, we need to take a closer look at the assumptions the authors make and their implications.  I will do so in a subsequent post.]

There is more of relevance that the USCV leaves out, but I have already gone too long for one blog post.  I will take up the rest in a subsequent post.  To be continued…

Mark Blumenthal

Mark Blumenthal is the principal at MysteryPollster, LLC. With decades of experience in polling using traditional and innovative online methods, he is uniquely positioned to advise survey researchers, progressive organizations and candidates and the public at-large on how to adapt to polling’s ongoing reinvention. He was previously head of election polling at SurveyMonkey, senior polling editor for The Huffington Post, co-founder of Pollster.com and a long-time campaign consultant who conducted and analyzed political polls and focus groups for Democratic party candidates.