April 29, 2005
The Liddle Model That Could
Regular readers of this blog may know her as "Febble," the diarist from DailyKos. Her real name is Elizabeth Liddle, a 50-something mother and native of Scotland who originally trained as a classical musician and spent most of her life performing, composing and recording renaissance and baroque music. She also studied architecture and urban design before enrolling in a PhD program in Cognitive Psychology at the University of Nottingham where she is currently at work on a dissertation on the neural correlates of dyslexia. In her spare time, she wrote a children's book and developed an interest in American politics while posting on the British Labour Party on DailyKos. A self-proclaimed "fraudster," she came to believe our election "may well have been rigged," that "real and massive vote suppression" occurred in Ohio where the recount was "a sham" and the Secretary of State "should be jailed."
She is, in short, perhaps the least imaginable person to have developed a computational model that both suggests a way to resolve the debate about whether the exit polls present evidence of fraud and undermines the central thesis of a paper by a team of PhD fraud theorists.
But that's exactly what she's done.
Assignment editors, if you are out there (and I know you are), while it is a bit complex and geeky, this is an amazing story. Read on...
Let's start at the beginning.
As nearly everyone seems to know, the "early" exit polls conducted on November 2, 2004 (the ones tabulated just before the polls closed in each state), had John Kerry leading George Bush nationally and had him running stronger in most states than he ultimately did in the final count. On the Internet, an argument has raged ever since about whether those exit polls present evidence of vote fraud, about the possibility that the exit polls were right and the election result was wrong.
In late January, Edison Research and Mitofsky International, the two companies that conducted the exit polls on behalf of the consortium of television network news organizations known as the National Election Pool (NEP) issued a public report that provided an analysis of what happened accompanied by an unprecedented release of data and information about the workings of the exit polls. The tabulations provided in that report are the basis of the ongoing debate.
On March 31, an organization known as U.S. Count Votes (USCV) released a new "scientific paper" and executive summary that "found serious flaws" with the Edison-Mitofsky (E-M) report (according to the USCV press release). They found that the E-M explanation "defies empirical experience and common sense" and called for a "thorough investigation" since "the absence of any statistically-plausible explanation for the discrepancy between Edison/Mitofsky's exit poll data and the official presidential vote tally is an unanswered question of vital national importance" (p. 22). [MP also discussed this paper and its conclusions in a two-part series earlier this month].
In an email, Elizabeth Liddle explained to me that she discovered the USCV web site while "poking about the web" for data on the U.S. elections. After doing her own statistical tests on Florida election returns, she became part of the discussion among the USCV statisticians that ultimately led to their paper. While she reviewed early drafts, she ultimately came to disagree with the conclusions of the final report.
[Full disclosure: For the last two weeks, I have had the unique opportunity to watch the development of Elizabeth's work through a running email conversation between Elizabeth, Rick Brady of StonesCryOut and "DemFromCT" from DailyKos. My use of the familiar "Elizabeth" henceforth results from that remarkable experience. This post benefits greatly from their input, although as always, the views expressed here are my own].
To understand Elizabeth's contribution to this debate, we need to consider the various possible reasons why the exit polls might differ from the count.
Random Sampling Error? - All polls have some built-in error (or variability) that results from interviewing a sample of voters rather than the entire population. Although there have been spirited debates about the degree of significance within individual states (here, here, here and here), all agree that there was a consistent discrepancy at the national level that had Kerry doing better in the poll than the count. The "national sample" of precincts (a subsample of 250 precincts) showed Kerry winning by three points (51% to 48%), but he ultimately lost the race by 2.5 points (48.26% to 50.73%). The E-M report quantifies that error (on p. 20) by subtracting the Bush margin in the election (+2.5) from the Bush margin in the poll (-3.0) for a total error on the national sample of -5.5 (a negative number means an error that favored Kerry in the poll, a positive number means an error favoring Bush).
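For readers who want to check the sign convention, here is a minimal sketch of that subtraction. The function name `margin_error` is my own label for illustration, not anything taken from the E-M report.

```python
def margin_error(poll_bush, poll_kerry, count_bush, count_kerry):
    """Bush margin in the poll minus Bush margin in the count, in points.

    Negative values mean the poll overstated Kerry; positive values
    mean it overstated Bush.
    """
    poll_margin = poll_bush - poll_kerry
    count_margin = count_bush - count_kerry
    return poll_margin - count_margin


# National sample figures quoted above: poll 48-51, count 50.73-48.26.
national = margin_error(48.0, 51.0, 50.73, 48.26)
print(f"National error on the margin: {national:+.2f}")
```

With the unrounded vote shares the result comes out a hair under the -5.5 quoted in the report, which uses a Bush margin rounded to +2.5.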
At the state level, the E-M report showed errors in the Bush-Kerry poll margin (the Bush vote minus the Kerry vote) favoring Kerry in 41 of 50 states and averaging -5.0. At the precinct level the discrepancy favoring Kerry in the poll averaged -6.5. Both E-M and USCV agree that random sampling error alone does not explain these discrepancies.
Biased Sample of Precincts? - Exit pollsters use a two step process to sample voters. They first draw a random sample of precincts and then have interviewers approach a random sample of voters at each precinct. The exit pollsters can check for any sort of systematic bias in the first stage by simply replacing the interviews in each selected precinct with the count of all votes cast in each precinct. As explained on pp. 28-30 of the E-M report, they did so and found a slight error in Bush's favor (+0.43). Both E-M and USCV agree that the selection of precincts did not cause the larger discrepancy in Kerry's favor. The remaining discrepancy occurred at the precinct level, something the E-M report calls "within precinct error" (or WPE) (p. 31).
Response Bias? - The E-M report summarized its main conclusion (p. 3):
Our investigation of the differences between the exit poll estimates and the actual vote count point to one primary reason: in a number of precincts a higher than average Within Precinct Error most likely due to Kerry voters participating in the exit polls at a higher rate than Bush voters (p. 3)
Although the E-M report made no attempt to prove that Kerry voters were more likely to want to participate in exit polls than Bush voters (they noted that the underlying "motivational factors" were "impossible to quantify" - p. 4), they based their conclusion on two findings: (1) A similar but smaller pattern of Democratic overstatement had occurred in previous exit polls and (2) errors were greater when interviewers were less experienced or faced greater challenges following the prescribed random selection procedures. [Why would imperfect performance by the interviewers create a pro-Kerry Bias? See Note 1]
Bias in the Official Count? - The USCV report takes strong exception to the E-M conclusion about response bias, which they termed the "reluctant Bush responder (rBr)" hypothesis. The USCV authors begin by arguing that "no data in the E/M report supports the hypothesis that Kerry voters were more likely than Bush voters to cooperate with pollsters (p. 8)." But they go further, claiming "the required pattern of exit poll participation by Kerry and Bush voters to satisfy the E/M exit poll data defies empirical experience and common sense" (p. 12). This is the crux of the USCV argument. Refute the "reluctant Bush responder" theory, and the only remaining explanation is bias in the official count.
To make this case, the USCV authors scrutinize two tables in the E-M report that tabulate the rate of "within precinct error" (WPE) and the survey completion rates by the "partisanship" of the precinct (in this case, partisanship refers to the percentage of the vote received by Kerry). I combined the data into one table that appears below:
If the Kerry voters had been more likely to participate in the poll, the USCV authors argue, we would "expect a higher non-response rate where there are many more Bush voters" (p. 9). Yet, if anything, the completion rates are "slightly higher [0.56] in the in precincts where Bush drew >=80% of the vote (High Rep) than in those where Kerry drew >=80% of the vote (High Dem)" [0.53 - although the E-M report says these differences are not significant, p. 37].
Yet the USCV report concedes that this pattern is "not conclusive proof that the E/M hypothesis is wrong" because of the possibility that the response patterns were not uniform across all types of precincts (p. 10). I made a similar point back in January. They then use a series of algebraic equations (explained in their Appendix A) to derive the response rates for Kerry and Bush voters in each type of precinct that would be consistent with the overall error and response rates in the above table.
Think of their algebraic equations as a "black box." Into the box go the average error rates (mean WPE) and overall response rates from the above table, plus three different assumptions of the Kerry and Bush vote in each category of precinct. Out come the differential response rates for Kerry and Bush voters that would be consistent with the values that went in. They get values like those in the following chart (from p. 12):
The USCV authors examine the values in this chart, and they note the large differences in required differential response rates for Kerry and Bush supporters in the stronghold precincts on the extreme right and left categories of the chart and conclude:
The required pattern of exit poll participation by Kerry and Bush voters to satisfy the E/M exit poll data defies empirical experience and common sense under any assumed scenario [p. 11 - emphasis in original].
Thus, they find that the "'Reluctant Bush Responder' hypothesis is inconsistent with the data," leaving the "only remaining explanation - that the official vote count was corrupted" (p. 18). Their algebra works as advertised (or so I am told by those with the appropriate expertise). What we should consider is the reliability of the inputs to their model. Is this a "GIGO" situation? In other words, before we put complete faith in the values and conclusions coming out of their algebraic black box, we need to carefully consider the reliability of the data and assumptions that go in.
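To make the "black box" a bit less opaque, here is a rough sketch of the kind of algebra involved. It is my own simplified reconstruction, not the equations from the USCV Appendix A. It assumes the completion rate is a vote-weighted average of the two candidates' response rates, and the 10% Kerry share I use for a Bush stronghold is purely hypothetical. The function name is likewise mine.

```python
def implied_response_rates(kerry_share, completion_rate, wpe_points):
    """Back out the Kerry and Bush response rates consistent with the inputs.

    kerry_share     -- Kerry's true share of the two-party vote (0 to 1)
    completion_rate -- overall completion rate in the precinct (0 to 1)
    wpe_points      -- WPE in points (poll Bush margin minus count Bush margin)
    """
    # WPE = 2 * (true Kerry share - poll Kerry share) when shares are fractions.
    poll_kerry_share = kerry_share - (wpe_points / 100.0) / 2.0
    # Completed interviews split between the candidates in the poll's proportions.
    kerry_rate = poll_kerry_share * completion_rate / kerry_share
    bush_rate = (1.0 - poll_kerry_share) * completion_rate / (1.0 - kerry_share)
    return kerry_rate, bush_rate


# Bush stronghold: WPE of -10.0 and completion rate of 0.56 come from the
# figures quoted in this post; the 10% Kerry share is an assumed input.
kerry_rate, bush_rate = implied_response_rates(0.10, 0.56, -10.0)
print(f"Implied Kerry response rate: {kerry_rate:.0%}")
print(f"Implied Bush response rate:  {bush_rate:.0%}")
```

Under those assumptions the box returns a Kerry response rate far above the Bush rate, which is the sort of value the USCV authors call implausible. The point of the sketch is that the output is only as good as the WPE and completion rate fed into it.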
First, consider the questions about the completion rates (that I discussed in an earlier post). Those rates are based on hand tallies of refusals and misses kept by interviewers on Election Day. The E-M report tells us that 77% had never worked as exit poll interviewers before and virtually all worked alone without supervision. The report also shows that rates of error (WPE) were significantly higher among interviewers without prior experience or when the challenges they faced were greater. At the very least, these findings suggest some fuzziness in the completion rates. At most, they suggest that the reported completion rates may not carry all of the "differential" response that could have created the overall discrepancy [How can that be? See Note 2].
The second input into the USCV model is the rate of error in each category of precinct, more specifically the mean "within precinct error" (WPE) in the above table. This, finally, brings us to the focus of Elizabeth Liddle's paper. Her critical insight, one essentially missed by the Edison-Mitofsky report and dismissed by the USCV authors, is that "WPE as a measure is itself confounded by precinct partisanship." That "confound" creates an artifact in the tabulation of WPE that causes a phantom pattern in the tabulation of WPE by partisanship. The phantom values going into the USCV model are another reason to question the "implausible" values that come out.
Elizabeth's paper explains all of this in far greater detail than I will attempt here, and is obviously worth reading in full by those with technical questions (also see her most recent DailyKos blog post). The good news is she tells the story with pictures. Here is the gist.
[Update: here is an alternate link for the paper]
Her first insight, explained in an earlier DailyKos post on 4/6/05 is that the value of WPE "is a function of the actual proportion of votes cast." Go back to the E-M explanation for the exit poll discrepancy: Kerry voters participated in the exit poll at a slightly higher rate (hypothetically 56%) than Bush voters (50%). If there were 100 Kerry voters and 100 Bush voters in a precinct, an accurate count would show a 50-50% tie in that precinct, but the exit poll would sample 56 Kerry voters, and 50 Bush voters showing Kerry ahead 53% to 47%. This would yield a WPE of -6.0. But consider another hypothetical precinct with 200 Bush voters and 0 Kerry voters. Bush will get 100% in the poll regardless of the response rate. Thus response error is impossible and WPE will be zero. Do the same math assuming different levels of Bush and Kerry support in between and you will see that if you assume constant response rates for Bush and Kerry voters across the board, the WPE values get smaller (closer to zero) as the vote for the leading candidate gets larger as illustrated in the following chart (you can click on any chart to see a full-size version):
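The "do the same math" step is easy to reproduce. Here is a small sketch, assuming the hypothetical 56% and 50% response rates from the example above hold in every precinct. The function name `expected_wpe` is mine.

```python
def expected_wpe(kerry_share, kerry_rate=0.56, bush_rate=0.50):
    """Expected WPE in points when response rates are constant everywhere.

    WPE is the Bush margin in the poll minus the Bush margin in the count,
    so negative values mean the poll overstates Kerry.
    """
    kerry_resp = kerry_share * kerry_rate
    bush_resp = (1.0 - kerry_share) * bush_rate
    poll_bush_margin = (bush_resp - kerry_resp) / (bush_resp + kerry_resp)
    count_bush_margin = (1.0 - kerry_share) - kerry_share
    return 100.0 * (poll_bush_margin - count_bush_margin)


for share in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(f"Kerry share {share:.0%}: WPE {expected_wpe(share):+.1f}")
```

In the tied precinct the unrounded WPE is about -5.7; the -6.0 in the text comes from rounding the poll shares to 53% and 47% first. At either extreme the WPE collapses to zero even though the response rates never changed.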
Although this pattern is an artifact in the data, it does not undermine the USCV conclusions. As they explained in Appendix B (added on April 12 after Elizabeth's explanation of the artifact appeared in her 4/6/05 DailyKos diary), the artifact might explain why WPE was larger (-8.5) in "even" precincts than in Kerry strongholds (+0.3). However, it could not explain the larger WPE (-10.0) in Bush strongholds. In effect, they wrote, it made their case stronger by making the WPE pattern seem even more improbable: "These results would appear to lend further support to the "Bush Strongholds have More Vote-Corruption" (Bsvcc) hypothesis" (p. 27).
However, Elizabeth had a second and more critical insight. Even if the average differential response (the differing response rates of Kerry and Bush voters) were constant across categories of precincts, those differences would show random variation at the precinct level.
Consider this hypothetical example. Suppose Bush voters tend to be more likely to co-operate with a female interviewer, Kerry voters with a male interviewer. And suppose the staff included equal numbers of each. There would be no overall bias, but some precincts would have a Bush bias and some a Kerry bias. If more of the interviewers were men, you'd get an overall Kerry bias. Since the distribution of male and female interviewers would be random, a similar random pattern would follow in the distribution of differences in the response rates.
No real world poll is ever perfect, but ideally the various minor errors are random and cancel each other out. Elizabeth's key insight was to see that this random variation would create another artifact, a phantom skew in the average WPE when tabulated by precinct partisanship.
Here's how it works: Again, a poll with no overall bias would still show random variation in both the response and selection rates. As a result, Bush voters might participate in the poll in some precincts at a greater rate than Kerry voters resulting in errors favoring Bush. In other precincts, the opposite pattern would produce errors favoring Kerry. With a large enough sample of precincts, those random errors would cancel out to an average of zero. The same thing would happen if we calculated the error in precincts where the vote was even. However, if we started to look at more partisan precincts, we would see a skew in the WPE calculation: As the true vote for the leading candidate approaches the ceiling of 100%, there would be more room to underestimate the leader's margin than to overestimate it.
If that description is confusing, the good news is that Elizabeth drew a picture. Actually, she did far better. She created a computational model to run a series of simulations of randomly generated precinct data - something financial analysts refer to as a Monte Carlo simulation. The pictures tell the whole story.
The chart that follows illustrates the skew in the error calculation that results under the assumption described above -- a poll with no net bias toward either candidate.
The black line shows the average "within precinct error" (mean WPE) for each of nine categories of partisanship. The line has a distinctive S-shape, which takes into account both of the artifacts that Elizabeth had observed. For precincts where Kerry and Bush were tied at 50%, the WPE averaged zero. In precincts where Kerry leads, the WPE calculation skewed positive (indicating an understatement of Kerry's vote), while in the Bush precincts, it skewed negative (an understatement of the Bush vote).
In the most extreme partisan precincts, the model shows the mean WPE line turning slightly back toward zero. Here, the first artifact that Elizabeth noticed (that mean WPE gets smaller as precinct partisanship increases) essentially overwhelms the opposite pull of the second (the effect of random variation in response rates).
Again, if these concepts are confusing, just focus on the chart. The main point is that even if the poll had no net bias, the calculation of WPE would appear to show big errors that are just an artifact of the tabulation. They would not indicate any problem with the count.
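For those who would rather see numbers than a chart, here is a stripped-down simulation in the same spirit. It is a sketch under my own assumptions, not Elizabeth's actual model: each precinct gets a Kerry-to-Bush response ratio drawn from a lognormal distribution centered on 1 (no net bias), the spread of 0.5 on the log scale is arbitrary, and ordinary sampling error within the precinct is ignored.

```python
import math
import random


def simulate_mean_wpe(kerry_share, mean_log_ratio=0.0, sd_log_ratio=0.5,
                      n_precincts=20000, seed=2004):
    """Mean WPE (points) across simulated precincts with one true Kerry share."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_precincts):
        # Ratio of Kerry to Bush response rates in this precinct.
        ratio = math.exp(rng.gauss(mean_log_ratio, sd_log_ratio))
        poll_kerry = kerry_share * ratio / (kerry_share * ratio + 1.0 - kerry_share)
        # WPE = poll Bush margin - count Bush margin = 2 * (true - poll Kerry share).
        total += 200.0 * (kerry_share - poll_kerry)
    return total / n_precincts


shares = (0.05, 0.2, 0.35, 0.5, 0.65, 0.8, 0.95)
mean_wpe = {share: simulate_mean_wpe(share) for share in shares}
for share in shares:
    print(f"Kerry share {share:.0%}: mean WPE {mean_wpe[share]:+.2f}")
```

Even with no net bias, the mean WPE comes out positive in the Kerry precincts, negative in the Bush precincts, roughly zero in the tied ones, and it bends back toward zero at the extremes. The size of the skew depends entirely on the assumed spread.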
Now consider what happens when Elizabeth introduces bias into the model. The following chart assumes a net response rate of 56% for Kerry voters and 50% for Bush voters (the same values that E-M believes could have created the errors in the exit poll) along with the same pattern of random variation within individual precincts.
Under her model of a net Kerry bias in the poll, both the mean and median WPE tend to vary with partisanship. The error rates tend to be higher in the middle precincts, but as Elizabeth observes in her paper, "WPEs are greater for high Republican precincts than for high Democrat precincts."
Again, remember that the "net Kerry bias" chart assumes that Kerry voters are more likely to participate in the poll, on average, regardless of the level of partisanship of the precinct. The error in the model should be identical - and in Kerry's favor - everywhere. Yet the tabulation of mean WPE shows it more negative in the Republican direction.
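The same kind of toy simulation shows the asymmetry. Again, this is a sketch under my own assumptions rather than a reproduction of Elizabeth's model: the precinct-level response ratio is lognormal, centered on 56/50 instead of 1, with an arbitrary log-scale spread of 0.5 and no sampling error. The function name `simulate_wpe` is mine.

```python
import math
import random
import statistics


def simulate_wpe(kerry_share, kerry_rate=0.56, bush_rate=0.50,
                 sd_log_ratio=0.5, n_precincts=20000, seed=2004):
    """Return (mean, median) WPE in points for one level of partisanship."""
    rng = random.Random(seed)
    mean_log_ratio = math.log(kerry_rate / bush_rate)  # net Kerry bias
    wpes = []
    for _ in range(n_precincts):
        ratio = math.exp(rng.gauss(mean_log_ratio, sd_log_ratio))
        poll_kerry = kerry_share * ratio / (kerry_share * ratio + 1.0 - kerry_share)
        wpes.append(200.0 * (kerry_share - poll_kerry))
    return statistics.mean(wpes), statistics.median(wpes)


results = {share: simulate_wpe(share) for share in (0.2, 0.5, 0.8)}
for share, (mean_wpe, median_wpe) in results.items():
    print(f"Kerry share {share:.0%}: mean {mean_wpe:+.2f}, median {median_wpe:+.2f}")
```

With an identical average bias everywhere, the mean WPE is still noticeably more negative in the Bush precincts than in the Kerry precincts, and in the Bush precincts the mean sits well below the median.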
The Model vs. the Real Data - Implausible?
Now consider one more picture. This shows the data for meanWPE and medianWPE as reported by Edison-Mitofsky, the input into the USCV "black box" that produced those highly "implausible" results.
Elizabeth used a regression function to plot a trend line for mean and median WPE. Note the similarity to the Kerry net bias chart above. The match is not perfect (more on that below) but the mean WPE is greater in the Republican precincts in both charts, and both show a divergence between the median and mean WPE in the heavy Republican precincts. Thus, the money quote from Elizabeth's paper (p. 21):
To the extent that the pattern in [the actual E-M data] shares a family likeness with the pattern of the modeled data... the conclusion drawn in the USCV report, that the pattern observed requires "implausible" patterns of non-response and thus leaves the "Bush strongholds have more vote-count corruption" hypothesis as "more consistent with the data", would seem to be unjustified. The pattern instead is consistent with the E-M hypothesis of "reluctant Bush responders," provided we postulate a large degree of variance in the degree and direction of bias across precinct types.
In other words, the USCV authors looked at the WPE values by partisanship and concluded they were inconsistent with Edison-Mitofsky's explanation for the exit poll discrepancy. Elizabeth's proof shows just the opposite: The patterns of WPE are consistent with what we would expect had Kerry voters been more likely to participate in the exit polls across the board. Of course, this pattern does not prove that differential response occurred, but it cuts the legs out from the effort to portray the Edison-Mitofsky explanation as inherently "implausible."
It is worth saying that nothing here "disproves" the possibility that some fraud occurred somewhere. Once again - and I cannot repeat this often enough - The question we are considering is not whether fraud existed but whether the exit polls are evidence of fraud.
The Promise of the Fancy Function
Her paper also raises some fundamental questions about "within-precinct error," the statistic used by Edison-Mitofsky to analyze what went wrong with the exit polls (p. 19):
[These computations] indicate that the WPE is a confounded dependent measure, at best introducing noise into any analysis, but at worst creating artefacts that suggest that bias is concentrated in precincts with particular degrees of partisanship where no such concentration may exist. It is also possible that other more subtle confounds occur where a predictor variable of interest such as precinct size, may be correlated with partisanship.
In other words, the artifact may create some misleading results for other cross tabulations in the original Edison-Mitofsky report. But this conclusion leads to the most promising aspect of Elizabeth Liddle's contribution: She has done more than suggest some theoretical problems. She has actually proposed a solution that will not only help improve the analysis of the exit poll discrepancy but may even help resolve the debate over whether the exit polls present evidence of fraud.
In her paper, Elizabeth proposed what she called an "unconfounded index of bias," an algebraic function that "RonK" (another DailyKos blogger that also occasionally comments on MP) termed a "fancy function." The function, derived in excruciating algebraic detail in her paper, applies a logarithmic transformation to WPE.
Don't worry if you don't follow what that means. Again, consider the pictures. When applied to her no-net bias scenario (the model of the perfect exit poll with no bias for either candidate), her "BiasIndex" plots perfectly straight lines with both the mean and median at 0 (no error) at every level of partisanship (the error bars represent the standard deviation).
When applied to the "net Kerry bias" scenario, it again shows two straight lines, only this time both lines show a consistent bias in Kerry's favor across the board. When applied to the model data, the "fancy function" eliminates the artifact as intended.
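To give a feel for why a log transformation removes the artifact, here is one simple index with that property: the log of the poll's Kerry-to-Bush odds divided by the count's Kerry-to-Bush odds. I am offering this form as an illustration only. It may not match the exact function derived in Elizabeth's paper, and the function names are mine.

```python
import math


def poll_kerry_share(kerry_share, kerry_rate, bush_rate):
    """Kerry's share among respondents given true share and response rates."""
    kerry_resp = kerry_share * kerry_rate
    bush_resp = (1.0 - kerry_share) * bush_rate
    return kerry_resp / (kerry_resp + bush_resp)


def bias_index(poll_kerry, count_kerry):
    """Log odds ratio of poll to count; positive means Kerry is overstated."""
    poll_odds = poll_kerry / (1.0 - poll_kerry)
    count_odds = count_kerry / (1.0 - count_kerry)
    return math.log(poll_odds / count_odds)


# Constant 56% vs. 50% response rates at every level of partisanship.
index_by_share = {
    share: bias_index(poll_kerry_share(share, 0.56, 0.50), share)
    for share in (0.1, 0.3, 0.5, 0.7, 0.9)
}
for share, index in index_by_share.items():
    print(f"Kerry share {share:.0%}: index {index:.4f}")
```

Unlike WPE, this index returns the same value, the log of 56/50, in every category of precinct. That flat line is what a constant differential response should look like.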
Which brings us to what ultimately may be the most important contribution of Elizabeth's work. It is buried, without fanfare, near the end of her paper (pp. 20-21):
One way, therefore, of resolving the question as to whether "reluctant Bush responders" were indeed more prevalent in one category of precinct rather than another would be to compute a pure "bias index" for each precinct, rather than applying the formula to either the means or medians given, and then to regress the "bias index" values on categories or levels of precinct partisanship.
In other words, the USCV claim about the "plausibility" of derived response rates need not remain an issue of theory and conjecture. Edison-Mitofsky and the NEP could choose to apply the "fancy function" at the precinct level. The results, applied to the USCV models and considered in light of the appropriate sampling error for both the index and the reported completion rates, could help us determine once and for all, just how "plausible" the "reluctant Bush responder" theory is.
The upcoming conference of the American Association for Public Opinion Research (AAPOR) presents Warren Mitofsky and NEP with a unique opportunity to conduct and release just such an analysis. The final program for that conference, just posted online, indicates that a previously announced session on exit polls - featuring Mitofsky, Kathy Frankovic of CBS News and Fritz Scheuren of the National Opinion Research Center at the University of Chicago (NORC) -- has been moved to a special lunch session before the full assembled AAPOR membership. I'll be there and am looking forward to hearing what they have to say.
Epilogue: The Reporting Error Artifact
There is one aspect of the pattern of the actual data that diverges slightly from Elizabeth's model. Although her model predicts the biggest WPE values in the moderately Republican precincts (as opposed to Bush strongholds), the WPE in the actual data is greatest (most negative) in the precincts that gave 80% or more of their vote to George Bush.
A different artifact in the tabulation of WPE by partisanship may explain this minor variation. In January, an AAPOR member familiar with the NEP data suggested such an artifact on AAPOR's member only electronic mailing list. It was this artifact (not the one explained in Elizabeth's paper) that I attempted to explain (and mangled) in my post of April 8.
The second artifact involves human errors in the actual vote count as reported by election officials or gathered by Edison-Mitofsky staff (my mistake in the April 8 post was to confuse these with random sampling error). Reporting errors might result from a mislabeled precinct, a missed or extra digit, a mistaken digit (a 2 for a 7), or a transposition of two sets of numbers or precincts. Errors of this sort, while presumably rare, could create very large WPE values. Imagine a precinct with a true vote of 79 to 10 (89%). Transpose the two numbers and (if the poll produced a perfect estimate) you would get a WPE of 156. Swap a "2" for the "7" in the winner's tally and the result would be a WPE of 30.
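Here is the arithmetic behind those two figures, as a quick sketch. It assumes a true tally of 79 to 10, which is the tally consistent with the 89% share and both WPE values quoted above, and it rounds each share to a whole percentage before subtracting, as the example appears to do. The helper name `rounded_margin` is mine.

```python
def rounded_margin(winner_votes, loser_votes):
    """Winner's margin in points, using shares rounded to whole percentages."""
    total = winner_votes + loser_votes
    winner_pct = round(100 * winner_votes / total)
    loser_pct = round(100 * loser_votes / total)
    return winner_pct - loser_pct


# A perfect poll reproduces the true 79-to-10 margin.
poll_margin = rounded_margin(79, 10)

# Error 1: the two tallies are transposed in the reported count.
wpe_transposed = poll_margin - rounded_margin(10, 79)

# Error 2: the winner's 79 is recorded as 29.
wpe_digit_swap = poll_margin - rounded_margin(29, 10)

print(f"Poll margin: {poll_margin}")
print(f"WPE after transposition: {wpe_transposed}")
print(f"WPE after digit swap: {wpe_digit_swap}")
```

Both errors push the WPE in the same direction, because in a precinct this lopsided almost any clerical mistake shrinks the winner's reported margin rather than inflating it.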
Since truly human errors should be completely random, they would be offsetting (have a mean of zero) in a large sample of precincts or in closely divided precincts (which allow for large errors in both directions). However, in the extreme partisan precincts, they create an artifact on a tabulation of WPE, because there is no room for extreme overestimation of the winner's margin. Unlike the differential response errors at the heart of Elizabeth's paper, however, these errors will not tend to get smaller in the most extreme partisan precincts. In fact, these errors would tend to have the opposite pattern, creating a stronger artifact effect in the most partisan precincts.
We can assume that such errors exist in the data because the Edison-Mitofsky report tells us that they removed three precincts from the analysis "with large absolute WPE (112, -111, -80) indicating that the precincts or candidate vote were recorded incorrectly" (p. 34). Presumably, similar errors may remain in the data that are big enough to produce an artifact in the extreme partisan precincts (less than 80 but greater than 20).
I will leave it to wiser minds to disentangle the various potential artifacts in the actual data, but it seems to me that a simple scatterplot showing the distribution of WPE by precinct partisanship would tell us quite a bit about whether this artifact might cause bigger mean WPE in the heavily Republican precincts. Edison-Mitofsky and NEP could release such data without any compromise of respondent or interviewer confidentiality.
The USCV authors hypothesize greater "vote corruption" in Bush strongholds. As evidence, they point to the difference between the mean and median WPE in these precincts: "Clearly there were some highly skewed precincts in the Bush strongholds, although the 20 precincts (in a sample of 1250) represent only about 1.6% of the total" (p. 14, fn). A simple scatterplot would show whether that skew resulted from a handful of extreme outliers or a more general pattern. If a few outliers are to blame, and if similar outliers in both directions are present in less partisan precincts, it would seem to me to implicate random human error rather than something systematic. Even if the math is neutral on this point, it would be reasonable to ask the USCV authors to explain how outliers in a half dozen or so precincts out of 1,250 (roughly 0.5% of the total) somehow leave "vote corruption" as the only plausible explanation for an average overall exit poll discrepancy of 6.5 percentage points on the Bush-Kerry margin.
Endnotes [added 5/2/2005]:
Note 1: The assumption here is not that the interviewers lied or deliberately biased their interviews. Rather, the Edison-Mitofsky data suggest that a breakdown in random sampling procedures exacerbated a slightly greater hesitance by Bush voters to participate.
Let's consider the "reluctance" side of the equation first. It may have been about Bush voters having less trust in the news organizations that interviewers named prominently in their solicitation and whose logos appeared prominently on questionnaires and their materials. It might have been about a slightly greater enthusiasm by some Kerry voters to participate as a result of apparently successful efforts by the DNC to get supporters to participate in online polls.
Now consider the data presented in the Edison-Mitofsky report (pp. 35-46). They reported a greater discrepancy between the poll and the vote where:
- The "interviewing rate" (the number of voters the interviewer counts in order to select a voter to approach) was greatest
- The interviewer had no prior experience as an exit poll interviewer
- The interviewer had been hired a week or less prior to the election
- The interviewer said they had been trained "somewhat or not very well" (as opposed to "very well")
- Interviewers had to stand far from the exits
- Interviewers could not approach every voter
- Polling place officials were not cooperative
- Voters were not cooperative
- Poll-watchers or lawyers interfered with interviewing
- Weather affected interviewing
What all these factors have in common is that they indicate either less interviewer experience or a greater degree of difficulty facing the interviewer. That these factors all correlate with higher errors suggests that the random selection procedure broke down in such situations. As interviewers had a harder time keeping track of the nth voter to approach, they may have been more likely to consciously or unconsciously skip the nth voter and substitute someone else who "looked" more cooperative, or to allow an occasional volunteer that had not been selected to leak through. Challenges like distance from the polling place or even poor weather would also make it easier for reluctant voters to avoid the interviewer altogether.
Another theory suggests that the reluctance may have had less to do with how voters felt about the survey sponsors or about shyness in expressing a preference for Bush or Kerry than their reluctance to respond to an approach from a particular kind of interviewer. For example, the E-M report showed that errors were greater (in Kerry's favor) when interviewers were younger or had advanced degrees. Bush voters may have been more likely to brush off approaching interviewers based on their age or appearance than Kerry voters.
The problem with this discussion is that proof is elusive. It is relatively easy to conceive of experiments to test these hypotheses on future exit polls, but the data from 2004 - even if we could see every scrap of data available to Edison-Mitofsky - probably does not facilitate conclusive proof. As I wrote back in January, the challenge in studying non-respondents is that without an interview we know little about them.
Note 2: The refusal, miss and completion rates upon which USCV places such great confidence were based on hand tallies. Interviewers were supposed to count each exiting voter and approach the "nth" voter (such as the 4th or the 6th) to request that they complete an interview (a specific "interviewing" rate was assigned to each precinct). Whenever a voter refused or whenever an interviewer missed a passing "nth" voter, the interviewer was to record the gender, race and approximate age of each on a hand tally sheet. This process has considerable room for human error.
For comparison consider the way pollsters track refusals in a telephone study. For the sake of simplicity, let's imagine a sample based on a registration list of voters in which every selected voter has a working telephone and every name on the list will qualify for the study. Thus, every call will result in either a completion, a refusal or some sort of "miss" (a no answer, a busy signal, an answering machine, etc.). The telephone numbers are randomly selected by a computer beforehand, so the interviewer has no role in selecting the random name from the list. Another computer dials each number, so once the call is complete it is a fairly simple matter to ask the interviewer to enter a code with the "disposition" of each call (refusal, no-answer, etc). It is always possible for an interviewer to type the wrong key, but the process is straightforward and any such mistakes should be random and rare.
Now consider the exit poll. The interviewer - and they almost always worked alone - is responsible for counting each exiting voter, approaching the nth voter (while continuing to count those walking by), making sure they deposit their completed questionnaire in a "ballot box", and also keeping a tally of misses and refusals.
What happens during busy periods when the interviewer cannot keep up with the stream of exiting voters? What if they are busy trying to persuade one selected voter to participate while another 10 exit the polling place? If they lose track, they will need to record their tally of refusals (including gender, race and age) from memory. Consider the interviewer facing a particularly high level of refusals, especially in a busy period. The potential for error is great. Moreover, the potential exists for under-experienced and overburdened interviewers to systematically underreport their refusals and misses compared to interviewers with more experience or facing less of a challenge. Such a phenomenon would artificially inflate the completion rates where we would expect to see lower values.
Consider also what happens under the following circumstances: What happens when an interviewer - for whatever reason - approaches the 4th or the 6th voter when they were supposed to approach the 5th? What happens when an interviewer allows a non-selected voter to "volunteer?" What happens when a reluctant voter exits from a back door to avoid being solicited? The answer in each case, with respect to the non-response tally, is nothing. Not one of these deviations from random selection results in a greater refusal rate, even though all could exacerbate differential response error. So the completion rates reported by Edison-Mitofsky probably omit a lot of the "differential non-response" that created the overall exit poll discrepancy.
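A bit of arithmetic shows why this matters. The sketch below assumes a precinct split evenly between two candidates and gives each candidate's voters a different effective chance of ending up in the sample, whatever the mechanism (a refusal, a skipped nth voter, a volunteer, a back door). The function names and the 56% and 50% rates are hypothetical, chosen only for illustration.

```python
def polled_share(true_share_a, completion_a, completion_b):
    """Share of completed interviews that come from candidate A's voters.

    true_share_a: candidate A's actual share of the voters leaving the precinct.
    completion_a, completion_b: effective fraction of each candidate's voters
    who end up completing an interview (hypothetical inputs).
    """
    completes_a = true_share_a * completion_a
    completes_b = (1 - true_share_a) * completion_b
    return completes_a / (completes_a + completes_b)


def overall_completion(true_share_a, completion_a, completion_b):
    """Blended completion rate across both candidates' voters."""
    return true_share_a * completion_a + (1 - true_share_a) * completion_b


# Hypothetical: an evenly split precinct, 56% vs. 50% effective completion.
print(round(polled_share(0.5, 0.56, 0.50), 3))        # 0.528
print(round(overall_completion(0.5, 0.56, 0.50), 3))  # 0.53
```

In this made-up precinct the poll shows candidate A at roughly 53% when the true figure is 50%. If part of that six point gap comes from substitutions or back-door exits rather than recorded refusals, the hand tally will understate it.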
It is foolish, under these circumstances, to put blind faith into the reported completion rates. We should be suspicious of any analysis that comes to conclusions about what is "plausible" without taking into account the possibility of the sorts of human error discussed above.
April 27, 2005
ABC/Washington Post on Judicial Nominees
The conservative wing of the blogosphere took great exception yesterday to the latest survey from the Washington Post and ABC News that gave front page play to the assertion that "a strong majority of Americans oppose changing the rules to make it easier for Republican leaders to win confirmation of President Bush's court nominees." The complaints fell into two categories: (1) that the sample was unrepresentative and (2) that the question about "changing the rules" was biased. MP's quick take is that the former complaints are largely unfounded, the latter debatable. Let's take a closer look.
1) Biased sample? Our friend Gerry Daly (of Dalythoughts) nicely summarized the first grievance [though as he points out in the comments section, he did not endorse it]:
The Ankle Biting Pundits, Erick at Red State and Powerline have all noted (hat tip to Michelle Malkin) that while the 2004 exit polls showed that the parties were at parity among voters, the sample in this poll is not; it includes 35% Democrats and 28% Republicans- a 7 point advantage for Democrats.
The problem with this complaint is that ABC News and the Washington Post -- like most polling organizations -- survey all American adults, not just registered or likely voters. The voting population is slightly more Republican than the population of all adults. Screening for voters is appropriate in a pre-election survey intended to track the campaign or forecast the outcome, but a survey of what "Americans" think ought to survey, well, all Americans. Even if you disagree, the issue is not one of "bias" or "over-representation" but of a difference in the population surveyed.
Among all adults, as opposed to registered or likely voters, most survey organizations have shown a slight Democratic advantage in party ID over the last year, consistent with the ABC/Post results. I put together the following table that averaged data from 2004 and year-to-date 2005 when available: The surveys from CBS/New York Times, Harris, the Pew Research Center and Time/SRBI all show a Democratic advantage of two to six points. Gallup (subscription required) is the exception, showing party ID at parity.
More to the point is this sentence in the ABC analysis:
Thirty-five percent of respondents in this survey identify themselves as Democrats, 28 percent as Republicans, about the same as the 2004 and 2005 averages in ABC/Post polls. It was even on average, 31 percent-31 percent, in 2003 [emphasis added].
If anyone from ABC or the Post is reading, it would be helpful to see those averages from 2004 and 2005. Nonetheless, considering the ABC/Post poll's +/-3% sampling error, the party ID results are within range of the results for the other surveys from 2005 presented above (with the exception of Gallup's 35% GOP number), though they do look a point or two more Democratic than the average of the other surveys.
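To see roughly where a figure like +/-3% comes from, here is a minimal sketch of the textbook margin of error for a percentage. It assumes simple random sampling and a 95% confidence level, and it uses a hypothetical sample size of 1,000 adults (the actual sample size is not given in this post). Real surveys also involve weighting and design effects that this ignores.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p from n interviews.

    Assumes simple random sampling; z=1.96 is the usual 95% multiplier.
    """
    return z * math.sqrt(p * (1 - p) / n)


n = 1000  # hypothetical sample size, not taken from the ABC/Post release

# Margin on the 35% Democratic party ID figure, in percentage points.
print(round(100 * margin_of_error(0.35, n), 1))  # 3.0

# The margin is widest for a result near 50%.
print(round(100 * margin_of_error(0.50, n), 1))  # 3.1
```

Note that the margin applies to each percentage separately. The gap between two percentages from the same poll, such as the Democratic advantage in party ID, carries a wider margin still.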
If we were confident that this small difference resulted from random chance or some sort of sample bias alone, we would want the ABC/Post pollsters to weight their data to correct it. The problem is that the difference could be the result of slight variations in question wording, of the content of earlier questions that might affect responses to the party ID question at the end of the survey, or of a small, real but momentary change in party identification. If the difference is just about sampling error or sample bias, weighting could make the survey more representative. If the difference is about any of the other issues, weighting would make it worse.
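The mechanics of weighting by party ID are simple, which is part of its appeal. The sketch below is illustrative only: the 35% and 28% sample shares come from the ABC/Post release quoted above, but the "Other" share (taken as the remainder), the target shares and the support levels within each party are all hypothetical numbers invented for the example.

```python
def weight_by_party(sample_shares, target_shares, support_by_party):
    """Return (unweighted, weighted) overall support for some question.

    Each party group gets a weight of target share / sample share, so the
    weighted sample matches the target party distribution.
    """
    weights = {party: target_shares[party] / sample_shares[party] for party in sample_shares}
    unweighted = sum(sample_shares[p] * support_by_party[p] for p in sample_shares)
    weighted = sum(sample_shares[p] * weights[p] * support_by_party[p] for p in sample_shares)
    return unweighted, weighted


# 35/28 are from the release; "Other" is assumed to be the remainder.
SAMPLE = {"Democrat": 0.35, "Republican": 0.28, "Other": 0.37}
# Hypothetical target that narrows the Democratic advantage to 3 points.
TARGET = {"Democrat": 0.33, "Republican": 0.30, "Other": 0.37}
# Hypothetical support for a rule change within each group.
SUPPORT = {"Democrat": 0.10, "Republican": 0.50, "Other": 0.24}

unweighted, weighted = weight_by_party(SAMPLE, TARGET, SUPPORT)
print(round(unweighted, 3), round(weighted, 3))  # 0.264 0.272
```

With these made-up inputs the adjustment moves the topline by less than a point. The arithmetic cannot tell us whether the target is right, which is exactly the problem: if the party ID shift is real, the "correction" bakes in an error.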
The irony of all this -- one likely not lost on other pollsters -- is that the Washington Post enabled this criticism by breaking with past practice and putting out a PDF summary that included complete results not only for party identification and ideology, but also for the full list of demographics. MP commends Richard Morin and the Washington Post for taking this step, even though it seems to be bringing them only grief.
Yes, consumers of polls deserve this level of transparency. Yes, it is appropriate to ask tough questions about how well any poll represents the nation. But leaping to the conclusion that the sample composition is "ridiculously bad" (Ankle Biting Pundits) or that it shows "egregious" bias (Powerline) is just flat wrong.
2) Biased question? - The second category of complaint took issue with the wording and context of the question that was the focus of the coverage: "Would you support or oppose changing Senate rules to make it easier for the Republicans to confirm President Bush's judicial nominees?"
The judicial filibuster is an example of the type of issue that makes pollsters' lives miserable. The underlying issue is both complex and remote. Few Americans are well informed about the procedures and rules of the Senate, and few have been following the issue closely (only 31% tell robo-pollster Scott Rasmussen they are following stories on the judicial nominees "very closely"). So true "public opinion" with respect to judicial filibusters is largely unformed. When we present questions about judicial nominees in the context of a survey interview, many respondents will form an opinion on the spot. Results will thus be very sensitive to question wording. No single question will capture the whole story, yet every question inevitably "frames" the issue to some degree.
To MP, the most frustrating bias in media coverage of polling -- be it mainstream or blog -- is the pressure to find and settle on a single question as the ultimate measure of "public opinion" on any issue. In a sense, public opinion about issues like the judicial filibuster is inherently hypothetical. Many Americans, perhaps most, lack a pre-existing opinion. If we want to know how Americans will react to some future development (or whether they will react at all), no single question can tell us what we need to know.
The best approach in situations like these is to follow the advice of our old friend, Professor M:
The answer is NOT to find a single poll with the "best" wording and point to its results as the final word on the subject. Instead, we should look at ALL of the polls conducted on the issue by various different polling organizations. Each scientifically fielded poll presents us with useful information. By comparing the different responses to multiple polls -- each with different wording -- we end up with a far more nuanced picture of where public opinion stands on a particular issue. If we can see through such comparisons that stressing different arguments or pieces of information produces shifts in responses, then we have perhaps learned something.
So what can we learn from different polls on this issue? The PollingReport has a one page summary that includes most recent polling on the issue (including survey dates and sample sizes):
In mid-March, Newsweek found 32% approved and 57% disapproved of changing the rules regarding filibusters with the following question:
U.S. Senate rules allow 41 senators to mount a filibuster -- refusing to end debate and agree to vote -- to block judicial nominees. In the past, this tactic has been used by both Democrats and Republicans to prevent certain judicial nominees from being confirmed. Senate Republican leaders -- whose party is now in the majority -- want to take away this tactic by changing the rules to require only 51 votes, instead of 60, to break a filibuster. Would you approve or disapprove of changing Senate rules to take away the filibuster and allow all of George W. Bush's judicial nominees to get voted on by the Senate?
At the beginning of April, the NBC News/Wall Street Journal poll found 40% who wanted to eliminate the filibuster and 50% who wanted to maintain it, when they asked this question:
As you may know, the president of the United States is a Republican and Republicans are the majority party in both houses of Congress. Do you think that the Republicans have acted responsibly or do you think that they have NOT acted responsibly when it comes to handling their position and allowing full and fair debate with the Democrats?
Then there is this week's ABC/Washington Post survey that found 26% supporting a rule change to make it easier for Bush to win confirmation of his judicial appointees and 66% opposed. The ABC question had two parts:
"The Senate has confirmed 35 federal appeals court judges nominated by Bush, while Senate Democrats have blocked 10 others. Do you think the Senate Democrats are right or wrong to block these nominations?" 48% said "right," 36% said wrong, 3% both and 13% were unsure
"Would you support or oppose changing Senate rules to make it easier for the Republicans to confirm Bush's judicial nominees?" - 26% support, 66% oppose, 8% unsure.
In an online survey, Rasmussen Reports asked a national sample several different questions. Unfortunately, they did not release the verbatim language. The following comes from language in their online release:
"Forty-five percent (45%) of Americans believe that every Presidential nominee should receive an up or down vote on the floor of the Senate. That's down from 50% a month ago."
"When asked if Senate rules should be changed to give every nominee a vote, 56% say yes and 26% say no. A month ago, those numbers were 59% and 22% respectively"
The Republican polling firm Strategic Vision asked the following questions this past week of a sample of registered voters in Florida:
Do you approve or disapprove of a Republican plan in the United States Senate to limit Democratic filibustering of judicial nominations and allow a vote on the nominations? Florida registered voters: Approve 44%, Disapprove 33%, Undecided 23%.
Do you approve or disapprove of Democratic filibusters of President Bush's judicial nominations in the United States Senate? Florida registered voters: Approve 28%, Disapprove 57%, Undecided 15%
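One way to see how much the wording matters is to line up the results quoted above on a rule change and compute the net margin for each. The sketch below uses only the percentages reported in this post; the labels and the function name are mine. Keep in mind that the samples differ too (the Strategic Vision numbers are Florida registered voters, not a national sample), so this is a rough comparison at best.

```python
# (poll, % favoring a rule change, % opposing it), as quoted in this post.
POLLS = [
    ("Newsweek", 32, 57),
    ("NBC News/Wall Street Journal", 40, 50),
    ("ABC/Washington Post", 26, 66),
    ("Rasmussen Reports", 56, 26),
    ("Strategic Vision (Florida RVs)", 44, 33),
]


def net_support(polls):
    """Return (poll, favor minus oppose) pairs, most opposed first."""
    nets = [(name, favor - oppose) for name, favor, oppose in polls]
    return sorted(nets, key=lambda pair: pair[1])


for name, net in net_support(POLLS):
    print(f"{name:<32} {net:+d}")
```

The net margins run from -40 to +30. That is a 70 point swing on what is nominally the same issue.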
One thing largely missing in the questions asked by public pollsters is a better sense of how informed and engaged Americans are in this issue. So far, only Rasmussen has asked, "how closely have you been following the issue?" Unless I've missed it, no one has asked for a rating of the importance of the issue as compared to issues like health care, Social Security, Terrorism, Iraq, etc.
In the same vein, MP wishes that a public poll would ask Americans an open-ended question about this issue. It would first ask, "have you heard anything about a controversy involving President Bush's judicial nominations?" Those who answer yes would then get an open-ended follow-up: "What specifically have you heard?" The answers would help show how many have pre-existing opinions that demonstrate worry about conservative nominees or about the difficulty President Bush has getting his nominees confirmed.
MP does not agree that the question asked by the ABC/Washington Post poll is inherently biased: "Would you support or oppose changing Senate rules to make it easier for the Republicans to confirm Bush's judicial nominees?" There is much I like about this question: It is clear, concise and easy to understand and interpret because it avoids the use of often unfamiliar terms like "filibuster."
The problem is -- and here the conservative critics have a point -- it is just one question and it does reflect one particular framing of the issue. As Ramesh Ponnuru points out, there is another question they could have asked that is equally concise and clear: "Would you support or oppose changing Senate rules so that judges can be confirmed by majority vote?" We might take Ponnuru's suggestion a step further and ask whether to change rules "that make it easy for a minority of Senators to block a nomination even when a majority of the Senate supports it."
Different questions may produce greater support for the Republican position, as the various results presented above imply. Understanding public opinion with respect to judicial nominees is not about deciding which question is best, or whether any one question alone is biased. It is about measuring all attitudes, even the ones that conflict, and coming to a greater understanding of what it all means. The answers may be contradictory, but sometimes, so is public opinion.
[minor typos corrected]
April 25, 2005
Disclosing Party ID: LA Times
Today I have another response from a public pollster regarding the disclosure of party identification. I have been asking public pollsters that do not typically disclose the results for the party identification to explain their policy. Today we hear from Susan Pinkus, director of the Los Angeles Times poll:
My predecessors usually did not release this information in the press releases unless it was requested and I just followed the precedent. However, at times, the Times Poll has published party ID figures in poll stories if it was part of the overall analysis. If someone requests it, of course, we are more than happy to give it to them. Having said that, the party ID results were so much in the headlines last year, and so contested by the campaigns (depending if they liked the results of that particular poll or not) that I will probably start putting those figures in the poll's press release -- when it is called for (i.e., during election years or is relevant to the survey's analysis). Party ID is asked in national polls because of the obvious -- not every state is registered by party. In California, however, voters have to register by party or declined-to-state. In state and local Ca. races, we usually only ask registered voter question and not party ID.
Thank you LA Times!
To confirm the willingness of the Times to release party numbers on request: Those who follow this topic closely will recall that in June 2004 an LA Times poll that showed John Kerry leading George Bush came under attack by Matthew Dowd of the Bush campaign. He told ABC's The Note that the poll was "a mess" because it "is too Democratic by 10 to 12 points" (proving that complaints about party identification do not always come from Democrats). Pinkus responded with a statement and a release of party identification for all polls going back to September 2001.
There is one more public pollster that I asked for a statement that has not yet responded. I'll post that if and when I receive it.
Party Disclosure Archive (on the jump)
- Pew Research Center
- Fox News/Opinion Dynamics
- American Research Group
- ABC News/Washington Post
- How to interpret shifts in Party ID
April 22, 2005
The Hotline Asks Pollsters: What Validates a Poll?
This week, the hard-working folks at The Hotline, the daily online political news summary published by the National Journal, did a remarkable survey of pollsters on the question of how they check their samples for accuracy. They asked virtually every political pollster to answer these questions: "What's the first thing you look at in a survey to make sure it's a good sample? In other words -- what validates a poll sample for you?" They got answers from six news media pollsters and thirteen campaign pollsters (including MP and his partners).
Now, MP readers are probably not aware of it, since few can afford a subscription to Washington's premier political news summary, but The Hotline has been closely following MP's series on disclosure of Party ID. In fact they have reproduced much of the series almost in full, with all the requisite links (for which we are appropriately grateful). Indeed, we believe it was the Washington Post polling director Richard Morin's reference to party identification as a "diagnostic question" in his answer to MP that inspired The Hotline's poll of pollsters about how they validate polls. Thus, we asked the-powers-that-be at the National Journal to grant permission to reproduce their feature, and they have kindly agreed.
The responses of the various pollsters are reproduced on the jump page. Thank you National Journal!
(Under the circumstances, MP doesn't mind shameless shilling for The Hotline, especially since he is a regular reader: Their "Poll Track" feature is one of the most comprehensive and useful archives of political polling results available. It's just a shame they don't offer a more affordable subscription rate to a wider audience).
As the Hotline editors note, most pollsters listed a series of demographic and attitudinal questions that they tend to look at in evaluating a poll (particularly gender, age, race and party ID). However, a few themes deserve some emphasis:
- A point worth amplifying: Several pollsters - especially Pew's Andrew Kohut, ABC's Gary Langer and Democrat Alan Secrest - stressed that the procedures used to draw the sample and conduct the survey are more important to judgments about quality than the demographic results.
- Ironically, there was a difference in the way the pollsters heard and answered the Hotline's questions (next time, pre-test!): Some described how they evaluate polls done by other organizations; some (including MP and his partners) described how we evaluate our own samples.
- Although most described what factors they look at, few went on to describe what they do when a poll (either theirs or someone else's) fails their quality check. Do they weight or adjust the results to correct the problem? Leave it as is, but consider the errant measure in their analysis? Ignore the survey altogether?
There is much food for thought here. Like the Hotline editors, MP would like to know what you think. After reading over the comments below, please take a moment to leave a comment: Of all the suggestions made, what information do you want to know about a poll? More broadly, what other questions would you like to ask the pollster consultants about how we do our work?
See the complete responses from pollsters as published in yesterday's Hotline on the jump page.
- MWR Strategies (R) pres. Michael McKenna: "From a technical perspective, we look at demographic breakouts (age, gender, region, etc.) and make sure they are in the ballpark. Then we look at ideological/partisan breaks ... Then finally, and perhaps most importantly, we look at the responses themselves and sort of give them the real-world test -- does that set of answers conform to the other things I know about the world? ... It seems to me that the trick to samples is not being excessively concerned about one set of survey results, but rather to look at the results of all surveys on a given topic. It is pretty rare for a single deficient sample to twist the understanding of an issue or event, in part because everyone (I think) in the business looks at each other's results."
- Hill Research (R) principal David Hill: "I always look at the joint distributions by geography, party, age, gender, and race/ethnicity. Based on prior election databases, we know the correct percentage of the sample that should be in each combination of these categories. ... We try to achieve the proper sample distribution through stratified sampling and the imposition of quotas during the interviewing process, but sometimes it still isn't right because of quirks in cooperation rates, forcing us to impose weights on the final results."
- Ayres, McHenry & Associates (R) VP Jon McHenry: "When we get data from the calling center, the first thing I check is the racial balance and then party ID. Variation in either of those two numbers can and will affect the other numbers throughout the survey. Looking at someone else's survey, ... I'll also see how long the survey has been in the field. You can do a good survey in two days, but it's tricky. It's pretty tough in one day, which is part of the reason tracking nights can bounce around. ... But ... knowing you're in the ballpark with party id is a pretty good proxy for seeing that you have a balanced sample."
- Public Opinion Strategies (R) partner Glen Bolger: "If it's a party registration state/district, I check party registration from the survey against actual registration. I also look closely at ethnicity to ensure proper representation of minorities. We double check region and gender quotas to make sure those were followed. We check age to ensure seniors are not overrepresented."
- Probolsky Research (R) pres. Adam Probolsky: "If the poll is about a specific election, I look at whether the respondents are likely voters. If not, it is hard to take the results seriously. If it is a broader public policy or general interest poll, I look to see if the universe of respondents matches the universe of interested parties, stated more plainly, that the population that is being suggested as having a certain opinion is well accounted for in the universe of respondents."
- Moore Information (R) pres. Bob Moore: "Name of pollster and partisanship of sample."
- Gallup editor-in-chief Frank Newport: "Technically there are a wide variety of factors which combine to make a "good sample." As an outside observer ... I focus first and foremost on the known integrity and track record of the researcher/s involved. If it's a research organization unknown to me, the "good sample" question becomes harder to answer without more depth investigation -- even with the sample size, target population, dates of interviewing information usually provided. Parenthetically, question wording issues are often more important in evaluating poll results than the sample per se."
- ABC News dir. of polling Gary Langer: "A good sample is determined not by what comes out of a survey but what goes into it: Rigorous methodology including carefully designed probability sampling, field work and tabulation procedures. If you've started worrying about a "good sample" at the end of the process, it's probably too late for you to have one. When data are completed, we do check that unweighted sample balancing demos (age, race, sex and education) have fallen within expected norms. But this is diagnostic, not validative."
- Schulman, Ronca & Bucuvalas, Inc. pres. Mark Schulman: "When you've been in the polling business as long as I have, you've learned all the dirty tricks and develop an instinct for the putrid polls. 'Blink blink,' as Gladwell calls it. Blink -- who sponsored the poll, partisans or straights? Blink -- are the questions and question order leading the respondent down a predictable path? Blink -- does the major finding just by chance (!) happen to support the sponsor's position? This all really does happen in a blink or two. The dirty-work usually does not involve the sample itself."
- Pew Research Center pres. Andrew Kohut: "There is no one way to judge a public opinion poll's sample. First thing we look for was whether the sample of potential respondents was drawn so that everyone in the population had an equal, or at least known chance of inclusion. Secondly, what efforts were made to reach potential respondents -- Were there call backs -- how many -- over what period of time? And what measures were used to convince refusals to participate? ... How does the distribution of obtained sample compare to Census data? We will also look at how the results of the survey line up on trend measures that tend to be stable. If a poll has a number of outlier findings on such questions it can set off a warning bell for us. I want to add that the major source of error in today's polls is more often measurement error than sampling error or bias. When I see a finding that doesn't look right, I first look to the wording of the question, and where the question was placed in the questionnaire. The context in which a question is asked often makes as much difference as how the question is worded."
- Zogby Int'l pres. & CEO John Zogby: "I go right to the harder to reach demographics. Generally, that means younger voters, Hispanics, African Americans. They are usually under-represented in a typical survey sample, but if their numbers are far too low, then the sample is not usable. I also look at such things as union membership, Born-Agains, and education. If any of these are seriously out of whack then there is a problem."
- Research 2000 pres. Del Ali: "There are two things right off the top: the firm that conducted the survey and for whom it was conducted for. If it is a partisan firm conducted for a candidate or a special interest group, the parameters and methodology become critical to examine. However, regardless of the firm or the organization who commissioned the poll, the most important components to look for are: Who was sampled (registered voters, likely voters, adults, etc.), Sample size/margin for error (at least 5% margin for error), Where was poll conducted (state wide, city, county, etc.), What was asked in the poll (closed ended/open ended questions), When was a horse race question asked in the poll. Bottom line, I take all candidate and policy group polls with a grain of salt. The independent firms who poll for media outlets are without question unbiased and scientifically conducted."
- Greenberg Quinlan Rosner Research (D) VP Anna Greenberg: "It is hard to tell because you never really know how people develop their samples." Mentioning sample size, partisan breakdown and field dates, Greenberg looks "for ... how accurately it represents the population it purports to represent (e.g., polls on primaries should be of primary voters). ... You can also look at the demographic and political (e.g., partisanship) characteristics to make sure the sample accurately represents the population. It is rarely reported, but it would be helpful to know how the sample frame is generated (e.g., random digit dial, listed sample) so you can get a sense of the biases in the data. But none of these measures really help you understand response rates, completion rates or response bias, which arguably have as big an impact on sample quality as any of the items listed above. It is important to note that ALL samples have bias, it's just a matter of trying to reduce it and understand it."
- Anzalone-Liszt Research (D) partner Jeff Liszt: In addition to mentioning the importance of sample size, Liszt looks at "over how long (or short) a period the interviews were conducted. Very large samples taken in one or two nights sometimes raise a red flag because of the implications for the poll's call-back procedures. The challenge is that public polling is only very selectively public. Sampling procedure and weighting are critical, yet opaque processes, about which very few public polls provide any information. ... This often leaves you with little more than a smell test. ... Often, the best you can do is consider whether a poll is showing movement relative to its last reported results, whether other public polls are showing the same movement, and whether there is any apparent shift in demographics and party identification from previous results."
- Global Strategy Group (D) pres. Jefrey Pollock: "The first question we ask ourselves is 'what is driving voter preference?' In an urban race like NYC or LA, race is frequently the primary determinant, and therefore the most important element of ensuring a valid sample. In addition, there is a high frequency of undersampling of minorities in many surveys. In an election where race is not a leading determinant, we look first to ensure that the survey matches up to probable regional turnout."
- Cooper & Secrest Associates (D) partner Alan Secrest: "Proper sampling is the absolute bedrock of accurate and actionable polling. ... A correctly-drawn poll sample -- in concert with properly focused screen questions (the two cannot be divorced...especially in primary polling) -- should yield a representative look at a given electorate. Sadly, such methodological rigor too often is not the case." Noting that "there is no 'single' criterion" to use in judging a poll, Secrest does point out the importance of testing dates and demographics and lists his key criteria: "Does the pollster have a track record for accurate turnout projection, winning, and being willing to report unvarnished poll results to the client?; what turnout model was used to distribute the interviews?; is the firm using an adequate sample size for the venue, especially if subgroup data is being released?; 'consider the source'...some voter list firms are perennially sloppy or lazy in the maintenance of their product; were live interviewers used? centrally located? appropriate accents? Obviously, question design and sequence matter as well."
- A joint response from all the partners at Bennett, Petts & Blumenthal (D), Anna Bennett, David Petts and Mark Blumenthal: "The most valid data we have on "likely voters" involves their geographic distribution in previous comparable elections. Like most political pollsters, we spend a lot of time modeling geographic turnout patterns, and stratify our samples by geography to match those patterns. We also look at whatever comparable data is available, including past exit polls, Census estimates of the voting population, other surveys and voter file counts. We examine how the demographic and partisan profile of our survey compares to the other data available, but because there are often big differences in the methodologies and sources, we would use these to weight data in rare instances and with extreme caution."
- Hamilton Beattie & Staff (D) pres. David Beattie: "There is not 'one thing' that validates a poll -- the following are the first things we always look at: 1)what is the sample size, 2)what were the screening questions, 3)what is the racial composition compared to the electorate and were appropriate languages other than English used, 4)what is the gender, 5)what is the age breakdown (looking especially to make sure it is not too old) 6)what is the party registration or identification."
- Decision Research (D) pres. Bob Meadow: "The first thing we do is compare the sample with the universe from which it is drawn. For samples from a voter file, we compare on party, gender and geography. For random digit samples, we compare on geography first."
© 2005 by National Journal Group Inc., 600 New Hampshire Avenue, NW, Washington DC 20037. Any reproduction or retransmission, in whole or in part, is a violation of federal law and is strictly prohibited without the consent of National Journal. All rights reserved.
April 18, 2005
What the USCV Report Doesn't Say (Part II)
Mea Culpa Update - In Part I of this series I erred in describing an artifact in the data tabulation in the report from Edison Research and Mitofsky International. The artifact (which was first described by others back in January) results, not from random sampling error, but from all other randomly distributed errors in the count data obtained by Edison-Mitofsky.
Over the last week, I have seen evidence that such an artifact exists and behaves as I described in Part I. I will have more to report on this issue very soon. For now, I can only report that others are hard at work on this issue and their findings are very intriguing. As a certain well known website sometimes says, "Developing..."
[Update 4/21 - The development I am referring to is the work of DailyKos diarist (and occasional MP commenter) "Febble," summarized in this blog post (and a more formal draft academic paper). She reaches essentially the same conclusion that I did in Part I:
It would seem that the conclusion drawn in the [US Count Votes] report, that the pattern observed requires "implausible" patterns of non-response, and thus leaves the "Bush strongholds have more vote-count corruption" hypothesis as "more consistent with the data" is not justified. The pattern instead is consistent with the [Edison-Mitofsky] hypothesis of widespread "reluctant Bush responders" - whether or not the "political company" was "mixed".
Febble's paper is undergoing further informal "peer review" and -- to her great credit -- she is considering critiques and suggestions from the USCV authors. I will have much more to say about this soon. Stay tuned...]
In the meantime, let me continue with two different issues raised in the US Count Votes (USCV) report and some general conclusions about the appropriate standard for considering fraud and the exit polls.
4) All Machine Counts Suspect? - The USCV report makes much of a tabulation provided by Edison-Mitofsky showing very low apparent rates of error in precincts that used paper ballots. Paper ballot precincts, they write, "showed a median within-precinct-error (WPE) of -0.9, consistent with chance while all other technologies were associated with high WPE discrepancies between election and exit poll results." Their Executive Summary makes the same point illustrated with the following chart that highlights the low WPE median in green.
While the report notes that the paper ballots were "used primarily in rural precincts," and earlier in the report notes that such precincts amount to "only 3% of sampled precincts altogether," it fails to point out to readers why those two characteristics call the apparent contrast between paper and other ballots into question.
What the USCV report does not mention is the finding by the Edison-Mitofsky report that the results for WPE by machine type appear hopelessly confounded by the regional (urban or rural) distribution of voting equipment. The E-M report includes a separate table (p. 40) that shows higher rates of WPE in urban areas for every type of voting equipment. Virtually all of the paper ballot precincts (88% -- 35 of 40) were in rural areas while two thirds of the machine count precincts (68% - 822 of 1209) were in urban areas. E-M concludes:
These errors are not necessarily a function of the voting equipment. They appear to be a function of the equipment's location and the voter's responses to the exit poll at precincts that use this equipment. The value of WPE for the different types of equipment may be more a function of where the equipment is located than the equipment itself (p. 40).
The USCV report points out that Edison/Mitofsky "fail to specify P-values, significance levels, or the statistical method by which they arrived at their conclusion that voting machine type is not related to WPE." Here they have a point. The public would have been better served by a technical appendix that provided measures of significance. However, public reports (as opposed to journal articles) frequently omit these details, and I have yet to hear a coherent theory for why Edison-Mitofsky would falsify this finding.
Nonetheless, USCV wants us to consider that "errors in for all four automated voting systems could derive from errors in the election results." OK, let's consider that theory for a moment. If true, given the number of precincts involved, it implies a fraud extending to 97% of the precincts in the United States. They do not say how that theory squares with the central contention of their report that "corruption of the official vote count occurred most freely in districts that were overwhelmingly Bush strongholds" (p. 11). Their own estimates say the questionable strongholds are only 1.6% of precincts nationwide (p. 14, footnote). Keep in mind that their Appendix B now concedes that the pattern of WPE by precinct is consistent with "a pervasive and more or less constant bias in exit polls because of a differential response by party" in all but the "highly partisan Bush precincts" (p. 25).
Presumably, their theory of errors derived from "all four automated voting systems" would also include New Hampshire, the state with the fourth highest average WPE in the country (-13.6), where most ballots were counted using optical scan technology. In New Hampshire, Ralph Nader's organization requested a recount in 11 wards, wards specifically selected because their "results seemed anomalous in their support for President Bush." The results? According to a Nader press release:
In the eleven wards recounted, only very minor discrepancies were found between the optical scan machine counts of the ballots and the recount. The discrepancies are similar to those found when hand-counted ballots are recounted.
A Nader spokesman concluded, "it looks like a pretty accurate count here in New Hampshire."
5) More Accurate Projection of Senate Races? - The USCV report says that in 32 states "exit polls were more accurate for Senate races than for the presidential race, including states where a Republican senator eventually won" (p. 16). They provide some impressive statistical tests ("paired t-test, t(30) = -.248, p<.02 if outlier North Dakota is excluded") but oddly omit the statistic on which those tests were based.
In this case, the statistically significant difference is nonetheless quite small. The average within precinct error (WPE) in the Presidential race in those states was -5.0 (in Kerry's favor), the average WPE favoring the Senate Democrat was -3.6 (See Edison-Mitofsky, p. 20). Thus, the difference is only 1.4 percentage points -- and that's a difference on the Senate and Presidential race margins.
The USCV authors are puzzled by this result since "historic data and the exit polls themselves indicate that the ticket-splitting is low." On this basis they conclude it "reasonable to expect that the same voters who voted for Kerry were the mainstay of support Democratic candidates" (p. 16).
That expectation would be reasonable if there were no cross-over voting at all, but the rates of crossover voting were more than adequate to explain a 1.4 percent difference on the margins. By my own calculations, the average difference between the aggregate margins for President and US Senate in those 32 states was 9 percentage points. Nearly half of the states (14 of 32) showed differences on the margins of 20 percentage points or more.
And those are only aggregate differences. The exit polls themselves are the best available estimates of true crossover voting. Even in Florida, where the aggregate difference between the Presidential and Senate margins was only 4.0 percentage points, 14% of those who cast a ballot for president were "ticket splitters," according to the NEP data.
Historically low or not, these rates of crossover voting are more than enough to allow for a difference on the margins of only 1.4 percentage points between Presidential and Senate votes.
The consistency in the average WPE values for the Senate races is greater than the slight difference with the presidential race. The exit polls erred in favor of the Democratic candidate across the board, including races featuring a Democratic incumbent (-3.3), a Republican incumbent (-5.2) or an open seat (-2.2).
Conclusion: The Burden of Proof?
Is it not statistically, procedurally, and (dare I say) scientifically sound to maintain skepticism over a scientific conclusion until that conclusion has been verified? Is a scientist, or exit-pollster, not called upon, in this scenario, to favor the U.S. Count Votes analysis until it is proven incorrect?
The question raises one of the things that most troubles me about the way much of the argument about the exit polls and vote fraud turns the scientific method on its head. It is certainly scientifically appropriate to maintain skepticism over a hypothesis until it has been verified and proven. That is the essence of science. What does not follow is why the USCV analysis should be favored until "proven incorrect." When the USCV report argues that "the burden of proof should be to show that election process is accurate and fair" (p. 22), it implies a similar line of reasoning: We should assume that the exit polls are evidence of fraud unless the pollsters can prove otherwise.
Election officials may have a duty to maintain faith in the accuracy and fairness of the election process, but what USCV proposes is not a reasonable scientific or legal standard for determining whether vote fraud occurred. The question (or hypothesis) we have been considering since November is whether the exit polls are evidence of fraud or systematic error benefiting Bush. In science we assume no effect, no difference or in this case no fraud until we have sufficient evidence to prove otherwise, to disprove the "null hypothesis." In law -- and election fraud is most certainly a crime -- the accused are innocent until proven guilty. The "burden of proof" only shifts if and when the prosecutor offers sufficient evidence to convict.
In this sense, good science is inherently skeptical. "Bad science," as the online Wikipedia points out, sometimes involves "misapplications of the tools of science to start with a preconceived belief and filter one's observations so as to try to support that belief. Scientists should be self-critical and try to disprove their hypotheses by all available means."
Whatever its shortcomings, the Edison-Mitofsky report provided ample empirical evidence that:
- The exit polls reported an average completion rate of 53%, which allowed much room for errors in the poll apart from statistical sampling error.
- The exit polls have shown a consistent "bias" toward Democratic candidates for president since 1988, a bias that was nearly as strong in 1992 as in 2004.
- In 2004, the exit polls showed an overall bias toward both John Kerry and the Democratic candidates for Senate.
- Errors were very large and more or less constant across all forms of automated voting equipment and tabulation in use in 97% of US precincts.
- The exit poll errors strongly correlated with measures of interviewer experience and the precinct level degree of difficulty of randomly selecting a sample of voters. In other words, they were more consistent with problems affecting the poll than problems affecting the count.
Given this evidence, the hypothesis that the exit poll discrepancy was evidence of fraud (or at least systematic error in the count favoring Republicans) requires one to accept that:
- Such fraud or systematic error has been ongoing and otherwise undetected since 1988.
- The fraud or errors extend to all forms of automated voting equipment used in the U.S.
- Greater than average errors in New Hampshire in 2004 somehow eluded a hand recount of the paper ballots used in optical scan voting in precincts specifically chosen because of suspected anomalies.
Add it all up and "plausibility" argues that the exit poll discrepancy is a problem with the poll, not a problem with the count. Yes, Edison-Mitofsky proposed a theory to explain the discrepancy (Kerry voters were more willing to be interviewed than Bush voters) that they cannot conclusively prove. However, the absence of such proof does not somehow equate to evidence of vote fraud, especially in light of the other empirical evidence.
Of course, this conclusion does not preclude the possibility that some fraud was perpetrated somewhere. Any small scale fraud would not have been large enough to be detected by the exit polls, however, even if it amounted to a shift of a percentage point or two in a statewide vote tally. The sampling error in the exit polls makes them a "blunt instrument," too blunt to detect such small discrepancies. Again, the question we are considering is not whether fraud existed but whether the exit polls are evidence of fraud.
You don't need to take my word on it. After the release of the Edison-Mitofsky report, the non-partisan social scientists in the National Research Commission on Elections and Voting (NRCEV) examined the exit poll data and concluded (p. 3):
Discrepancies between early exit poll results and popular vote tallies in several states may have been due to a variety of factors and did not constitute prima facie evidence for fraud in the current election.
[4/18 & 4/19 - Minor typos and grammar corrected, links added]
April 13, 2005
Disclosing Party ID: Quinnipiac
Today, another installment in a series of responses from pollsters who do not typically report party identification as a part of their survey releases. Today's comes from Doug Schwartz of the Quinnipiac University Poll:
Since party identification distribution is rarely requested, we do not include it in our press releases. It has been and continues to be our policy to provide party identification, as well as other demographic information, to anyone requesting that information.
Thanks for the reply, Doug. I can vouch for this policy. When MP had questions about a poll done by Quinnipiac last month, Quinnipiac responded with demographic and party data in a matter of hours. Nonetheless, as noted yesterday, MP continues to urge pollsters to disclose more about their samples, including party identification.
Party Disclosure Archive (on the jump)
- Pew Research Center
- Fox News/Opinion Dynamics
- American Research Group
- ABC News/Washington Post
- How to interpret shifts in Party ID
April 12, 2005
Disclosing Party ID: ABC News/Washington Post
Taking a momentary pause in the exit poll discussion, MP continues with a series of responses from pollsters who do not typically report party identification in online releases to explain their policies. Today we hear from Gary Langer and Richard Morin, who direct respectively the surveys conducted jointly by ABC News and the Washington Post. Bucking the recent trend, both organizations will continue to make party ID results from their most recent surveys available only on request.
First, here is the response from ABC News Polling Director Gary Langer:
The content of ABC News poll analyses is determined by the sole, independent editorial judgment of ABC News. We include party ID when relevant. ABC News donates its complete polling datasets to the Roper Center for Public Opinion Research for public dissemination. Detailed methodological disclosure is posted on our ABCNews.com website. We adhere to the AAPOR Code of Ethics, comply with the disclosure requirements of the National Council on Public Polls and reply in as timely a fashion as possible to questions addressed to firstname.lastname@example.org.
Next, Washington Post Polling Director Richard Morin:
We have no firm policy on reporting or not reporting party ID from individual surveys. This is what we currently do: As you guessed, we release party ID (and anything else) to anyone who asks us for it. This allows us to have a fuller discussion of the specific result and, if necessary, a detailed conversation about measuring and interpreting unleaned and leaned party ID. We release our complete data sets, which includes party ID. We have not posted the results of party ID or other demographic or diagnostic questions from individual surveys because they are not, in our judgment, in themselves newsworthy. Of course we look for meaningful trends in these variables and report these changes when we are confident they are real.
Both ABC News and the Post regularly provide full results online for every survey (here and here), including the complete text and responses from every substantive question. Both have posted exceptionally detailed online summaries of their own methodologies (here and here), more generic guides to public opinion polling (here and here) and the issue of response rates (here and here) in particular.
While MP has not previously requested party ID results from an ABC/Washington Post survey, he has no doubt that they strive to comply with the ethical standards promulgated by AAPOR and the National Council on Public Polls (NCPP). Morin, Langer and their staffs are active members of AAPOR, and as Langer notes they regularly release full respondent level data to the Roper Archives, where scholars can slice and dice every variable on every poll, including party ID.
On the issue of party, both organizations do occasionally report shifts in partisanship (for an example, see the analysis of the ABC survey fielded just after the Republican convention in early September 2004). And MP attended a session at the 2004 AAPOR conference at which Gary Langer presented this detailed paper on long term trends in party identification. Langer kindly made it available to MP's readers, and it is a must read for those who study party identification.
Having said all of that, MP remains surprised these exceptional survey organizations choose to provide results for party identification only on request. As MP has written previously, the result for party identification on any given survey should not be considered the sole measure of the quality of a survey. However, party ID is, as Morin implies, one of many "diagnostic" measures that pollsters use to assess how each new survey compares statistically to those done previously. The readers, like the clients of any research company, deserve some window on those basic diagnostics. Moreover, both organizations could use the release of such data as a vehicle for the "fuller discussion about measuring and interpreting" party identification that Morin surely provides one-on-one.
Also, while changes in the overall party balance are rarely "newsworthy," both ABC and the Washington Post routinely provide tabulations of results by party subgroups. As recently as their January survey, the Post regularly provided readers with tabulations by party of every question on every survey. Informing readers of the size and statistical significance of those subgroups, especially in online releases with few constraints on space and time, would live up to the spirit - if not the letter - of the NCPP disclosure standards.
Party Disclosure Archive (on the jump)
- Pew Research Center
- Fox News/Opinion Dynamics
- American Research Group
- How to interpret shifts in Party ID
April 08, 2005
What the USCV Report Doesn't Say (Part I)
A few weeks ago, I asked if readers were "tired of exit polls yet." The replies, though less than voluminous, were utterly one sided. Twenty-five of you emailed to say yes, please continue to discuss the exit poll controversy as warranted. Only one emailer dissented, and then only in hope that I not let the issue "overwhelm the site." Of course those responses are a small portion of those who regularly visit MP on a daily or weekly basis, who for whatever reason, saw no pressing need to email a reply. To those who responded, thank you for your input. I will continue to pass along items of interest when warranted.
One such item came up late last week. A group called US Count Votes (USCV) released a new report and executive summary -- a follow-up to an earlier effort -- that collectively take issue with the lengthy evaluation of the exit polls prepared for the consortium of news organizations known as the National Election Pool (NEP) by Edison Research and Mitofsky International. The USCV report concludes that "the [exit poll] data appear to be more consistent with the hypothesis of bias in the official count, rather than bias in the exit poll sampling." MP has always been skeptical of that line of argument, but while there is much in the USCV report to chew over, I am mostly troubled by what the report does not say.
First some background: The national exit polls showed a consistent discrepancy in the vote that favored John Kerry. The so-called "national" exit poll, for example, had Kerry ahead nationally by 3% but George Bush ultimately won the national vote by a 2.5% margin. Although they warned "it is difficult to pinpoint precisely the reasons," Edison-Mitofsky advanced the theory that "Kerry voters [were] less likely than Bush voters to refuse to take the survey" as the underlying reason for much of the discrepancy. They also suggested that "interactions between respondents and interviewers" may have exacerbated the problem (p.4). The USCV report takes dead aim at this theory, arguing that "no data in the report supports the E/M hypothesis" (USCV Executive Summary, p. 2).
The USCV report puts much effort into an analysis of data from two tables provided by Edison-Mitofsky on pp. 36-37 of their report. As this discussion gets into murky statistical details quickly, let me first try to explain those tables and what the USCV report says about them. Both tables show average values from 1,250 precincts in which NEP conducted at least 20 interviews and was able to obtain precinct level election returns. Each table divides the precincts into five categories based on the "partisanship" of the precinct. In this case, partisanship really means preference for either Bush or Kerry. "High Dem" precincts are those where John Kerry received 80% or more of the vote, "Mod Dem" precincts are where Kerry received 60-80% and so on.
The first table shows a measure of the average precinct level discrepancy between the exit poll and the vote count that Edison-Mitofsky label "within precinct error" (or WPE - Note: I reproduced screen shots of the original tables below, but given the width of this column of text, you will probably need to click on each to see a readable full-size version).
To calculate the error for any given precinct, they subtract the margin by which Bush led (or trailed) Kerry in the count from the margin by which Bush led (or trailed) Kerry in the unadjusted poll sample. So (using the national results as an example) if Kerry had led on the exit poll by 3% (51% to 48%), but Bush won the precinct count by 2.5% (50.7% to 48.3%), the WPE for that precinct would be -5.5 [(48 - 51) - 2.5 = -3 - 2.5 = -5.5]. A negative value for WPE means Kerry did better in the poll than in the count, positive values mean a bias toward Bush. Edison-Mitofsky did this calculation for 1,250 precincts with at least 20 interviews per precinct and where non-absentee Election Day vote counts were available. The table shows the average ("mean") for WPE, as well as the median and the average "absolute value" of WPE. In the far right column the "N" size indicates the number of precincts in each category.
The average WPE was -6.5 percent - meaning an error in the Bush-Kerry margin of 6.5 points favoring Kerry. The USCV report places great importance on the fact that average WPE (the "mean") appears to be much bigger in the "High Rep" precincts (-10.0) than in the "High Dem" precincts (+0.3).
The second table shows the average completion rates across the same partisanship categories.
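The WPE arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the subtraction, not Edison-Mitofsky's actual code; the function names are made up for this example, and the count margin is entered directly as the 2.5 points quoted in the text.

```python
def bush_margin(bush_pct, kerry_pct):
    """Bush's lead over Kerry in percentage points (negative if he trails)."""
    return bush_pct - kerry_pct


def within_precinct_error(poll_bush_margin, count_bush_margin):
    """WPE = Bush margin in the poll minus Bush margin in the count.

    Negative values mean Kerry did better in the poll than in the count.
    """
    return poll_bush_margin - count_bush_margin


# Example from the text: Kerry leads the poll 51% to 48%,
# Bush wins the count by 2.5 points.
poll_margin = bush_margin(48.0, 51.0)   # -3.0
count_margin = 2.5                      # taken directly from the text
wpe = within_precinct_error(poll_margin, count_margin)
print(wpe)  # -5.5
```

A precinct where the poll and the count agree exactly would score 0.0 on this measure.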
MP discussed this particular table at length back in January. A bit of review on how to read the table: Each number is a percentage and you read across. Each row shows completion, refusal and miss rates for various categories of precincts, categorized by their level of partisanship. The first row shows that in precincts that gave 80% or more of their vote to John Kerry, 53% of voters approached by interviewers agreed to be interviewed, 35% refused and another 12% should have been approached but were missed by the interviewers.
The USCV report argues that this table contradicts the "central thesis" of the E-M report, that Bush voters were less likely to participate in the survey than Kerry voters:
The reluctant Bush responder hypothesis would lead one to expect a higher non-response rate where there are many more Bush voters, yet Edison/Mitofsky's data shows that, in fact, the response rate is slightly higher in precincts where Bush drew ≥80% of the vote (High Rep) than in those where Kerry drew ≥80% of the vote (High Dem). (p. 9)
As I noted back in January, this pattern is a puzzle but does not automatically "refute" the E-M theory:
If completion rates were uniformly higher for Kerry voters than Bush across all precincts, the completion rates should be higher in precincts voting heavily for Kerry than in those voting heavily for Bush....However, the difference in completion rates need not be uniform across all types of precincts. Mathematically, an overall difference in completion rates will be consistent with the pattern in the table above if you assume that Bush voter completion rates tended to be higher where the percentage of Kerry voters in the precincts was lower, or that Kerry voter completion rates tended to be higher where the percentage of Bush voters in the precincts was lower, or both. I am not arguing that this is likely, only that it is possible.
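To make that mathematical possibility concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the response rates are invented rather than taken from the Edison-Mitofsky data, the precinct is treated as a pure two-candidate race, and missed voters are ignored. The function names are hypothetical.

```python
def completion_rate(kerry_share, kerry_rate, bush_rate):
    """Overall share of approached voters who complete the interview."""
    bush_share = 1.0 - kerry_share
    return kerry_share * kerry_rate + bush_share * bush_rate


def expected_wpe(kerry_share, kerry_rate, bush_rate):
    """Expected WPE (poll Bush margin minus count Bush margin), in points."""
    bush_share = 1.0 - kerry_share
    kerry_resp = kerry_share * kerry_rate   # Kerry voters who respond
    bush_resp = bush_share * bush_rate      # Bush voters who respond
    poll_margin = (bush_resp - kerry_resp) / (bush_resp + kerry_resp)
    count_margin = bush_share - kerry_share
    return 100.0 * (poll_margin - count_margin)


# Hypothetical scenarios: (label, Kerry vote share, Kerry rate, Bush rate)
scenarios = [
    ("High Dem, uniform rates", 0.9, 0.56, 0.50),
    ("High Rep, uniform rates", 0.1, 0.56, 0.50),
    ("High Rep, both rates higher", 0.1, 0.66, 0.56),
]
for label, k, rk, rb in scenarios:
    print(label,
          round(completion_rate(k, rk, rb), 3),
          round(expected_wpe(k, rk, rb), 1))
```

With uniform rates, overall completion comes out higher in the heavily Kerry precinct. In the third scenario, Kerry voters still respond more readily than Bush voters, so the expected WPE stays negative, yet overall completion in the heavily Bush precinct exceeds that of the heavily Kerry one. That is the pattern the quoted passage calls possible, not a claim that it is what happened.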
The USCV report essentially concedes this point but then goes through a series of complex calculations in an effort to find hypothetical values that will reconcile the completion rates, the WPE rates and the notion that Kerry voters participated at a higher rate than Bush voters. They conclude:
[While] it is mathematically possible to construct a set of response patterns for Bush and Kerry voters while faithfully reproducing all of Edison/Mitofsky's "Partisanship Precinct Data"... The required pattern of exit poll participation by Kerry and Bush voters to satisfy the E/M exit poll data defies empirical experience and common sense under any assumed scenario. [emphasis in original]
I will not quarrel with the mechanics of their mathematical proofs. While there are conclusions I differ with, what troubles me most is what USCV does not say:
1) As reviewed here back in January, the Edison-Mitofsky report includes overwhelming evidence that the error rates were worse when interviewers were younger, relatively less experienced, less well educated or faced bigger challenges in selecting voters at random. The USCV report makes no mention of the effects of interviewer experience or characteristics and blithely dismisses the other factors as "irrelevant" because any one considered alone fails to explain all of WPE. Collectively, these various interviewer characteristics are an indicator of an underlying factor that we cannot measure: How truly "random" was the selection of voters at each precinct? It is a bit odd - to say the least - that the USCV did not consider the possibility that the cumulative impact of these factors might explain more error than any one individually.
[Clarification 4/10 : As Rick Brady points out, errors in either direction (abs[WPE]) were highest among the least well educated. However, WPE in Kerry's favor was greatest among interviewers with a post-graduate education].
2) More specifically, the USCV report fails to mention that poor performance by interviewers also calls into question the accuracy of the hand tallied response rates that they dissect at great length. Edison-Mitofsky instructed its interviewers to keep a running hand count of those who they missed or who refused to participate. If WPE was greatest in precincts with poor interviewers, it does not take a great leap to consider that non-response tallies might be similarly inaccurate in those same precincts.
MP can also suggest a few theories for why those errors might artificially depress the refusal or miss rates in those problem precincts. If an interviewer is supposed to interview every 5th voter, but occasionally takes the 4th or the 6th because they seem more willing or approachable, their improper substitution will not show up at all in their refusal tally. Some interviewers having a hard time completing interviews may have fudged their response tallies out of fear of being disciplined for poor performance.
3) Most important, the USCV report fails to point out a critical artifact in the data that explains why WPE is so much higher than average in the "High Rep" precincts (-10.0) and virtually non-existent in the "High Dem" precincts (+0.3). It requires a bit of explanation, which may get even more murky, but if you can follow this you will understand the fundamental flaw in the USCV analysis.
[4/13 - CLARIFICATION: On April 11 and 12, 2005, after this message was posted, the USCV authors twice revised their paper to include and then expand a new Appendix B that discusses the implications of this artifact. 4/14 - As I am in error below in describing the artifact, my conclusion that it represents a "fundamental flaw," is premature at best ]
Given the small number of interviews, individual precinct samples will always show a lot of "error" due to ordinary random sampling variation. I did my own tabulations on the raw data released by NEP. After excluding those precincts with less than 20 interviews, the average number of unweighted interviews per precinct is 49.3 with a standard deviation of 15.9. That's quite a range -- 10% of these precincts have an n-size of just 20 to 30 interviews -- and makes for a lot of random error. For a simple random sample of 50 interviews the ordinary "margin of error" (assuming a 95% confidence level) is +/- 14%. With 30 interviews that error rate is +/- 18%. The error on the difference between two percentages (the basis of WPE) gets bigger still.
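For readers who want to check those margins of error, here is a short Python sketch using the textbook formula for a simple random sample. The function names are my own, and the second function (the error on the gap between two candidates' percentages) uses the standard multinomial variance formula, which the text above does not spell out.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for one proportion from a simple random sample."""
    return z * math.sqrt(p * (1.0 - p) / n)


def margin_of_error_on_margin(n, p1=0.5, p2=0.5, z=1.96):
    """95% margin of error on the difference p1 - p2 within the same sample."""
    variance = (p1 + p2 - (p1 - p2) ** 2) / n
    return z * math.sqrt(variance)


for n in (50, 30):
    single = round(100 * margin_of_error(n))
    gap = round(100 * margin_of_error_on_margin(n))
    print(n, single, gap)
```

For a 50/50 two-candidate race, the error on the gap works out to twice the error on a single percentage, which is why the error on a margin "gets bigger still."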
Edison-Mitofsky did not provide the standard errors for their tabulations of WPE, but they did tell us (on p. 34 of their report) that the overall standard deviation for WPE in 2004 was 18.2. If we apply the basic principles of the standard deviation to an average WPE of -6.5, and assume a roughly normal distribution, we know that about 68% of precincts had WPEs that ranged from -24.7 in Kerry's favor to 11.7 in Bush's favor. The remaining third of the precincts (32%) had even greater errors, and a smaller percentage -- roughly 5% -- had errors greater than -42.9 in Kerry's favor or 29.9 in Bush's favor. That's a very wide spread.
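The spread can be worked out with a few lines of Python. This sketch assumes WPE is roughly normally distributed, so that about 68% of precincts fall within one standard deviation of the mean and about 95% within two; that is an assumption on my part, not something the Edison-Mitofsky report states. The bands are centered on the mean of -6.5, and the names are illustrative.

```python
MEAN_WPE = -6.5   # average WPE reported by Edison-Mitofsky
SD_WPE = 18.2     # standard deviation of WPE (E-M report, p. 34)


def sd_band(mean, sd, k):
    """Interval covering k standard deviations on each side of the mean."""
    return (round(mean - k * sd, 1), round(mean + k * sd, 1))


one_sd = sd_band(MEAN_WPE, SD_WPE, 1)   # about 68% of precincts if normal
two_sd = sd_band(MEAN_WPE, SD_WPE, 2)   # about 95% of precincts if normal
print(one_sd)
print(two_sd)
```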
First consider what would happen if the vote count and exit poll had been perfectly executed, with zero bias in any direction. Even though the random error would create a very wide spread of values for WPE (with a large standard deviation), the extreme values would cancel each other out leaving an average overall WPE of 0.0.
Now consider the tabulation of WPE by precinct partisanship under this perfect scenario. In the three middle partisanship categories where each candidate's vote falls between 20% and 80% (Weak Rep, Even and Weak Dem) the same canceling would occur, and WPE would be close to zero.
But in the extreme categories, there would be a problem. The range of potential error is wider than the range of potential results. Obviously, candidates cannot receive more than 100% or less than 0% of the vote. So in the "Strong Rep" and "Strong Dem" categories, where the candidate totals are more than 80% or less than 20%, the ordinarily wide spread of values would be impossible. Extreme values would get "clipped" in one direction but remain in place in the other.
Thus, even with perfect sampling, perfect survey execution and a perfect count, we would expect to see a negative average WPE (an error in Kerry's favor) in the "Strong Rep" precincts and a positive average WPE (an error in Bush's favor) in the "Strong Dem" category. In each "Strong" category, the extreme values on the tail of the distribution closest to 50% would pull the average up or down, so the median WPE would be closer to zero than the mean WPE. Further, the absolute value of WPE would be a bit larger in the middle categories than the "Strong" categories. The table would show differences in WPE between Strong Dem and Strong Rep precincts, but none would be real.
[4/14 - My argument that random error alone causes such an artifact is not correct].
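The 4/14 correction can be checked with a quick simulation. This is only a sketch under stated assumptions: a two-candidate race, simple random sampling of 50 voters per precinct, no nonresponse and a perfect count. The function `simulate_wpe` and its parameters are hypothetical, and I define WPE here as the poll margin (Bush minus Kerry) minus the count margin, so that a negative value is an error in Kerry's favor.

```python
import random
import statistics

def simulate_wpe(bush_share, n_interviews=50, n_precincts=5000, seed=0):
    """Simulate WPE (in points) for precincts that all share one true
    Bush vote share, with random sampling error as the only error."""
    rng = random.Random(seed)
    count_margin = (2 * bush_share - 1) * 100  # Bush minus Kerry, actual count
    wpes = []
    for _ in range(n_precincts):
        # Each sampled voter is a Bush voter with probability bush_share.
        bush_votes = sum(rng.random() < bush_share for _ in range(n_interviews))
        poll_margin = (2 * bush_votes / n_interviews - 1) * 100
        wpes.append(poll_margin - count_margin)
    return wpes

for share in (0.9, 0.5, 0.1):
    wpes = simulate_wpe(share)
    print(f"Bush share {share:.0%}: mean WPE {statistics.mean(wpes):+.2f}, "
          f"median WPE {statistics.median(wpes):+.2f}")
```

Under these assumptions the mean WPE should land near zero in all three cases, including the lopsided precincts, because the sampled share is already bounded between 0% and 100% and is unbiased. That is consistent with the correction: sampling error by itself does not shift the average.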
Now take it one step further. Assume a consistent bias in Kerry's favor -- an average WPE of -6.5 -- across all precincts regardless of partisanship. Every one of the averages will get more negative, so the negative average WPE in the Strong Rep precincts will get bigger (more negative) and the positive average WPE in the Strong Dem precincts will get smaller (closer to zero). Look back at the table from the Edison-Mitofsky report. That's exactly what happens. Although the average WPE is -6.5, it is -10.0 in the Strong Rep precincts and 0.3 in the Strong Dem.
Finally, consider the impact on the median WPE (the median is the middle point of the distribution). For the Strong Rep category, pulling the distribution of values farther away from 50% increases the odds that extreme values on the negative tail of the distribution will not be canceled out by extreme values on the positive end. So for Strong Rep, the median will be smaller in absolute value (closer to zero) than the mean. The opposite will occur on the other end of the scale. For the Strong Dem category, the distribution of the values will be pulled closer to 50%, so the odds are greater that the extreme errors on the negative end of the distribution (in Kerry's favor) will be present to cancel out some of the extreme errors favoring Bush. Here the mean and median WPE will be closer. Again, this is exactly the pattern in the Edison-Mitofsky table.
Thus, the differences in WPE by precinct partisanship appear to be mostly an artifact of the way the data were tabulated. They are not real.
[4/14 - Given the error above, this conclusion is obviously premature. A similar artifact based on randomly distributed clerical errors in the count data gathered by Edison-Mitofsky may produce such a pattern. I'll have more to say on this in a future post.]
Yet the USCV report plows ahead using those differences in a long series of mathematical proofs in an attempt to refute the Edison-Mitofsky theory. As far as I can tell, they make no reference to the possibility of an artifact that the Edison-Mitofsky report plainly suggests in the discussion of the table (p. 36):
The analysis is more meaningful if the precincts where Kerry and Bush received more than 80% of the vote are ignored. In the highest Kerry precincts there is little room for overstatement of his vote. Similarly the highest Bush precincts have more freedom to only overstate Kerry rather than Bush. The three middle groups of precincts show a relatively consistent overstatement of Kerry.
[4/13 CLARIFICATION & CORRECTION: As noted above, the new Appendix B of the USCV report now discusses the implications of this artifact and implies that the authors took this artifact into account in designing the derivations and formulas detailed in Appendix A. A statistician I consulted explains that their formulas make assumptions about the data that minimize or control for the artifact. I apologize for that oversight.]
Read the USCV report closely and you will discover that their most damning words -- "dramatically higher," "very large spread," "implausible patterns," "defies empirical experience and common sense" -- all apply to comparisons of data derived from the phantom patterns in the Strong Dem or Strong Rep precincts. Next follow the advice from Edison-Mitofsky and examine their charts and tables (like the one copied below), but ignore the end points. The patterns seem entirely plausible and tend to support the Edison-Mitofsky theory rather than contradict it.
But USCV goes further, arguing that the difference between the mean and median values of WPE in Strong Bush precincts supports their theory that "vote-counting anomalies occurred disproportionately in 'high-Bush' precincts" (p. 15).
To put it nicely, given what we know about why the mean and median values are so different in those precincts, that argument falls flat.
The statistical artifact that undermines the USCV argument was not exactly a secret. The Edison-Mitofsky report alluded to it. It was also discussed back in January on the AAPOR member listserv, a forum widely read by the nation's foremost genuine experts in survey methodology.
You can reach your own conclusions as to why the USCV report made no mention of it.
UPDATE: A DailyKos diarist named "Febble" has written a very similar and intriguing post that points up a slightly different artifact in the NEP data. It's definitely worth reading if you care about this subject.
[4/13 & 4/14 - CLARIFICATION & CORRECTION: As noted above, the USCV report now discusses a similar artifact and its implications. I overlooked the assumptions of the derivations made by the USCV authors that are relevant to the artifact I should have described. In retrospect, I was in error to imply that they failed to take the artifact into account, and my conclusions about the implications of their derived values were premature. As such, I struck the two sentences above. I apologize for the error, which was entirely mine.
In clarifying these issues in Appendix B, the USCV authors indicate that the derived response rates they consider most implausible occur in the 40 precincts that were most supportive of George Bush (pp. 26-27):
WPE should be greatest for more balanced precincts and fall as precincts become more partisan. The data presented on p. 36, 37 of the E/M report and displayed in Table 1 of our report above, show that this is the case for all except the most highly partisan Republican precincts for which WPE dramatically increases to -10.0%. Our calculations above show the differential partisan response necessary to generate this level of WPE in these precincts ranges from 40% (Table 2) to an absolute minimum of 20.5% (Table 4)...
The dramatic and unexpected increase in (signed) mean WPE in highly Republican precincts of -10.0%, noted above, is also unexpectedly close to mean absolute value WPE (12.4%) in these precincts. This suggests that the jump in (signed) WPE in highly partisan Republican precincts occurred primarily because (signed) WPE discrepancies in these precincts were, unlike in [the highly Democratic precincts] and much more so than in [the middle precincts], overwhelmingly one-sided negative overstatements of Kerry's vote share.
To assess the plausibility of the derived response rates in the Republican precincts, we need to take a closer look at the assumptions the authors make and their implications. I will do so in a subsequent post.]
There is more of relevance that the USCV report leaves out, but I have already gone on too long for one blog post. I will take up the rest in a subsequent post. To be continued...
April 05, 2005
Disclosing Party ID: Newsweek/PSRA
Continuing with responses from pollsters who do not typically report party identification in online releases to explain that policy. Today we hear from Larry Hugick of Princeton Survey Research Associates, the firm that conducts the Newsweek poll:
The reasons why party ID figures for the Newsweek poll haven't been routinely released in the past is straightforward. From the very beginning, the releases have included only substantive questions, not demographics and other "background" questions like party ID. I suspect this was done initially because of space considerations and the poll's quick turnaround schedules. This policy has been followed through the 2004 election. To my knowledge, there has never been a Newsweek poll release that dealt specifically with the findings of a party ID question (although we have covered related topics like party image.) But we have always made the results of the party ID, demographic and other background questions available on request.
Until relatively recently (I'd say since the 2000 presidential election) we haven't had all that many requests for party ID numbers. But like PRC [the Pew Research Center], in light of the current demand for this information, we will begin routinely including it on the topline documents released for each Newsweek poll. We often release UNWEIGHTED party ID figures in reporting the bases for subgroup breakdowns and it is frustrating to see them misinterpreted so often. Even before your request, we were considering making this change.
And thank you Newsweek/PSRA! More responses will follow over the next few days.
P.S. Sorry for the infrequent posts the last few days. Some days, alas, my day job comes first...
Party Disclosure Archive (on the jump)
- Pew Research Center
- Fox News/Opinion Dynamics
- American Research Group
- How to interpret shifts in Party ID