
April 22, 2005
The Hotline Asks Pollsters: What Validates a Poll?
This week, the hard-working folks at The Hotline, the daily online political news summary published by the National Journal, did a remarkable survey of pollsters on the question of how they check their samples for accuracy. They asked virtually every political pollster to answer these questions: "What's the first thing you look at in a survey to make sure it's a good sample? In other words -- what validates a poll sample for you?" They got answers from six news media pollsters and thirteen campaign pollsters (including MP and his partners).
Now, MP readers are probably not aware of it, since few can afford a subscription to Washington's premier political news summary, but The Hotline has been closely following MP's series on disclosure of Party ID. They have reproduced much of the series almost in full, with all the requisite links (for which we are appropriately grateful). In fact, we believe it was the Washington Post polling director Richard Morin's reference to party identification as a "diagnostic question" in his answer to MP that inspired The Hotline's poll of pollsters about how they validate polls. Thus, we asked the-powers-that-be at the National Journal to grant permission to reproduce their feature, and they have kindly agreed.
The responses of the various pollsters are reproduced on the jump page. Thank you National Journal!
(Under the circumstances, MP doesn't mind shameless shilling for The Hotline, especially since he is a regular reader: Their "Poll Track" feature is one of the most comprehensive and useful archives of political polling results available. It's just a shame they don't offer a more affordable subscription rate to a wider audience).
As the Hotline editors note, most pollsters listed a series of demographic and attitudinal questions that they tend to look at in evaluating a poll (particularly gender, age, race and party ID). However, a few themes deserve some emphasis:
- A point worth amplifying: Several pollsters - especially Pew's Andrew Kohut, ABC's Gary Langer and Democrat Alan Secrest - stressed that the procedures used to draw the sample and conduct the survey are more important to judgments about quality than the demographic results.
- Ironically, there was a difference in the way the pollsters heard and answered the Hotline's questions (next time, pre-test!): Some described how they evaluate polls done by other organizations; some (including MP and his partners) described how we evaluate our own samples.
- Although most described what factors they look at, few went on to describe what they do when a poll (either theirs or someone else's) fails their quality check. Do they weight or adjust the results to correct the problem? Leave it as is, but consider the errant measure in their analysis? Ignore the survey altogether?
There is much food for thought here. Like the Hotline editors, MP would like to know what you think. After reading over the comments below, please take a moment to leave a comment: Of all the suggestions made, what information do you want to know about a poll? More broadly, what other questions would you like to ask the pollster consultants about how we do our work?
See the complete responses from pollsters as published in yesterday's Hotline on the jump page.
GOPers
- MWR Strategies (R) pres. Michael McKenna: "From a technical perspective, we look at demographic breakouts (age, gender, region, etc.) and make sure they are in the ballpark. Then we look at ideological/partisan breaks ... Then finally, and perhaps most importantly, we look at the responses themselves and sort of give them the real-world test -- does that set of answers conform to the other things I know about the world? ... It seems to me that the trick to samples is not being excessively concerned about one set of survey results, but rather to look at the results of all surveys on a given topic. It is pretty rare for a single deficient sample to twist the understanding of an issue or event, in part because everyone (I think) in the business looks at each other's results."
- Hill Research (R) principal David Hill: "I always look at the joint distributions by geography, party, age, gender, and race/ethnicity. Based on prior election databases, we know the correct percentage of the sample that should be in each combination of these categories. ... We try to achieve the proper sample distribution through stratified sampling and the imposition of quotas during the interviewing process, but sometimes it still isn't right because of quirks in cooperation rates, forcing us to impose weights on the final results."
- Ayres, McHenry & Associates (R) VP Jon McHenry: "When we get data from the calling center, the first thing I check is the racial balance and then party ID. Variation in either of those two numbers can and will affect the other numbers throughout the survey. Looking at someone else's survey, ... I'll also see how long the survey has been in the field. You can do a good survey in two days, but it's tricky. It's pretty tough in one day, which is part of the reason tracking nights can bounce around. ... But ... knowing you're in the ballpark with party id is a pretty good proxy for seeing that you have a balanced sample."
- Public Opinion Strategies (R) partner Glen Bolger: "If it's a party registration state/district, I check party registration from the survey against actual registration. I also look closely at ethnicity to ensure proper representation of minorities. We double check region and gender quotas to make sure those were followed. We check age to ensure seniors are not overrepresented."
- Probolsky Research (R) pres. Adam Probolsky: "If the poll is about a specific election, I look at whether the respondents are likely voters. If not, it is hard to take the results seriously. If it is a broader public policy or general interest poll, I look to see if the universe of respondents matches the universe of interested parties, stated more plainly, that the population that is being suggested as having a certain opinion is well accounted for in the universe of respondents."
- Moore Information (R) pres. Bob Moore: "Name of pollster and partisanship of sample."
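The weighting step that David Hill describes above can be sketched in a few lines. This is only an illustration of simple cell weighting (target share divided by sample share), not any firm's actual procedure. The function name `cell_weights`, the ten-respondent sample and the target shares are all hypothetical.

```python
from collections import Counter

def cell_weights(sample_cells, target_shares):
    """Return one weight per cell: target share divided by observed sample share."""
    n = len(sample_cells)
    counts = Counter(sample_cells)
    weights = {}
    for cell, target in target_shares.items():
        observed = counts.get(cell, 0) / n
        if observed == 0:
            # An empty cell cannot be weighted up; it has to be fixed in the field.
            raise ValueError(f"no respondents in cell {cell!r}; cannot weight")
        weights[cell] = target / observed
    return weights

# HYPOTHETICAL sample of 10 respondents, each tagged with a (gender, age) cell.
sample = (
    [("F", "65+")] * 4 + [("F", "<65")] * 2 + [("M", "65+")] * 1 + [("M", "<65")] * 3
)
# HYPOTHETICAL targets, standing in for shares taken from prior-election data.
targets = {
    ("F", "65+"): 0.2, ("F", "<65"): 0.3, ("M", "65+"): 0.2, ("M", "<65"): 0.3,
}

weights = cell_weights(sample, targets)
# The weights rescale each cell but leave the total sample size unchanged.
weighted_total = sum(weights[cell] for cell in sample)
```

In this made-up sample, older women are overrepresented and get a weight below one, while the single older man is weighted up. Very large weights on a handful of respondents are one reason pollsters say they weight with caution.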
Media
- Gallup editor-in-chief Frank Newport: "Technically there are a wide variety of factors which combine to make a "good sample." As an outside observer ... I focus first and foremost on the known integrity and track record of the researcher/s involved. If it's a research organization unknown to me, the "good sample" question becomes harder to answer without more depth investigation -- even with the sample size, target population, dates of interviewing information usually provided. Parenthetically, question wording issues are often more important in evaluating poll results than the sample per se."
- ABC News dir. of polling Gary Langer: "A good sample is determined not by what comes out of a survey but what goes into it: Rigorous methodology including carefully designed probability sampling, field work and tabulation procedures. If you've started worrying about a "good sample" at the end of the process, it's probably too late for you to have one. When data are completed, we do check that unweighted sample balancing demos (age, race, sex and education) have fallen within expected norms. But this is diagnostic, not validative."
- Schulman, Ronca & Bucuvalas, Inc. pres. Mark Schulman: "When you've been in the polling business as long as I have, you've learned all the dirty tricks and develop an instinct for the putrid polls. 'Blink blink,' as Gladwell calls it. Blink -- who sponsored the poll, partisans or straights? Blink -- are the questions and question order leading the respondent down a predictable path? Blink -- does the major finding just by chance (!) happen to support the sponsor's position? This all really does happen in a blink or two. The dirty-work usually does not involve the sample itself."
- Pew Research Center pres. Andrew Kohut: "There is no one way to judge a public opinion poll's sample. First thing we look for was whether the sample of potential respondents was drawn so that everyone in the population had an equal, or at least known chance of inclusion. Secondly, what efforts were made to reach potential respondents -- Were there call backs -- how many -- over what period of time? And what measures were used to convince refusals to participate? ... How does the distribution of obtained sample compare to Census data? We will also look at how the results of the survey line up on trend measures that tend to be stable. If a poll has a number of outlier findings on such questions it can set off a warning bell for us. I want to add that the major source of error in today's polls is more often measurement error than sampling error or bias. When I see a finding that doesn't look right, I first look to the wording of the question, and where the question was placed in the questionnaire. The context in which a question is asked often makes as much difference as how the question is worded."
- Zogby Int'l pres. & CEO John Zogby: "I go right to the harder to reach demographics. Generally, that means younger voters, Hispanics, African Americans. They are usually under-represented in a typical survey sample, but if their numbers are far too low, then the sample is not usable. I also look at such things as union membership, Born-Agains, and education. If any of these are seriously out of whack then there is a problem."
- Research 2000 pres. Del Ali: "There are two things right off the top: the firm that conducted the survey and for whom it was conducted for. If it is a partisan firm conducted for a candidate or a special interest group, the parameters and methodology become critical to examine. However, regardless of the firm or the organization who commissioned the poll, the most important components to look for are: Who was sampled (registered voters, likely voters, adults, etc.), Sample size/margin for error (at least 5% margin for error), Where was poll conducted (state wide, city, county, etc.), What was asked in the poll (closed ended/open ended questions), When was a horse race question asked in the poll. Bottom line, I take all candidate and policy group polls with a grain of salt. The independent firms who poll for media outlets are without question unbiased and scientifically conducted."
Dems
- Greenberg Quinlan Rosner Research (D) VP Anna Greenberg: "It is hard to tell because you never really know how people develop their samples." Mentioning sample size, partisan breakdown and field dates, Greenberg looks "for ... how accurately it represents the population it purports to represent (e.g., polls on primaries should be of primary voters). ... You can also look at the demographic and political (e.g., partisanship) characteristics to make sure the sample accurately represents the population. It is rarely reported, but it would be helpful to know how the sample frame is generated (e.g., random digit dial, listed sample) so you can get a sense of the biases in the data. But none of these measures really help you understand response rates, completion rates or response bias, which arguably have as big an impact on sample quality as any of the items listed above. It is important to note that ALL samples have bias, it's just a matter of trying to reduce it and understand it."
- Anzalone-Liszt Research (D) partner Jeff Liszt: In addition to mentioning the importance of sample size, Liszt looks at "over how long (or short) a period the interviews were conducted. Very large samples taken in one or two nights sometimes raise a red flag because of the implications for the poll's call-back procedures. The challenge is that public polling is only very selectively public. Sampling procedure and weighting are critical, yet opaque processes, about which very few public polls provide any information. ... This often leaves you with little more than a smell test. ... Often, the best you can do is consider whether a poll is showing movement relative to its last reported results, whether other public polls are showing the same movement, and whether there is any apparent shift in demographics and party identification from previous results."
- Global Strategy Group (D) pres. Jefrey Pollock: "The first question we ask ourselves is 'what is driving voter preference?' In an urban race like NYC or LA, race is frequently the primary determinant, and therefore the most important element of ensuring a valid sample. In addition, there is a high frequency of undersampling of minorities in many surveys. In an election where race is not a leading determinant, we look first to ensure that the survey matches up to probable regional turnout."
- Cooper & Secrest Associates (D) partner Alan Secrest: "Proper sampling is the absolute bedrock of accurate and actionable polling. ... A correctly-drawn poll sample -- in concert with properly focused screen questions (the two cannot be divorced...especially in primary polling) -- should yield a representative look at a given electorate. Sadly, such methodological rigor too often is not the case." Noting that "there is no 'single' criterion" to use in judging a poll, and pointing out the importance of testing dates and demographics, Secrest lists his key criteria: "Does the pollster have a track record for accurate turnout projection, winning, and being willing to report unvarnished poll results to the client?; what turnout model was used to distribute the interviews?; is the firm using an adequate sample size for the venue, especially if subgroup data is being released?; 'consider the source'...some voter list firms are perennially sloppy or lazy in the maintenance of their product; were live interviewers used? centrally located? appropriate accents? Obviously, question design and sequence matter as well."
- A joint response from all the partners at Bennett, Petts & Blumenthal (D), Anna Bennett, David Petts and Mark Blumenthal: "The most valid data we have on "likely voters" involves their geographic distribution in previous comparable elections. Like most political pollsters, we spend a lot of time modeling geographic turnout patterns, and stratify our samples by geography to match those patterns. We also look at whatever comparable data is available, including past exit polls, Census estimates of the voting population, other surveys and voter file counts. We examine how the demographic and partisan profile of our survey compares to the other data available, but because there are often big differences in the methodologies and sources, we would use these to weight data in rare instances and with extreme caution."
- Hamilton Beattie & Staff (D) pres. David Beattie: "There is not 'one thing' that validates a poll -- the following are the first things we always look at: 1)what is the sample size, 2)what were the screening questions, 3)what is the racial composition compared to the electorate and were appropriate languages other than English used, 4)what is the gender, 5)what is the age breakdown (looking especially to make sure it is not too old) 6)what is the party registration or identification."
- Decision Research (D) pres. Bob Meadow: "The first thing we do is compare the sample with the universe from which it is drawn. For samples from a voter file, we compare on party, gender and geography. For random digit samples, we compare on geography first."
© 2005 by National Journal Group Inc., 600 New Hampshire Avenue, NW, Washington DC 20037. Any reproduction or retransmission, in whole or in part, is a violation of federal law and is strictly prohibited without the consent of National Journal. All rights reserved.
Posted by Mark Blumenthal on April 22, 2005 at 01:16 PM in Miscellanous
April 18, 2005
What the USCV Report Doesn't Say (Part II)
Mea Culpa Update - In Part I of this series I erred in describing an artifact in the data tabulation included in the report by Edison Research and Mitofsky International. The artifact (which was first described by others back in January) results not from random sampling error, but from all other randomly distributed errors in the count data obtained by Edison-Mitofsky.
Over the last week, I have seen evidence that such an artifact exists and behaves as I described in Part I. I will have more to report on this issue very soon. For now, I can only report that others are hard at work on this issue and their findings are very intriguing. As a certain well known website sometimes says, "Developing..."
[Update 4/21 - The development I am referring to is the work of DailyKos diarist (and occasional MP commenter) "Febble," summarized in this blog post (and a more formal draft academic paper). She reaches essentially the same conclusion that I did in Part I:
It would seem that the conclusion drawn in the [US Count Votes] report, that the pattern observed requires "implausible" patterns of non-response, and thus leaves the "Bush strongholds have more vote-count corruption" hypothesis as "more consistent with the data" is not justified. The pattern instead is consistent with the [Edison-Mitofsky] hypothesis of widespread "reluctant Bush responders" - whether or not the "political company" was "mixed".
Febble's paper is undergoing further informal "peer review" and -- to her great credit -- she is considering critiques and suggestions from the USCV authors. I will have much more to say about this soon. Stay tuned...]
In the meantime, let me continue with two different issues raised in the US Count Votes (USCV) report and some general conclusions about the appropriate standard for considering fraud and the exit polls.
4) All Machine Counts Suspect? - The USCV report makes much of a tabulation provided by Edison-Mitofsky showing very low apparent rates of error in precincts that used paper ballots. Paper ballot precincts, they write, "showed a median within-precinct-error (WPE) of -0.9, consistent with chance while all other technologies were associated with high WPE discrepancies between election and exit poll results." Their Executive Summary makes the same point, illustrated with a chart that highlights the low WPE median in green.

While the report notes that the paper ballots were "used primarily in rural precincts," and earlier in the report notes that such precincts amount to "only 3% of sampled precincts altogether," it fails to point out to readers why those two characteristics call the apparent contrast between paper and other ballots into question.
What the USCV report does not mention is the finding by the Edison-Mitofsky report that the results for WPE by machine type appear hopelessly confounded by the regional (urban or rural) distribution of voting equipment. The E-M report includes a separate table (p. 40) that shows higher rates of WPE in urban areas for every type of voting equipment. Virtually all of the paper ballot precincts (88% -- 35 of 40) were in rural areas while two thirds of the machine count precincts (68% - 822 of 1209) were in urban areas. E-M concludes:
These errors are not necessarily a function of the voting equipment. They appear to be a function of the equipment's location and the voter's responses to the exit poll at precincts that use this equipment. The value of WPE for the different types of equipment may be more a function of where the equipment is located than the equipment itself (p. 40).
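A small sketch may help show why that confounding matters. The precinct counts below are the ones quoted above from the E-M report. Everything else is an assumption made purely for illustration: the WPE values are invented, they are set to be identical for every type of equipment, and every precinct not counted as rural (or urban) above is treated as belonging to the other category. The sketch also uses means rather than the medians reported by E-M.

```python
# Counts quoted in the text from the Edison-Mitofsky report (p. 40).
paper_rural, paper_total = 35, 40
machine_urban, machine_total = 822, 1209

paper_rural_share = paper_rural / paper_total        # 0.875
machine_urban_share = machine_urban / machine_total  # about 0.68

# ASSUMPTION: hypothetical mean WPE by location, the same for every equipment type.
HYPOTHETICAL_WPE = {"urban": -8.0, "rural": -4.0}

def pooled_wpe(n_urban, n_rural, wpe=HYPOTHETICAL_WPE):
    """Precinct-weighted mean WPE for a group with the given urban/rural mix."""
    return (n_urban * wpe["urban"] + n_rural * wpe["rural"]) / (n_urban + n_rural)

# ASSUMPTION: precincts not counted above fall in the other location category.
paper_pooled = pooled_wpe(n_urban=paper_total - paper_rural, n_rural=paper_rural)
machine_pooled = pooled_wpe(
    n_urban=machine_urban, n_rural=machine_total - machine_urban
)

print(f"paper: {paper_rural_share:.0%} rural, pooled WPE {paper_pooled:.1f}")
print(f"machine: {machine_urban_share:.0%} urban, pooled WPE {machine_pooled:.1f}")
```

Even though the invented error is the same for paper and machine precincts within each location, the pooled paper figure comes out smaller in magnitude simply because paper precincts are mostly rural. The real tabulation cannot be reconstructed this way; the point is only that a gap by equipment type can appear without any equipment effect.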
The USCV report points out that Edison/Mitofsky "fail to specify P-values, significance levels, or the statistical method by which they arrived at their conclusion that voting machine type is not related to WPE." Here they have a point. The public would have been better served by a technical appendix that provided measures of significance. However, public reports (as opposed to journal articles) frequently omit these details, and I have yet to hear a coherent theory for why Edison-Mitofsky would falsify this finding.
Nonetheless, USCV want us to consider that "errors for all four automated voting systems could derive from errors in the election results." OK, let's consider that theory for a moment. If true, given the number of precincts involved, it implies a fraud extending to 97% of the precincts in the United States. They do not say how that theory squares with the central contention of their report that "corruption of the official vote count occurred most freely in districts that were overwhelmingly Bush strongholds" (p. 11). Their own estimates say the questionable strongholds are only 1.6% of precincts nationwide (p. 14, footnote). Keep in mind that their Appendix B now concedes that the pattern of WPE by precinct is consistent with "a pervasive and more or less constant bias in exit polls because of a differential response by party" in all but the "highly partisan Bush precincts" (p. 25).
Presumably, their theory of errors derived from "all four automated voting systems" would also include New Hampshire, the state with the fourth highest average WPE in the country (-13.6), where most ballots were counted using optical scan technology. In New Hampshire, Ralph Nader's organization requested a recount in 11 wards, wards specifically selected because their "results seemed anomalous in their support for President Bush." The results? According to a Nader press release:
In the eleven wards recounted, only very minor discrepancies were found between the optical scan machine counts of the ballots and the recount. The discrepancies are similar to those found when hand-counted ballots are recounted.
A Nader spokesman concluded, "it looks like a pretty accurate count here in New Hampshire."
5) More Accurate Projection of Senate Races? - The USCV report states that in 32 states "exit polls were more accurate for Senate races than for the presidential race, including states where a Republican senator eventually won" (p. 16). They provide some impressive statistical tests ("paired t-test, t(30) = -.248, p<.02) if outlier North Dakota is excluded") but oddly omit the statistic on which those tests were based.
In this case, the statistically significant difference is nonetheless quite small. The average within precinct error (WPE) in the Presidential race in those states was -5.0 (in Kerry's favor), the average WPE favoring the Senate Democrat was -3.6 (See Edison-Mitofsky, p. 20). Thus, the difference is only 1.4 percentage points -- and that's a difference on the Senate and Presidential race margins.
The USCV authors are puzzled by this result since "historic data and the exit polls themselves indicate that the ticket-splitting is low." On this basis they conclude it "reasonable to expect that the same voters who voted for Kerry were the mainstay of support for Democratic candidates" (p. 16).
That expectation would be reasonable if there were no cross-over voting at all, but the rates of crossover voting were more than adequate to explain a 1.4 percentage point difference on the margins. By my own calculations, the average difference between the aggregate margins for President and US Senate in those 32 states was 9 percentage points. Nearly half of the states (14 of 32) showed differences on the margins of 20 percentage points or more.
And those are only aggregate differences. The exit polls themselves are the best available estimates of true crossover voting. Even in Florida, where the aggregate difference between the Presidential and Senate margins was only 4.0 percentage points, 14% of those who cast a ballot for president were "ticket splitters," according to the NEP data.
Historically low or not, these rates of crossover voting are more than enough to allow for a difference of the margin of only 1.4 percentage points between Presidential and Senate votes.
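To see how a small aggregate gap can sit on top of much larger crossover voting, here is a rough sketch using the Florida figures just cited. It rests on a simplifying assumption that is not in the NEP data: every voter casts a two-party vote in both races. Under that assumption, if a and b are the shares of voters crossing over in each direction, the share of ticket splitters is a + b and the gap between the two margins is 2(a - b). The function name `split_shares` is hypothetical.

```python
def split_shares(total_splitters, margin_gap):
    """Split the ticket-splitter share into the two crossover directions.

    total_splitters: share of voters who split their ticket (a + b).
    margin_gap: gap between the two races' margins, which equals 2 * (a - b).
    Returns (a, b), the larger and smaller crossover groups.
    """
    if abs(margin_gap) / 2 > total_splitters:
        raise ValueError("margin gap is too large for this share of ticket splitters")
    a = (total_splitters + margin_gap / 2) / 2
    b = (total_splitters - margin_gap / 2) / 2
    return a, b

# Florida figures quoted in the text: 14% ticket splitters, 4.0-point gap in margins.
# ASSUMPTION: two-party votes only, every voter voting in both races.
larger, smaller = split_shares(0.14, 0.04)
print(f"crossover one way: {larger:.0%}, the other way: {smaller:.0%}")
```

Under these assumptions, most of the crossover in one direction is cancelled by crossover in the other, which is how 14% of voters can split their tickets while the margins differ by only four points. The voters behind the presidential and Senate totals are therefore not the same people, even where the totals look alike.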
The consistency in the average WPE values for the Senate races is greater than the slight difference with the presidential race. The exit polls erred in favor of the Democratic candidate across the board, including races featuring a Democratic incumbent (-3.3), a Republican incumbent (-5.2) or an open seat (-2.2).
Conclusion: The Burden of Proof?
In the comments section of Part I of this post, "Nashua Editor" (of the Nashua Advocate) asked an interesting question:
Is it not statistically, procedurally, and (dare I say) scientifically sound to maintain skepticism over a scientific conclusion until that conclusion has been verified? Is a scientist, or exit-pollster, not called upon, in this scenario, to favor the U.S. Count Votes analysis until it is proven incorrect?
The question raises one of the things that most troubles me: the way much of the argument about the exit polls and vote fraud turns the scientific method on its head. It is certainly scientifically appropriate to maintain skepticism over a hypothesis until it has been verified and proven. That is the essence of science. What does not follow is why the USCV analysis should be favored until "proven incorrect." When the USCV report argues that "the burden of proof should be to show that election process is accurate and fair" (p. 22), it implies a similar line of reasoning: We should assume that the exit polls are evidence of fraud unless the pollsters can prove otherwise.
Election officials may have a duty to maintain faith in the accuracy and fairness of the election process, but what USCV proposes is not a reasonable scientific or legal standard for determining whether vote fraud occurred. The question (or hypothesis) we have been considering since November is whether the exit polls are evidence of fraud or systematic error benefiting Bush. In science we assume no effect, no difference or in this case no fraud until we have sufficient evidence to prove otherwise, to disprove the "null hypothesis." In law -- and election fraud is most certainly a crime -- the accused are innocent until proven guilty. The "burden of proof" only shifts if and when the prosecutor offers sufficient evidence to convict.
In this sense, good science is inherently skeptical. "Bad science," as the online Wikipedia points out, sometimes involves "misapplications of the tools of science to start with a preconceived belief and filter one's observations so as to try to support that belief. Scientists should be self-critical and try to disprove their hypotheses by all available means."
Whatever its shortcomings, the Edison-Mitofsky report provided ample empirical evidence that:
- The exit polls reported an average completion rate of 53%, which allowed much room for errors in the poll apart from statistical sampling error.
- The exit polls have shown a consistent "bias" toward Democratic candidates for president since 1988, a bias that was nearly as strong in 1992 as in 2004.
- In 2004, the exit polls showed an overall bias toward both John Kerry and the Democratic candidates for Senate.
- Errors were very large and more or less constant across all forms of automated voting equipment and tabulation in use in 97% of US precincts.
- The exit poll errors strongly correlated with measures of interviewer experience and the precinct level degree of difficulty of randomly selecting a sample of voters. In other words, they were more consistent with problems affecting the poll than problems affecting the count.
Given this evidence, the hypothesis that the exit poll discrepancy was evidence of fraud (or at least systematic error in the count favoring Republicans) requires one to accept that:
- Such fraud or systematic error has been ongoing and otherwise undetected since 1988.
- The fraud or errors extend to all forms of automated voting equipment used in the U.S.
- Greater than average errors in New Hampshire in 2004 somehow eluded a hand recount of the paper ballots used in optical scan voting in precincts specifically chosen because of suspected anomalies.
Add it all up and "plausibility" argues that exit poll discrepancy is a problem with the poll, not a problem with the count. Yes, Edison-Mitofsky proposed a theory to explain the discrepancy (Kerry voters were more willing to be interviewed than Bush voters) that they cannot conclusively prove. However, the absence of such proof does not somehow equate to evidence of vote fraud, especially in light of the other empirical evidence.
Of course, this conclusion does not preclude the possibility that some fraud was perpetrated somewhere. However, any small-scale fraud would not have been large enough to be detected by the exit polls, even if it amounted to a shift of a percentage point or two in a statewide vote tally. The sampling error in the exit polls makes them a "blunt instrument," too blunt to detect such small discrepancies. Again, the question we are considering is not whether fraud existed but whether the exit polls are evidence of fraud.
You don't need to take my word on it. After the release of the Edison-Mitofsky report, the non-partisan social scientists of the National Research Commission on Elections and Voting (NRCEV) examined the exit poll data and concluded (p. 3):
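A back-of-the-envelope calculation shows how blunt the instrument is. The sketch below computes an approximate 95% margin of error on the lead of one candidate over another. The sample size of 2,000 interviews and the design effect of 1.5 are assumptions chosen for illustration, not figures from the Edison-Mitofsky report. Exit polls are cluster samples of precincts, so the true error is larger than the simple random sampling figure.

```python
import math

def margin_moe(p_a, p_b, n, z=1.96, design_effect=1.0):
    """Approximate 95% margin of error on the lead (p_a - p_b) from n interviews."""
    # Variance of the difference between two shares from the same sample.
    variance = (p_a + p_b - (p_a - p_b) ** 2) / n
    return z * math.sqrt(design_effect * variance)

# ASSUMPTION: 2,000 interviews in a state and an even two-way race.
moe_srs = margin_moe(0.5, 0.5, 2000)
# ASSUMPTION: a design effect of 1.5 standing in for precinct clustering.
moe_clustered = margin_moe(0.5, 0.5, 2000, design_effect=1.5)

print(f"{moe_srs * 100:.1f} points (simple random sample)")
print(f"{moe_clustered * 100:.1f} points (with assumed design effect)")
```

Under these assumptions the uncertainty on the margin is more than four percentage points before clustering is even considered, so a shift of a point or two in the count would be lost in the noise.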
Discrepancies between early exit poll results and popular vote tallies in several states may have been due to a variety of factors and did not constitute prima facie evidence for fraud in the current election.
[4/18 & 4/19 - Minor typos and grammar corrected, links added]
Posted by Mark Blumenthal on April 18, 2005 at 05:25 PM in Exit Polls


