January 31, 2005
NEP Data Available Online
Unfortunately, my blogging time is short today but I want to quickly pass on one bit of news (thanks to Rick Brady of Stones Cry Out for the tip): The so-called "raw" data from the National Election Pool exit polls are now available online through the Inter-University Consortium for Political and Social Research (ICPSR), based at the University of Michigan. (The same data are also due to be released by the Roper Center Archives, based at the University of Connecticut, within the next few weeks.)
The files have been made available through ICPSR's "Fast Track" service, which they describe as follows:
Studies on FastTrack are public use files that have not yet been fully processed by our staff. This system provides quick access to the study while the files undergo full processing. An announcement will be made as soon as the fully processed files are available.
The fast track link will lead to this FTP file directory which includes documentation from Edison-Mitofsky and sub-directories containing cross-tabulations and data files for the surveys in all 50 states, the District of Columbia and the national survey. Datafiles are in both ASCII and SPSS formats and include documentation to help identify and use included variables.
I have not had time to do more than skim some of the documentation, but on first glance, the files appear to be consistent with the previous releases of respondent level exit poll data. The files include only one "weight" variable -- the one that includes a "correction" to match results with the actual count. I also see no precinct level data nor any other means of replicating the "within precinct error" (WPE) analysis from the Edison-Mitofsky report. If that turns out to be true, those who have been demanding the release of "raw data" are going to be disappointed, to put it mildly.
Of course, I may have simply overlooked something obvious, or perhaps the "Fast Track" release is incomplete. So I would urge those who are interested and have more time on their hands today to post comments below on what is and is not included.
UPDATE: Rick Brady posts in the comments section an email response from Edison-Mitofsky:
"The Roper Center told us it would be about two weeks before everything is posted. They received the data over a week ago now, so it shouldn't be too long. But they haven't received anything different from what Michigan received, or from what they've received in the past from VNS."
UPDATE II (2/8): The NEP data is also now available from the Roper Center Archives. The Roper Center has prepared an exit poll CD available for free to its members and for $79 to the general public. In addition to the data available online from ICPSR, the Roper Center CD also includes comparable exit poll data from 2000 and crosstabs of the national exit polls from 1976 to 2000.
UPDATE III (2/8): I have been able to clarify what is and what is not included in the "raw data." Those who dive into the data files will find a field for "precinct," as Basil Valentine noted in the comments section. Although the data for each sampled precinct are designated by a code number, the precinct numbers in the data file do not correspond to the actual precinct number in any state. The data do not disclose the actual precincts sampled.
In response to an email query, an Edison Mitofsky spokesperson referred me to the following passage from the Code of Professional Ethics and Practices of the American Association for Public Opinion Research (AAPOR):
"Unless the respondent waives confidentiality for specified uses, we shall hold as privileged and confidential all information that might identify a respondent with his or her responses."
They feel that if they identify the polling locations it might be possible for a computer match to identify a small portion of actual individuals in the data. Some precincts are small enough that it would be possible to identify actual voters from their demographic data. They also feel that any effort to provide a precinct level estimate of actual vote or "within precinct error" would allow a user to identify the actual precinct and, theoretically at least, identify actual voters.
I will leave it to the reader to evaluate this rationale except to say this: The protection of respondent confidentiality is not some minor technicality. It is arguably one of survey research's most important ethical bedrocks. No pollster or survey researcher should ever consider it a trifling matter.
Something else to consider: The U.S. Census has struggled with the issue of how to make "micro-data" available to the general public while still protecting respondent confidentiality as required (in the case of the Census) by federal law. A Census report on the history of confidentiality and privacy issues notes that the potential for disclosure of the identity of individual responses in publicly released data may result in any of the following measures (quoting verbatim from p. 22):
- Removal or reduction in detail of any variable considered likely to identify an especially small and visible population such as persons with high incomes.
- Introduction of "noise" (small amounts of variation) into selected data items.
- Use of data swapping (i.e., locating pairs of matching households in the database and swapping those households across geographic areas to add uncertainty for households with unique characteristics).
- Replacement of a reported value by an average in which the average associated with a particular group may be assigned to all members of a group, or to the "middle" member (as in a moving average).
Yes, there are ways to release more data and still protect respondent confidentiality, but it is hard to imagine that anyone would find the deliberate "introduction of noise" or "data swapping" to be an acceptable strategy for the release of exit poll data.
Like it or not, the released data are all we are likely to see.
January 28, 2005
Following up on a lesson learned from the last post, that a story can sometimes make a point more powerfully than a lot of arcane data, I decided to share excerpts from a series of emails I received about the experiences of four NEP interviewers from the state of Minnesota. The information comes from the Minnesota college professor -- let's call him Professor M -- who helped recruit these college student interviewers for NEP. I have shared much of the substance of this story in previous posts, but in light of the findings of the Edison-Mitofsky report, I thought it would be useful to share his verbatim comments.
In early November, intrigued by the controversy surrounding the exit polls, Professor M decided to interview his four students about their experiences as interviewers. As he points out, a sample size of 4 is truly "anecdotal" -- it is by no means representative of the experiences of the 1,400-odd interviewers who worked for NEP on Election Day. However, it is remarkable how many of the problems he notes help explain patterns in the data on "within precinct error" in the Edison-Mitofsky report.
The following are excerpts from our email dialogue:
The information that I got from my students is quite intriguing, but of course it cannot in any way be considered a representative sample. Also, the students I spoke with kept no independent notes of response rates or other details while serving as interviewers; their impressions of who responded and who didn't were entirely from memory.
The geographic distribution of the four interviewers was as follows: one outer exurbia, one inner ring suburb, one "exclusive" upscale suburb and one precinct in an ethnically and economically diverse Minneapolis neighborhood.
The [NEP] badge did display the logos of the networks prominently.
However, this could not be easily seen from a distance, and at least one of my students was hampered by the fact that a contingent of folks from MoveOn.Org was stationed right next to her at the 100-ft line. From a distance this made her appear to be connected with them, and because she was forced to stand 100 ft from the polls, people were easily able to turn aside to avoid both her and MoveOn. Also, it gets dark early up here, so the badge was not visible from a distance after 4pm or so.
The two students in suburban areas commented that they had the most trouble getting participation in the early morning -- probably due to lines and people needing to get to work.
I believe all of the students reported receiving requests from voters who wanted to participate in the survey despite the fact that they were not the nth person to emerge. Nearly all of the students reported some inclusions that were somewhat less than random. Most commonly this occurred when a couple emerged together and the person the poll worker approached refused but the partner offered to participate. None of the students saw any "difference" in which one of the two participated as long as one of them did.
As I understood what the students told me (who did not see themselves as doing anything wrong, by the way) they would not have coded a refusal at all in that situation.
One student reported at least one instance of a person simply taking a survey from her supplies (which were out in the open at her table) filling it out, and dropping it into the survey box. By the time she realized what had happened (she was busy trying to buttonhole legitimate respondents), there was no way to determine for certain which of the surveys in the box had been incorrectly included.
A few additional observations from [a fourth student] -- she noted that she had more refusals among white males, although she was not sure if that was related to her own appearance (she is African-American). Also, she observed (and this makes sense, when you think about it) that her response rate improved over the course of the day as she became better at honing her "sales pitch." Still, despite the fact that she had perhaps the most advantageous placement of any of my four students (she was indoors at the only entrance/exit and had full cooperation from the staff on-site), she still recalls a fairly low response rate -- 40-50% perhaps.
To clarify one point: Each interviewer was given an "interviewing rate" which ranged from 1 to 10 nationally. Here is the way the Edison-Mitofsky training materials (passed along by Professor M) describe what was supposed to happen:
We set an interviewing rate based on how many voters we expect at your polling place. If your interviewing rate is 3, you will interview every 3rd voter that passes you. If it is 5, you will interview every 5th voter that passes you, etc. We set an interviewing rate to make sure you end up with the correct number of completed interviews over the course of the day, and to ensure that every voter has an equal chance of being interviewed.
If the targeted voter declines to participate or if you miss the voter and do not get a chance to ask him or her to participate, you should mark them as a "Refusal" or "Miss" on your Refusals and Misses Tally Sheet and start counting voters again (for a more thorough explanation of refusals and misses, refer to page 9). For example, if your interviewing rate is 3 and the 3rd person refuses to participate, you do not interview the 4th person. Instead, start counting again to three with the next person. [Emphasis added]
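The counting rule quoted above is simple but easy to get wrong in the field. Here is a minimal sketch of the prescribed procedure; the function name and outcome labels are my own for illustration, not from the training materials:

```python
# Sketch of the nth-voter selection rule from the training materials.
# 'voter_outcomes' describes what would happen if each exiting voter
# were approached; only every nth voter is actually a target.

def select_voters(voter_outcomes, rate):
    """Return (completes, refusals, misses) tallies for one shift."""
    completes = refusals = misses = 0
    count = 0
    for outcome in voter_outcomes:
        count += 1
        if count == rate:      # this voter is the "nth" target
            count = 0          # restart the count regardless of the outcome
            if outcome == 'complete':
                completes += 1
            elif outcome == 'refuse':
                refusals += 1  # tallied -- NOT replaced by the next voter
            else:
                misses += 1
    return completes, refusals, misses

# With a rate of 3, a refusal by the 3rd voter is recorded and the
# count restarts; the 4th voter is never interviewed in her place.
print(select_voters(['complete', 'complete', 'refuse',
                     'complete', 'complete', 'complete'], 3))  # (1, 1, 0)
```

The point of the sketch: substituting an eager volunteer for a refusing target, without tallying the refusal, corrupts both the sample and the completion-rate statistics at once.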
The point: If interviewers allowed "inclusions that were somewhat less than random" but did not tally refusals appropriately, then the "completion rates" now getting so much scrutiny in the Edison-Mitofsky report are not only inaccurate, but the inaccuracies will probably occur most, on average, in the same precincts showing the biggest "within precinct error" (WPE).
The bigger point: Consider that all of the above comes from just four interviewers. Imagine how much we might learn if we could talk to hundreds. Apparently, that is exactly what Edison Mitofsky says they will soon do (or are already doing) with the interviewers in Ohio and Pennsylvania (p. 13):
We are in the process of an in-depth evaluation of the exit poll process in Ohio and Pennsylvania...We will follow up with in-depth interviews with the exit poll interviewers in the precincts in which we saw the largest errors in an attempt to determine if there were any factors that we have missed thus far in our investigation of Within Precinct Error.
I think I can speak for others in the survey research profession when I say we hope they ultimately share more of what they learn. It will help us all do better work.
[Typo fixed 1/28]
January 27, 2005
The War Room
Of the newly disclosed data in the Edison/Mitofsky report on this year's exit polls, some of the most important concern results from past elections. Although I had found snippets before, the data was not nearly as comprehensive as what is now available in the report. The implication of all the numbers is clear: The 1992 exit polls were off by nearly as much as those in 2004. Even better, there's a movie version.
But I'm getting a bit ahead of the story. Much of the speculation on the blogosphere and elsewhere about the problems of this year's exit polls begins with the premise that these problems are new. While it is true that the average "within precinct error" (WPE) of 6.5 percentage points in Kerry's favor was, as the report states, "the largest...we have observed on the national level in the last five elections" (p. 31), there was also a similar error in 1992 and a bias favoring Democrats in every national exit poll conducted since the networks started doing a combined exit poll in 1988. To be more specific, the report shows that:
- The average WPE this year (-6.5) was nearly as large in 1992 (-5.0), and also favored the Democrats in 1996 (-2.2), 1998 (-2.2) and 2000 (-1.8) (p. 31).
- Within states, the degree of error in 2004 at the state level tended to correlate with the degree of error in past elections, especially 2000 and 1992 (p. 32). As the report put it, "seven of the ten states with the largest WPE in 1992 were also among the fifteen states with the largest WPE in 2004 (California, Connecticut, Delaware, Maryland, New Hampshire, New Jersey and Vermont)" (p. 32).
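For readers new to the statistic: WPE is conventionally computed at the precinct level as the exit-poll margin minus the official margin, so a negative value means the poll overstated the Democrat. A minimal sketch, with the function name and toy numbers my own for illustration:

```python
# "Within precinct error" under the usual sign convention:
# margins are expressed as Republican minus Democrat, and
# WPE = (exit-poll margin) - (official margin).
# A negative WPE means the exit poll overstated the Democrat.

def wpe(exit_dem, exit_rep, actual_dem, actual_rep):
    """All inputs are two-party percentages for a single precinct."""
    return (exit_rep - exit_dem) - (actual_rep - actual_dem)

# A precinct where the exit poll showed Kerry 55 / Bush 45 but the
# official count split 50 / 50 has a WPE of -10, overstating Kerry.
print(wpe(55, 45, 50, 50))  # -10
```

A state's average WPE is then just the mean of these precinct-level values, which is why it is a very different statistic from a simple state-level comparison of poll and vote.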
But, as Ruy Teixeira points out, the presentation of these statistics is a bit arcane for some. So let's watch the movie version.
The classic documentary "The War Room" followed the backroom exploits of James Carville and George Stephanopoulos on behalf of Bill Clinton's 1992 campaign. The film ends on Election Day, and one of the final scenes shows Clinton's operatives reacting to the first leaked exit poll results.
At one point the camera peers over the shoulder as a staffer jots down state-by-state poll results coming in over the telephone. A clock on the wall shows the time: 1:15 Central (2:15 EST). Next, a campaign worker appears reading to someone over the phone the "latest numbers, 2:00 o'clock" (presumably 3:00 EST) for California and Colorado. With a little help from the DVD, I reproduced the handwritten numbers in the table below, and added the actual Clinton margin in each state, the difference, and the average "within precinct error" for each state that VNS included in last week's Edison/Mitofsky report.
[A few caveats: First, given the time of day, the numbers had to be the so-called "first call" numbers, the most raw and partial results that interviewers called in at roughly noon local time. The difference I calculated at the state level is a very different statistic than the average WPE that Edison/Mitofsky reported this week for each state in 1992 - the latter was a precinct level estimate based on the full day's exit poll. Also keep in mind the possibility that the leaker or leakee may have confused some of the numbers in haste].
Nonetheless, it is obvious that these mid-day reports in 1992 were off by nearly as much as the leaked numbers everyone saw on the Internet in the middle of Election Day, 2004. Eleven of twelve states had an error in Clinton's favor; seven had errors on the margin of six points or greater. These first call estimates showed Clinton ahead in four states (Alabama, Florida, Indiana and Kansas) that he ultimately lost. The average state level error for these twelve states (-5%) was the same as the overall nationwide WPE on the complete exit poll.
But there were two big differences between 1992 and 2004. First, Bill Clinton still won the election, so few came away feeling fooled or suspicious that an election had been stolen. Second, the Internet was in its infancy, so the leaked results spread to a few hundred reporters and politicos, not the millions that saw the leaked exit polls in 2004.
Finally, one last gossipy tidbit with appeal to those still smarting over the "bloggers blew it" meme: The cameras caught Carville charging into George's office with the very first leaked exit poll results. Carville says:
"Popkin talked to Warren Whatever, the head of the VRS, and he's going to talk to him again in four minutes, but his initial impression is landslide, could be up to 12 [points], maybe 400 electoral votes."
Translation: Samuel L. Popkin, a University of California, San Diego political science professor who had been advising the Clinton campaign in 1992, had talked to Warren Mitofsky, the head of Voter News Service (VNS, the forerunner of NEP), about the "first call" numbers. You can draw your own conclusions about whether the "initial impressions" belonged to Popkin or Mitofsky. The final result was a bit different: Clinton won the national popular vote by a 6.4% margin, not 12%, and won 370 Electoral College votes, not 400.
So, assuming we believe James, the leak came not from some irresponsible blogger but from Mitofsky to Popkin to Carville.
Gotta love cinema verite.
[Typos fixed - 1/27]
January 25, 2005
This past Sunday Daniel Okrent, the new "Public Editor" of the New York Times, pondered the shortcomings of reporting on numbers, both in his newspaper and by journalists generally:
When it comes to how it handles numbers, The Times is an equal opportunity offender. Like a bad cough that spreads its germs indiscriminately, numbers misapplied and ill-explained irritate the sensibilities of the right and the left, the drug company official and the animal rights activist, the art collector and the Jets fan...
Number fumbling arises, I believe, not from mendacity but from laziness, carelessness or lack of comprehension. I'll put myself in the latter category (as some readers no doubt will as well, after they've read through my representation of the numbers that follow). Most of the journalists I know who enter the profession comfortable with numbers write about sports, where debate about the meaning of statistics is a daily competition, or economics, a field in which interpretation of numbers will no more likely produce inarguable results than will finger painting. So it is left to the rest of us who write for the paper to stumble through numbers, scatter them on the page and hope that readers understand.
After reviewing a half dozen or so examples (which MP was, of course, too lazy to count), Okrent makes a suggestion:
Although everyone who writes for The Times is presumably comfortable with words, every sentence nonetheless goes through the hands of copy editors, highly trained specialists who can bring life to a dead paragraph or clarity to a tortured clause with a tap-tap here and a delete-insert there. But numbers, so alien to so many, don't get nearly this respect. The paper requires no specific training to enhance numeracy, and no specialists whose sole job is to foster it. David Leonhardt and Charles Blow, the deputy design director for news, have just begun to conduct occasional seminars on "Using and Misusing Numbers" and that's a start. But as I read the paper and try to dodge the context-absent numbers that are thrown about like shot-puts, I long for more.
MP applauds any effort to enhance numeracy. Toward that end, he recommends "The Numbers Guy," a new free online column (no subscription required) from WSJ.com that seems exactly what Dr. Okrent ordered. It promises to "examine numbers and statistics in the news, business, politics and health" especially those that are "flat-out wrong, misleading or biased." The author is Carl Bialik, a freelance writer with a math degree who (true to Okrent's observation) was once a sportswriter. Looks interesting and a definite addition to MP's regular reading list.
UPDATE: The URL link I originally included for The Numbers Guy was incorrect and may have taken you to a WSJ subscription page. That was my error. I've corrected the URL (it's www.wsj.com/numbersguy) -- it should work and be free to all comers. Apologies.
January 21, 2005
The "Reluctant Bush Responder" Theory Refuted?
First, my apologies for not posting yesterday. The Inauguration was also a federal holiday, which meant no child care in my household and a day of being Mystery Daddy not Mystery Pollster.
So without further ado: Though the Edison/Mitofsky Report gives us much to chew over, the table drawing the most attention is the one on page 37 that shows completion rates for the survey by the level of partisanship of the precinct.
Click for Full Size Image
A bit of help for those who may be confused: Each number is a percentage and you read across. Each row shows completion, refusal and miss rates for various categories of precincts, categorized by their level of partisanship. The first row shows that in precincts that gave 80% or more of their vote to John Kerry, 53% of voters approached by interviewers agreed to be interviewed, 35% refused and another 12% should have been approached but were missed by the interviewers.
Observers on both sides of the political spectrum have concluded that this table refutes the central argument of the report, namely that the discrepancy in Kerry's favor was "most likely due to Kerry voters participating in the exit polls at a higher rate than Bush voters" (p. 3). Jonathan Simon, who has argued that the exit poll discrepancies are evidence of an inaccurate vote count, concluded in an email this morning that the above table "effectively refutes the Reluctant Bush Responder (aka differential response) hypothesis, and leaves no plausible exit poll-based explanation for the exit poll-vote count discrepancies." From the opposite end of the spectrum, Gerry Dales, editor of the blog DalyThoughts, [in a comment here] uses the table to "dismiss the notion that the report has proved...differential non-response as a primary source of the statistical bias" (emphasis added). He suspects malfeasance by the interviewers rather than fraud in the count.
The table certainly challenges the idea that Kerry voters participated at a higher rate than Bush voters, but I am not sure it refutes it. Here's why:
The table shows response rates for precincts, not voters. Unfortunately, we have no way to tabulate response rates for individuals because we have no data for those who refused or were missed. Simon and Dales are certainly both right about one thing: If completion rates were uniformly higher for Kerry voters than for Bush voters across all precincts, the completion rates should be higher in precincts voting heavily for Kerry than in those voting heavily for Bush. If anything, completion rates look slightly higher in the Republican precincts.
However, the difference in completion rates need not be uniform across all types of precincts. Mathematically, an overall difference in completion rates will be consistent with the pattern in the table above if you assume that Bush voter completion rates tended to be higher where the percentage of Kerry voters in the precincts was lower, or that Kerry voter completion rates tended to be higher where the percentage of Bush voters in the precincts was lower, or both. I am not arguing that this is likely, only that it is possible.
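To see that this is arithmetically possible, consider a toy two-precinct example (all numbers invented for illustration): both precincts show an identical 53% overall completion rate, yet in the Bush-leaning precinct Kerry voters respond at a much higher rate, skewing that precinct's sample toward Kerry.

```python
# Toy example: identical overall completion rates in two precincts can
# conceal a Kerry/Bush response differential in one of them.

def precinct_stats(kerry_share, kerry_rate, bush_rate):
    """kerry_share: fraction of actual Kerry voters in the precinct.
    kerry_rate / bush_rate: completion rates for each group.
    Returns (overall completion rate, Kerry share among completed
    interviews), rounded for readability."""
    bush_share = 1 - kerry_share
    overall = kerry_share * kerry_rate + bush_share * bush_rate
    kerry_in_poll = kerry_share * kerry_rate / overall
    return round(overall, 3), round(kerry_in_poll, 3)

# Precinct A: heavily Kerry, no response differential.
print(precinct_stats(0.80, 0.53, 0.53))  # (0.53, 0.8)  -- poll matches the vote
# Precinct B: heavily Bush, Kerry voters respond at a higher rate.
print(precinct_stats(0.20, 0.65, 0.50))  # (0.53, 0.245) -- poll overstates Kerry
```

Both precincts report a 53% completion rate, yet precinct B's exit poll sample is 24.5% Kerry voters against an actual 20%, an error of several points on the margin, with nothing visible in the completion-rate table.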
Note also that the two extreme precinct categories are by far the smallest (see the table at the bottom of p. 36): Only 40 precincts of 1250 (3%) were "High Rep" and only 90 were "High Dem" (7%). More than three quarters were in the "Even" (43%) or "Mod Rep" (33%) categories. Not that this explains the lack of a pattern - it just suggests that the extreme precincts may not be representative of most voters.
Second, as Gerry seems to anticipate in his comments yesterday, the completion rate statistics are only as good as the interviewers that compiled them. Interviewers were responsible for counting each voter they missed or that refused to be interviewed and keeping tallies on their race, gender and approximate age. The report presents overwhelming evidence that errors were higher for interviewers with less experience. One hypothesis might be that some interviewers made improper substitutions without recording refusals appropriately.
Consider my hypothetical in the last post: A husband is selected as the nth voter but refuses. His spouse offers to complete the survey instead. The interviewer breaks with the prescribed procedure and allows the spouse to do the interview (rather than waiting for the next nth voter). [Note: this is actually not a hypothetical -- I exchanged email with a professor who reported this example after debriefing students he helped recruit as NEP interviewers]. The question is: would the interviewer record the husband as a refusal? The point is that the same sloppiness that allows an eager respondent to volunteer (something that is impossible, by the way, on a telephone survey) might also skew the completion rate tallies. Presumably, that is one reason why Edison/Mitofsky still plans to conduct "in-depth" interviews with interviewers in Ohio and Pennsylvania (p. 13) - they want to understand more about what interviewers did and did not do.
Third, there is the possibility that some Bush voters chose to lie to the exit pollsters. Any such behavior would have no impact on the completion rate statistics. So why would a loyal Bush voter want to do that? Here's what one MP reader told me via email. As Dave Barry used to say, I am not making this up:
Most who are pissed off about the exit polls are Democrats or Kerry supporters. Such people are unlikely to appreciate how profoundly some Republicans have come to despise the mainstream media, just since 2000. You have the war, which is a big one. To those who support Bush much of the press have been seditious. So, if you carry around a high degree of patriotism you are likely to have blood in your eye about coverage of the Iraq war, alone. Sean Hannity and Rush Limbaugh had media in their sights every day leading up to the 2004 election, and scored a tremendous climax with the Rather fraud and NY Times late-hit attempt. I was prepared to lie to the exit pollster if any presented himself. In fact, however, I can't be sure I would have, and might have just said "none of your f---ing business." We can't know because it didn't happen, but I do know the idea to lie to them was definitely in my mind.
Having said all that, I think Gerry Dales has a point about the potential for interviewer bias (Noam Scheiber raised a similar issue back in November, and in retrospect, I was too dismissive). The interviewers included many college students and holders of advanced degrees who were able to work for a day as an exit pollster. Many were recruited by college professors or (less often) on "Craigslist.com." It's not a stretch to assume that the interviewers were, on average, more likely to be Kerry voters than Bush voters.
My only difference with Gerry on this point is that such bias need not be conscious or intentional.
Fifty years or more of academic research on interviewer effects shows that when the majority of interviewers share a point of view, the survey results often show a bias toward that point of view. Often the reasons are elusive and presumed to be subconscious.
Correction: While academic survey methodologists have studied interviewer effects for at least 50 years, their findings have been inconsistent regarding correlations between interviewer and respondent attitudes. I would have done better to cite the conventional wisdom among political pollsters that the use of volunteer partisans as interviewers -- even when properly trained and supervised -- will often bias the results in favor of the sponsoring party or client. We presume the reasons for that bias are subsconscious and unintentional.
In this case, the problem may have been as innocent as an inexperienced interviewer feeling too intimidated to follow the procedure and approach every "nth" voter and occasionally skipping over those who "looked" hostile, choosing instead those who seemed more friendly and approachable.
One last thought: So many who are considering the exit poll problem yearn for simple, tidy answers that can be easily proved or dismissed: It was fraud! It was incompetence! Someone is lying! Unfortunately, this is one of those problems for which simple answers are elusive. The Edison/Mitofsky report provides a lot of data showing precinct level characteristics that seem to correlate with Kerry bias. These data make a compelling case that whatever the underlying problem (or problems), they were made worse by young, inexperienced or poorly trained interviewers especially when up against logistical obstacles to completing an interview. It also makes clear (for those who missed it) that these problems have occurred before (pp. 32-35), especially in 1992 when voter turnout and voter interest were at similarly high levels (p. 35).
Many more good questions, never enough time. Hopefully, another posting later tonight... if not, have a great weekend!
January 20, 2005
Impressions on the Exit Poll Report
I wanted to read all 77 pages of the Edison/Mitofsky report before weighing in. It took much of the day, but here are some first impressions.
This report will not answer every question nor assuage every doubt, but credit is due for its uncharacteristic degree of disclosure. Having spent much time recently reviewing the scraps in the public domain on past exit polls, I am impressed by the sheer volume of data and information it contains. Yes, the report often accentuates the positive -- not surprising for an internal assessment -- but it is also reasonably unflinching in describing the exit poll errors and shortcomings. Yes, this data has been long in coming, but better late than never. We should keep in mind that we are dealing with organizations that are instinctively resistant (to put it mildly) to "hanging out [their] dirty underwear."
Say what you will about its conclusions, this report is loaded with never before disclosed data: The final exit poll estimates of the vote for each state along with the sampling error statistics used internally, exit poll estimates of senate and gubernatorial contests and their associated errors in every state, other "estimator" data used on election night, tabulations of "within precinct error" (WPE) for each state going back to 1988, and a very thorough review of the precinct level characteristics where that error was highest in 2004.
For tonight, let me review the three things that stand out most for me in this report:
First, the report confirms a great deal we suspected about the exit poll interviewers - who they were, how they were recruited and trained (pp. 49-51). The interviewer is where the rubber hits the road for most surveys. While exit polls do involve "secret ballots," the interviewer is responsible for randomly selecting and recruiting voters according to the prescribed procedure and keeping an accurate tally of those they missed or that refused (something to keep in mind when analyzing the completion rate statistics). The interviewer is also the human face of the exit poll, the person that makes a critical first impression on potential respondents.
The report confirms that interviewers were often young and mostly inexperienced. Interviewers were evaluated and hired with a phone call and trained with a 20-minute "training/rehearsal call" and an interviewer manual sent via FedEx. They were often college students -- 35% were age 18-24, half were under 35. Perhaps most important, more than three quarters (77%) had never before worked as exit poll interviewers. Most worked alone on Election Day.
One obvious omission: I may have missed it, but I see no comparable data in the report on interviewer characteristics from prior years. Was it not available? Also, the report mentions a post-election telephone survey of the interviewers (p. 49). It would seem logical to ask the interviewers about their partisan leanings, especially in a post-hoc probe, but the report makes no mention of any such measure.
Second: There was no systematic bias in the random samples of precincts chosen in each state. The proof is relatively straightforward: Replace the interviews in each precinct with the actual votes and compare the sample to the complete count. There were errors, as with any sample, but they were random across the 50 states (see pp. 28-30). If anything, those errors favored Bush slightly. Blogger Gerry Dales explains it well:
In other words, when the actual vote totals from the sampled precincts were used, they did successfully represent the overall voting population quite well. Had they sampled too many Democratic leaning precincts, then when the actual vote results were used rather than the exit poll results, the estimate would not have provided a very good estimate of the final vote count (it would have overstated Kerry's support). The problem was not in the selection of the sample precincts- it was that the data in the chosen precincts was not representative of the actual voting at those precincts [Emphasis added].
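To make the logic of that test concrete, here is a minimal sketch in Python. All precinct names and vote counts are invented for illustration; the actual NEP design used stratified selection and sampling weights that this toy version omits:

```python
# Toy version of the precinct-sample check: substitute each sampled
# precinct's ACTUAL vote count for its exit poll interviews, aggregate,
# and compare to the full statewide count. A small error here means the
# precinct sample itself was sound.

# (actual Kerry votes, actual Bush votes) in each sampled precinct --
# all numbers invented for illustration
sampled_precincts = [
    (412, 388),
    (150, 620),
    (533, 247),
    (301, 299),
]

kerry = sum(k for k, _ in sampled_precincts)
bush = sum(b for _, b in sampled_precincts)
sample_kerry_share = kerry / (kerry + bush)

# Invented statewide Kerry share from the complete official count
statewide_kerry_share = 0.487

selection_error = sample_kerry_share - statewide_kerry_share
print(f"Sample Kerry share: {sample_kerry_share:.3f}")
print(f"Selection error:    {selection_error:+.3f}")
```

If the selection error is small and random from state to state, as the report found, then whatever discrepancy remains in the exit poll must have arisen inside the sampled precincts, not from which precincts were chosen.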
Third and finally: The centerpiece of the report concerns the investigation of that "within precinct error" (WPE) found on pages 31 to 48. If you have time to read nothing else, read that. The authors review every characteristic that correlates with a greater error. They found higher rates of "within precinct error" favoring Kerry in precincts with the following characteristics:
- An interviewer age 35 or younger
- An interviewer with a graduate degree
- A larger number of voters, of whom a smaller proportion was selected
- An interviewer with less experience
- An interviewer hired a week or less before the election
- An interviewer who said they had been trained "somewhat or not very well"
- A location in a city or suburb
- A location in a swing state
- A stronger showing for Bush
- Interviewers who had to stand far from the exits
- Interviewers who could not approach every voter
- Polling place officials who were not cooperative
- Voters who were not cooperative
- Poll watchers or lawyers who interfered with interviewing
- Weather that affected interviewing
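For the technically inclined, the WPE statistic itself is simple arithmetic: the gap between a precinct's exit poll margin and its official margin. Here is a minimal sketch, with invented numbers; the sign convention (negative WPE meaning the poll overstated Kerry) follows my reading of the report:

```python
def wpe(exit_kerry, exit_bush, actual_kerry, actual_bush):
    """Within-precinct error: exit poll margin minus official margin.

    Margins are (Bush - Kerry) shares, so a negative WPE means the
    exit poll overstated Kerry's support in that precinct. Sign
    convention assumed from my reading of the Edison-Mitofsky report.
    """
    exit_margin = (exit_bush - exit_kerry) / (exit_kerry + exit_bush)
    actual_margin = (actual_bush - actual_kerry) / (actual_kerry + actual_bush)
    return 100 * (exit_margin - actual_margin)

# Invented precinct: interviewers tallied 60 Kerry / 40 Bush responses,
# but the official count split 520 Kerry / 480 Bush.
print(round(wpe(60, 40, 520, 480), 1))  # a negative, Kerry-overstating WPE
```

A consistently negative WPE across many precincts is exactly the pattern the report describes: the exit poll interviews tilted more toward Kerry than the official counts in those same precincts.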
The report pointedly avoids a speculative connecting of dots. Its authors apparently preferred to present "just the facts" and leave the conjecture to others. Unfortunately, none of the characteristics above, by itself, "proves" that Kerry supporters were more likely than Bush supporters to participate in the poll. However, it is not hard to see how the underlying attitudes and behaviors at work might create and exacerbate the within-precinct bias.
Consider age, for example. What assumptions might a voter make about a college student approaching with a clipboard? Would it be crazy to assume that student was a Kerry supporter? If you were a Bush voter already suspicious of the media, might the appearance of such an interviewer make you just a bit more likely to say no, or to walk briskly in the other direction? Would it be easier to avoid that interviewer if they were standing farther away? What if the interviewer were forced to stand 100 feet away, among a group of electioneering Democrats - would the Bush voter be more likely to avoid the whole group?
Writing in the comments section of the previous post, "Nathan" made a reasonable hypothesis about the higher error for interviewers with advanced degrees:
Voters (almost certainly accurately) concluded that interviewers were liberal and thus Kerry voters were more likely to talk to them...throw in any sort of additional colloquy engaged in between the interviewers and interviewees and there you have it.
Now consider the Kerry voter approaching the same college student interviewer. Might that voter feel something opposite of a Bush voter -- a bit more trusting or sympathetic toward the interviewer? And suppose the randomly selected voter did not want to participate but his wife - a Kerry supporter - eagerly volunteered to take his place. Would the less experienced interviewer be more likely to bend the selection rules so she could take the poll?
The problem with all of this speculation - plausible as it may be - is that it is nearly impossible to prove to anyone's satisfaction. That is the nature of non-response. We know little about those who refuse because...we did not interview them.
I want to try to answer my friend Gerry Dales's observation (on his blog and in the comments section here) about the pattern of response rates by the partisanship of the precinct (I think he is too quick to dismiss "differential non-response"), but it's really late and I need to get some sleep. I'll post tomorrow morning...promise.
Also, if anyone has any questions about the report, please post them or email me. It is a bit dense and technical and, at the very least, I can help translate.
January 19, 2005
"Media Whore" Alert*
I am likely to be on ABC's Nightline tonight. This evening's broadcast, barring other "breaking news," will examine the various questions raised about the accuracy of the vote count, including the exit poll controversy. Chris Bury interviewed me for the broadcast, so a soundbite is likely. MP's immediate family will want to set their VCRs and DVRs appropriately.
In news of more import beyond MP, I am told that Warren Mitofsky has been interviewed for the program as well and will likely discuss findings from the report released today.
*Thanks to Dan Drezner for inspiring the subject line and recommending MP to the folks at Nightline.
Breaking News: NEP Releases Full Report
A bit of breaking news: The full 77-page "evaluation" of the exit polls prepared by the firms that conducted the exit polls was released this morning and can be downloaded from their website, exit-poll.net .
Some highlights from the Executive Summary:
Our investigation of the differences between the exit poll estimates and the actual vote count point to one primary reason: in a number of precincts a higher than average Within Precinct Error most likely due to Kerry voters participating in the exit polls at a higher rate than Bush voters...
Exit polls do not support the allegations of fraud due to rigging of voting equipment. Our analysis of the difference between the vote count and the exit poll at each polling location in our sample has found no systematic differences for precincts using touch screen and optical scan voting equipment....
Our detailed analysis by polling location and by interviewer has identified several factors that may have contributed to the size of the Within Precinct Error that led to the inaccuracies in the exit poll estimates. Some of these factors are within our control while others are not.
It is difficult to pinpoint precisely the reasons that, in general, Kerry voters were more likely to participate in the exit polls than Bush voters. There were certainly motivational factors that are impossible to quantify, but which led to Kerry voters being less likely than Bush voters to refuse to take the survey. In addition there are interactions between respondents and interviewers that can contribute to differential non-response rates. We can identify some factors that appear to have contributed, even in a small way, to the discrepancy. These include:
- Distance restrictions imposed upon our interviewers by election officials at the state and local level
- Weather conditions which lowered completion rates at certain polling locations
- Multiple precincts voting at the same location as the precinct in our sample
- Polling locations with a large number of total voters where a smaller portion of voters was selected to be asked to fill out questionnaires
- Interviewer characteristics such as age, which were more often related to precinct error this year than in past elections
Separately, Warren Mitofsky informed the email listserv of the American Association for Public Opinion Research (AAPOR) that the individual level respondent data are being sent today to the Roper Center Archives at the University of Connecticut and the Inter-University Consortium for Political and Social Research (ICPSR) at the University of Michigan. The data at the Roper Center should be available in "about two weeks."
Obviously, I want to read the full report carefully...more later...
January 18, 2005
NEP to Release Edison/Mitofsky Report?
This morning, USA Today's Mark Memmott reports that this week, Edison/Mitofsky, the firm that conducted the National Election Pool (NEP) exit polls, "will tell the news organizations that paid for them what, if anything, they think went wrong."
But will that report see the light of day? That, according to Memmott, "remains unclear:"
Edie Emery, a spokeswoman for the six-member media consortium that paid for the exit polls, says representatives from ABC, CBS, CNN, Fox News, NBC and the Associated Press want to review the report before making any decisions about what to make public.
The behind-closed-doors delivery of the report could come as soon as today. Because the report's conclusions might not be made public, the report is unlikely to appease critics who say the six media companies have moved too slowly to release information collected in the exit polls and have said too little about possible problems with those surveys.
"It's amazing to me that there's even a possibility that the report won't be released to the public," says Larry Sabato, director of the Center for Politics at the University of Virginia. "There was a major national controversy involving the integrity of the news organizations and of the polling firms involved."
Deeper in the story, Memmott adds a bit more:
Warren Mitofsky and Joseph Lenski, two experienced pollsters hired before last year's election to overhaul and run the exit poll system, have been reviewing whether their early work on Election Day was flawed. They and the six media companies have said almost nothing about that review...
The news groups defend their actions. Their position: Since they paid for the information to be gathered from voters, they can handle the data and questions about them as they see fit. They plan, Emery says, to follow past practice. That means information gathered by the exit pollsters - showing, for example, breakdowns in support for Bush and Kerry by age, gender and race - will soon be made public.
The story has more to interest MP readers about the exit poll controversy and some general thoughts about news media coverage of polling. Read it in full.
A cynical hunch: The report will be released to the public, but at about noon on Thursday.
A less cynical, more serious hunch: The networks will release the report. Moreover, breaking from past practice, the "raw data" that eventually lands at the Roper Center archives will include all the necessary weighting variables to allow scholars to replicate the "complete" exit poll results that Mitofsky and Lenski are reporting on, the results that were not "corrected" to match the final vote tallies. This will allow scholars to check the exit pollsters' conclusions and test their own hypotheses. That's mostly speculation, of course, but put me down as an optimist on this one.
January 14, 2005
Junk Polls: People's Choice Awards
And now for something completely different....
Just for a change of pace, let's shift from a topic as serious as election reform to something as trivial as the People's Choice Awards. I do so, not to condemn this crown jewel of American pop culture, but to kick-off what I hope to be a running series on Mystery Pollster on the misuse of junk polls.
Last Sunday night, the People's Choice Awards were presented on live national television. In a surprise twist, Michael Moore's film "Fahrenheit 9/11" won the award for favorite movie, while Mel Gibson's "The Passion of the Christ" won in the category of favorite drama. What caught MP's eye was this wrinkle reported by the website GoldDerby.com:
Controversy will certainly erupt after the victories of both films when critics ask: Do the People's Choice Awards REALLY reflect the views of the American public? Arguably, they did so in the past when winners were determined by a Gallup Poll survey, but voting was switched this year to less expensive -- and less scientific -- Internet balloting that's easily manipulated by the zealous political and religious supporters. "Tinseltown has been buzzing about organized campaigns on behalf of Moore's Bush-bashing 'Fahrenheit 9/11,'" reports today's New York Post and gossipmeisters say the same is true about Mel Gibson's disciples crusading fervently for "Passion of the Christ."
Generally, when the news media describe a survey as "scientific" they mean that it was based on a random sample that aims to represent some larger population. A better word might be "projectable" - a survey or poll based on random sampling (sometimes called "probability sampling") allows for statistical estimates of some larger population. Non-random "votes" may have entertainment value, but they do not allow for such projectable estimates, especially when the voters select themselves; they cannot estimate the views of any larger population. (The National Council on Public Polls has a list of questions journalists should ask to determine the merits of online polls. It discusses these issues in more detail.)
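The practical payoff of a probability sample is that its error is quantifiable. A quick sketch of the textbook margin-of-error calculation for a simple random sample makes the point (a simplification: real polls with clustered samples have larger design effects, and no formula of this kind applies to a self-selected online vote at all):

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p measured in a
    simple random sample of size n. Real surveys with clustered or
    weighted designs have a larger true margin than this textbook
    formula suggests; self-selected votes have no defined margin."""
    return z * math.sqrt(p * (1 - p) / n)

# A 1,000-person random sample measuring a 50/50 split:
print(f"{margin_of_error(0.5, 1000):.3f}")  # roughly +/- 3 points
```

That "plus or minus 3 points" figure only exists because random selection lets us model the sampling process; an Internet ballot open to anyone offers no equivalent statement.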
Now, MP has no quarrel with the People's Choice Awards or other popular online and call-in "votes" like the All-Star balloting for professional baseball and basketball, American Idol or the myriad "non-scientific" online polls that appear daily. MP himself ran such a reader "survey" just before the election (the oh-so-serious term is "convenience sample"), and it told him only about the roughly 3,000 readers who filled it out, not the much larger number who browsed the site in that period but did not bother. (Actually, MP considered the most significant finding to be that 3,000 readers were willing to complete such a survey at all, but that's another story. He also notes that he still owes his readers a summary of those results.)
The problem with these "unscientific" surveys is that someone inevitably makes the mistake of treating them as if they were projectable random samples. Case in point: Michael Moore, whose soundbite appeared earlier this week in a story by Sandra Hughes on the CBS Evening News (my transcription):
[Hughes:] Filmmaker Michael Moore says the win may be just what he needs to convince Academy Awards voters. [Moore:] "It's safe to vote for this film, because the People's Choice is a poll of red state and blue state America."
Not exactly. It was a vote in which anyone living in red state or blue state America, or anywhere in the world for that matter, could choose to participate. It was not, by any stretch of the imagination, a representative survey of Americans, especially if Moore waged a campaign to get his fans to vote for his film. (Moore's newfound faith in polls is heartening, especially since he was trashing political polls as recently as September: "You are being snookered if you believe any of these polls").
As a public service, MP will continue to report examples of the misuse of "unscientific" polls, trivial and serious. Please email me your nominations for Junk Poll of the...hmm..Week? Month? We'll have to see how it goes.