Fun With Unweighted Interview Counts


[UPDATE:  The original version of this post included a small error that may have added to the confusion on this issue.   See my corrections below]

After the appearance of newly leaked exit poll data last week, one commenter on this site asked about some apparent conflicts in the number of interviews the National Election Pool (NEP) reported doing in their "national" sample on Election Day. The answer – chased down in part by another alert MP reader/commenter – is that a complex wrinkle in the weighting procedure created an artificially high "unweighted" total that slipped past the exit poll webmasters at CNN and CBS. As such, while the exit poll tabulations on those sites are correct, the total number of interviews that appears for the national survey is in error.

Let’s start with a statement I made last week:

Keep in mind that the 7:33 PM sample from election night was incomplete. It had 11,027 interviews, but the next day NEP reported 13,660. The missing 2,633 interviews, presumably coming mostly from states in the Midwest and West, amounted to 19% of the complete sample [emphasis added].

Alert commenter "Luke" pointed out that my conclusion seemed to conflict with information provided in the official methodology statement on the national exit poll on the official National Election Pool (NEP) website:

The National exit poll was conducted at a sample of 250 polling places among 11,719 Election Day voters representative of the United States. In addition, 500 absentee and/or early voters in 13 states were interviewed in a pre-election telephone poll.

As Luke points out, those numbers clash with those in the pdf files: 11,719 + 500 = 12,219. That total is quite a bit less than the 13,660 interviews shown both in the pdf on the Scoop web site and in the "final tabulations" now available on the CNN and CBS websites. It is also less than the 13,047 interviews in the screen shot of the national survey that Jonathan Simon captured from CNN at 12:23 a.m. on November 3. Luke suggested that these totals indicate "much interview-stuffing on Nov3 to produce the requisite swing to Mr Bush."

What’s the story?

After having a busy morning away from the blog, I put in a call to Edison/Mitofsky Research (EMR) to see if they could help clear up the mystery. It turns out that Rick Brady, the author of the blog Stones-Cry-Out and a ubiquitous presence in MP’s comments section, had already emailed EMR with the same question, and posted the answer in the comments section of Friday’s post. I have copied the email to Rick after the jump, but let me try to explain. Unfortunately, it gets a bit technical and confusing, so bear with me, as it does explain the apparent inconsistency.

Here’s the gist: The confusion stems mostly from the way NEP applied the split sample design used on the national sample to the 500 interviews done by telephone among early and absentee voters. If you look at the questionnaire PDF for the national exit poll, you will notice that it has four different versions. The obvious questions (the vote for president, for Congress and the basic demographics) appear on all four versions, while other questions (such as those about Iraq, gay marriage or cell phone usage) appear on only one form or perhaps two. This is a common survey practice that allows the pollster to ask more questions while keeping to a reasonable average questionnaire length. The main disadvantage of "splitting the form" that way is greater sampling error for those questions asked of a random half or a quarter of the sample. In this case, however, even the quarter samples (with over 3,000 respondents) were large by conventional polling standards.
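As a rough illustration of that precision trade-off, here is a small sketch in Python. It uses the standard simple-random-sampling margin-of-error formula, which understates the error of a clustered exit poll design, and the random form assignment below is my stand-in for however NEP actually distributed the versions:

```python
import math
import random

def moe(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a simple random sample.
    Real exit polls use clustered samples, so their true error is larger."""
    return z * math.sqrt(p * (1 - p) / n)

n_total = 11_719  # election-day interviews in the national sample
# Hand each respondent one of the four questionnaire versions at random.
forms = [random.randint(1, 4) for _ in range(n_total)]
n_quarter = forms.count(1)  # respondents who saw, e.g., version 1

print(f"full sample:    n = {n_total},  MOE = +/-{moe(n_total):.1%}")
print(f"quarter sample: n = {n_quarter}, MOE = +/-{moe(n_quarter):.1%}")
# Roughly +/-0.9% for the full sample vs. +/-1.8% for a quarter sample --
# larger, but still small by conventional polling standards.
```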

All of this is reasonably clear from the official NEP documents. The confusion arises from the way they handled the 500 interviews done by telephone among those who voted early or by absentee ballot. For the telephone surveys, NEP did not use a split form. These 500 respondents had a very long interview that asked every question that appeared on any of the four forms.

Why would this inflate the sample size? The programmers at NEP used a shortcut to enable separate tabulations of the split form questions: They replicated the data for each of the telephone interviews four times in the data file, so that every telephone respondent had a separate record to match each of the four forms used for the in-person data.

Follow that? If not, just understand that in the unweighted data file, the 500 telephone interviews were quadruple counted. This procedure did not throw off the tabulations, because when they ran tabulations for all forms in the full national sample, they weighted the telephone interviews down by the value of 0.25. However, the PDF crosstabs (as reproduced on the CNN and CBS websites) show the unweighted number of interviews, without labeling them as such.
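To make that bookkeeping concrete, here is a minimal sketch in Python. The record counts come from the methodology statement; the record layout and the weight of 1.0 on the in-person interviews are simplifications of mine (real exit poll weights also adjust for other factors):

```python
# A minimal sketch of the "replicate and down-weight" bookkeeping.
IN_PERSON = 11_719  # election-day interviews, one form each
PHONE = 500         # telephone interviews, asked all four forms

records = []
for i in range(IN_PERSON):
    # Each in-person respondent filled out exactly one of the four forms.
    records.append({"id": f"ep{i}", "form": i % 4 + 1, "weight": 1.0})

for i in range(PHONE):
    # Each phone respondent is copied once per form, so every split-form
    # tabulation can include them -- but at weight 0.25, so the four
    # copies together count as one person in any weighted tally.
    for form in (1, 2, 3, 4):
        records.append({"id": f"tel{i}", "form": form, "weight": 0.25})

unweighted = len(records)                     # what the CNN/CBS pages showed
weighted = sum(r["weight"] for r in records)  # what the tabulations used

print(unweighted)  # 13719 = 11719 + 500 * 4
print(weighted)    # 12219.0 = 11719 + 500, the actual number of respondents
```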

Now those of you who are following this discussion with calculator in hand may notice that the numbers still do not quite add up. They did 11,027 [corrected: 11,719] interviews at polling places and 500 interviews by telephone for the national exit poll. The unweighted total would be 11,027 + (500*4) = 13,027 [corrected: 11,719 + (500*4) = 13,719]. The total in the Scoop PDFs (and on CNN and CBS) was 13,660. That still leaves the total 633 interviews short of the number listed on the presidential cross tabs.

UPDATE: Let’s try that again. I originally used the wrong numbers in the paragraph above. NEP reported doing 11,719 interviews at polling places, plus the 500 done by telephone. Thus, their "unweighted" total was 11,719 + (500*4) = 13,719, which is 59 respondents more than the 13,660 that appears on the Scoop, CNN and CBS crosstabs. Now, continuing with what I originally wrote…

It turns out that the difference is the very small number of respondents who left the question on the presidential vote blank (or who would not provide an answer on the telephone survey). Unfortunately (for an analysis of "undervoting" – a topic for another day) we do not know how many of these missing respondents were from polling place interviews and how many were the quadruple-counted telephone respondents. The email to Rick Brady has one clue: They say, as an example, that in Alabama 4 of 740 respondents (0.5%) – all interviewed at their polling place – left the presidential vote question blank.

UPDATE – NEP emails with this additional bit of information:  "The difference of 59 respondents comes from 31 respondents who answered that they did not vote for president and 28 respondents who omitted that question.  This is 0.4% of the respondents which is very similar to the 0.5% omits that we had in Alabama for example."
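For completeness, here is the arithmetic in one place – a simple back-of-the-envelope check using only the figures reported above, not anything from the data file itself:

```python
# Reconciling the national exit poll interview counts reported above.
in_person = 11_719   # election-day interviews (methodology statement)
phone = 500          # absentee/early-voter telephone interviews

respondents = in_person + phone      # 12,219 actual people interviewed
unweighted = in_person + phone * 4   # 13,719 records after quadruplication

no_pres_vote = 31  # answered that they did not vote for president
left_blank = 28    # skipped the presidential vote question

crosstab_n = unweighted - no_pres_vote - left_blank
print(respondents, unweighted, crosstab_n)                # 12219 13719 13660
print(f"{(no_pres_vote + left_blank) / unweighted:.1%}")  # 0.4%
```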

So, what we have is a very confusing bit of data processing – so confusing that it also fooled the folks at CBS and CNN who put the numbers online. Much as some would hope otherwise, it is not evidence of anything sinister.

UPDATE: The confusion was compounded by my own human error.  Responsibility for this last goof is mine alone.  Apologies to all and thanks to Luke (see the comments section) who ultimately caught my error. 

A big hat tip to Rick Brady for taking the initiative on this issue. As a result, this is not news to those of you who have been reading the comments. The full text of the email he received from NEP follows after the jump. Tomorrow, I’ll try to take up the issue of the confusing interview counts in the regional crosstabs…

UPDATE: Make that Wednesday.  I’ve got to stop posting late at night without a thorough proof-reading.

Rick,

The CNN web site is displaying the number of unweighted cases that are being used in the crosstab for the presidential vote.

The methodology statement includes the actual number of respondents who filled out the questionnaire.

These two numbers can differ for two technical reasons:

The first reason is that some respondents fill out the questionnaire but skip the presidential vote question. For example, in Alabama the CNN site shows 736 respondents. The methodology statement shows 740 respondents. This is because 4 respondents chose not to answer the question on how they voted for president and are not included in those crosstabs, but they are still included in the data file because they may have filled out how they voted in other races that day, such as the Senate race.

The second reason is that respondents from the national absentee/early voter telephone survey received all four versions of the national questionnaire while election day respondents only received one version of the national questionnaire. Thus, these respondents are included 4 times in the unweighted data (once for each version of the questionnaire) but their survey weights are adjusted down so that in the weighted data each national absentee/early voter telephone survey respondent only represents one person.

Again the methodology statements state the correct number of total respondents interviewed.

I hope that this explains the differences in how these numbers were reported.

Jennifer

Mark Blumenthal

Mark Blumenthal is the principal at MysteryPollster, LLC. With decades of experience in polling using traditional and innovative online methods, he is uniquely positioned to advise survey researchers, progressive organizations and candidates, and the public at large on how to adapt to polling’s ongoing reinvention. He was previously head of election polling at SurveyMonkey, senior polling editor for The Huffington Post, co-founder of Pollster.com and a long-time campaign consultant who conducted and analyzed political polls and focus groups for Democratic party candidates.