October 09, 2004

The Bush Landslide of 2000?

A number of readers have asked about a question that regularly appears in surveys conducted jointly by CBS and the New York Times: "Did you vote in the 2000 presidential election, did something prevent you from voting, or did you choose not to vote? IF VOTED, ASK: Did you vote for Al Gore, George W. Bush, Pat Buchanan, or Ralph Nader?"

The answers on their most recent survey had Bush at 35%, Gore 29%, Nader 2%, Buchanan 1%, won't say 2%, didn't vote 32%, don't know/no answer 1%.

Readers asked whether the six-point Bush advantage is evidence of a Republican bias, since the real result was, of course, much closer. The short answer is no, and the reason is a good introduction to the challenge of selecting likely voters. Pollsters ask about the 2000 race for two reasons: first, to allow tabulations of the current vote by those who voted for Gore or Bush four years ago, and more importantly, to help identify likely voters.

The first clue about what is really going on is that 69% of registered voters said they voted in the 2000 election. Since the CBS/NYT survey started with a sample of 979 adults and reported the results from 851 self-identified registered voters, we can calculate that those who reported voting represent 60% of the U.S. voting age population. However, the actual turnout in 2000 was only 51% of the voting age population.

This overreporting of past voting is not unusual. In fact, political scientists have consistently observed this pattern in surveys dating back to the 1940s and confirmed it with validation studies that check public records to see if individual respondents actually voted. The 9-point overreport in the CBS/NYT survey (60% reported versus 51% actual) is very much in line with the typical pattern.
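
For readers who want to check the arithmetic behind the 60% figure and the size of the overreport, here is a rough sketch in Python. It uses only the rounded numbers cited above (979 adults, 851 registered voters, 69% reporting a 2000 vote, 51% actual turnout). The variable names are mine, and the pollsters' weighted counts, which are not published, would likely give slightly different decimals.

```python
# Rough check of the turnout arithmetic, using only the rounded figures
# cited in the post. Variable names are illustrative, not from the release.
adults = 979                   # total adults interviewed
registered = 851               # self-identified registered voters
reported_voting_share = 0.69   # share of registered voters who said they voted in 2000
actual_turnout_vap = 0.51      # actual 2000 turnout, share of voting age population

# Registered voters as a share of all adults in the sample
registered_share = registered / adults

# Reported 2000 voters as a share of all adults (a stand-in for voting age population)
reported_turnout_vap = reported_voting_share * registered_share

# Gap between reported and actual turnout, in percentage points
overreport_points = (reported_turnout_vap - actual_turnout_vap) * 100

print(f"Registered share of adults: {registered_share:.1%}")
print(f"Reported 2000 turnout among all adults: {reported_turnout_vap:.1%}")
print(f"Overreport: {overreport_points:.0f} percentage points")
```

The sketch treats unregistered adults as non-voters in 2000, which is the same simplification the calculation in the post makes.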

The reason why some respondents falsely report past voting is something social scientists call "social discomfort." Some people are so embarrassed by not voting that they cannot admit it to a stranger on the telephone. For the same reasons, some respondents will avoid admitting they voted for the losing candidate. Combine overreporting and a reluctance to admit a Gore vote, and four years later, Gore's tiny popular vote victory turns into a retrospective Bush landslide.
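
To see how large that retrospective "landslide" looks among people who say they voted, one can rescale the published percentages so they sum to 100 among reported voters. The sketch below is my own back-of-the-envelope calculation, not something CBS or the Times reports. It assumes the "won't say" respondents count as voters (which is how the 69% figure adds up), and it works from rounded percentages, so the results are approximate.

```python
# Recalled 2000 vote among registered voters, in percent, as published.
# "Won't say" is counted as having voted; that is an assumption of this sketch.
recalled = {"Bush": 35, "Gore": 29, "Nader": 2, "Buchanan": 1, "Won't say": 2}

# Total share of registered voters who reported voting in 2000
reported_voters = sum(recalled.values())

# Rescale each answer to a share of reported voters only
shares = {name: 100 * pct / reported_voters for name, pct in recalled.items()}

# Bush's lead over Gore among reported voters, in percentage points
margin = shares["Bush"] - shares["Gore"]

for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
print(f"Bush margin among reported voters: {margin:.1f} points")
```

Rescaled this way, the recalled Bush lead grows to nearly nine points among self-reported voters, compared with an actual popular vote that Gore won by roughly half a point.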

One thing is curious: Both CBS and the Times release separate summaries of the same data on their respective web sites. When I went looking for the retrospective vote question in the CBS release, I couldn't find it. Then with the help of alert readers Kyle D and blogger Eric Umansky (of Slate's Today's Papers) I discovered that the Times version of the same survey release includes 12 mostly demographic items that the CBS version omits, including the question on the 2000 vote. If you compare the last few pages of both documents, you'll notice that the Times version has question numbers for only the demographic items that also appear in the CBS version, so when you read the CBS release, it seems like nothing is missing. There is nothing particularly sinister at work here, except that the folks at CBS - like the very busy people at most every other national survey organization - prefer to avoid persnickety questions about their demographic items. So they hide them. Call me an idealist, but if survey researchers would open ourselves to annoying questions like these more often, somebody might learn something.

Posted by Mark Blumenthal on October 9, 2004 at 01:27 AM in Likely Voters, Measurement Issues | Permalink


On transparency:

I wish that all polls would include complete lists of raw data coupled with the polling firm's results after applying their model. Now that would allow for some nice questions...

Posted by: Scott Pauls | Oct 9, 2004 9:46:15 AM

I'm still unconvinced. I buy the social science argument, but
the whole correction business seems pretty weak. I notice,
for example, that the NYT (in the link specified above) has
*already* reweighted their sample. [Either that, or
they are spectacularly lucky to get the identical 2002 Census
Bureau registered voter demographics.]

But do they (does anyone) report (1) the raw, preweighted
numbers--particularly the demographics of the actual sample
and (2) the response rate? If Mark could do that, it
would help those of us trying to understand the sausage

And as an example of what bothers me, although the NYT
reports the reweighted demographics of age and of sex and
of income, etc., *separately*, there is no reporting of the
combined demographics. That is, if they have corrected for
age and sex separately, they could end up with a sample
that has older men and younger women, but that averages
out ok. It is this sort of thing that makes the GOP/Democrat
splits so suspicious to me.

Posted by: Matt Newman | Oct 11, 2004 11:32:56 PM

