
September 29, 2004

Why & How Pollsters Weight, Part I

I want to return to the issue of weighting, beginning with a question from commenter Ted H:

What puzzles me about the idea of weighting by party ID is that surveys are supposedly designed to sample from the population randomly. This assumption is also the basis of the margin of error they calculate. So if you sample randomly but then adjust the results because you didn't get the percentages of D and R that you expected, you have thrown out the design of the survey.

Unfortunately, as many of you guessed, perfect random samples are impossible under real-world conditions. Although we begin with randomly generated telephone numbers that constitute a true random sample of all US telephone households, some potential respondents are not home when we call. Others use answering machines or Caller ID to screen out incoming calls, and increasingly large numbers simply refuse to participate. All of these unreachable respondents create the potential for what methodologists call "non-response bias." (This is a major topic for further discussion; if you can't wait, this review of the vexing issue by ABC Polling Director Gary Langer is one of the best available anywhere.)

Telephone surveys also exclude the small percentage of households lacking home telephone service. Unfortunately, this percentage is growing rapidly because a number of households (estimated at 2-5%) have dropped their wired phone service altogether in favor of cell phones (another important topic that I am purposely glossing over for now; we'll definitely come back to it).

These missing respondents can cause error (also known as bias) only if they differ from the people who were interviewed in ways that matter to the survey's questions. Pollsters weight their data as a strategy to reduce observed bias.
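The claim that missing respondents matter only if they differ can be made concrete with the standard accounting identity for non-response bias in a mean or proportion: the bias of a respondents-only estimate equals the non-response rate times the gap between respondents and nonrespondents. A minimal Python sketch; the function name and the 30% response rate and support levels are hypothetical illustrations, not figures from any poll discussed here.

```python
def nonresponse_bias(response_rate, respondent_mean, nonrespondent_mean):
    """Bias of a respondents-only estimate relative to the full-population value.

    Full-population value = rr * respondent_mean + (1 - rr) * nonrespondent_mean,
    so respondent_mean minus that value = (1 - rr) * (respondent_mean - nonrespondent_mean).
    """
    if not 0.0 < response_rate <= 1.0:
        raise ValueError("response_rate must be in (0, 1]")
    return (1.0 - response_rate) * (respondent_mean - nonrespondent_mean)


# Hypothetical: a 30% response rate.
# If nonrespondents answer just like respondents, the low response rate causes no bias.
print(nonresponse_bias(0.30, 0.50, 0.50))
# If nonrespondents are 5 points less supportive, the estimate runs about 3.5 points high.
print(round(nonresponse_bias(0.30, 0.50, 0.45), 3))
```

The same low response rate produces zero bias in the first case and a noticeable one in the second; what matters is the difference term, which is exactly the part a pollster cannot observe directly.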

Pollster strategies for weighting seem to fall into three general categories. I’ll take up the first tonight and discuss the others in subsequent posts over the next few days.

The first is the classic strategy used by most of the major national media surveys, including CBS/New York Times, ABC/Washington Post,* Gallup/CNN/USA Today, Newsweek, Time, The Pew Research Center and the Annenberg National Election Survey, among others. They begin by interviewing randomly selected adults in a random sample of telephone households. Even if they ultimately report results only for registered voters, they ask demographic questions of all adults. They then typically weight the results to match U.S. Census estimates for gender, age, race, education and, usually, some geographic classification. This weighting eliminates any bias on those demographic variables, including chance variation due to sampling error.

The key point is that they weight only by attributes that are fixed at any given moment, easily described by respondents and matched to bulletproof Census estimates using language that typically replicates the Census questions.
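To make the mechanics concrete, here is a minimal Python sketch of the simplest form of demographic weighting: cell weighting on a single variable, where each respondent's weight is the population share of their group divided by that group's share of the sample. The gender split, the 48/52 target and the vote numbers are all hypothetical; real media polls adjust several variables at once, often with an iterative procedure rather than this one-variable shortcut.

```python
from collections import Counter


def cell_weights(groups, population_shares):
    """Post-stratification weights for a single variable.

    Each respondent gets (population share of their group) / (sample share of
    their group), so weighted group shares equal the targets and the weights
    sum to the sample size.
    """
    n = len(groups)
    counts = Counter(groups)
    return [population_shares[g] / (counts[g] / n) for g in groups]


def weighted_share(flags, weights):
    """Weighted proportion of respondents whose flag is True."""
    return sum(w for f, w in zip(flags, weights) if f) / sum(weights)


# Hypothetical sample of 100 with too few men relative to a hypothetical 48/52 target.
gender = ["man"] * 45 + ["woman"] * 55
# Hypothetical vote: 27 of 45 men and 22 of 55 women back candidate A.
votes_a = [True] * 27 + [False] * 18 + [True] * 22 + [False] * 33

w = cell_weights(gender, {"man": 0.48, "woman": 0.52})
print(round(weighted_share([g == "man" for g in gender], w), 3))  # men after weighting
print(round(weighted_share(votes_a, [1.0] * 100), 3))             # candidate A, unweighted
print(round(weighted_share(votes_a, w), 3))                       # candidate A, weighted
```

Weighting men up from 45% to 48% moves candidate A from 49.0% to 49.6% only because men in this made-up sample lean toward A. The weights correct the demographic mix; they never change anyone's answers.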

Finally, to be clear, none of these organizations weights by party! Contrary to what I have seen written elsewhere, neither CBS/New York Times, ABC/Washington Post,* Gallup/CNN/USA Today, Time, Newsweek, The Pew Research Center nor the Annenberg National Election Survey weights its results by Party ID. [UPDATE: ABC News does weight its October tracking survey of likely voters by party (but not its registered voter samples, and not surveys conducted before October).]

What about CBS? I've read several posts, including one in the comments section of this blog, that repeat the myth that CBS News weights by party. The proof? The table below (which I reproduced from the last page of this release), which shows weighted and unweighted sample sizes for Democrats, Republicans and Independents. CBS provides this data because its report includes tabulations by party for each question, and it is disclosing its sub-sample sizes exactly as required by the disclosure standards of the National Council on Public Polls. CBS goes one step further, providing unweighted counts to assist those who want to calculate sampling error.

                              UNWEIGHTED   WEIGHTED
Total Respondents                   1083
Total Republicans                    377        339
Total Democrats                      376        381
Total Independents                   330        364
Registered Voters                    931        898
Reg. Voters - Republicans            343        323
Reg. Voters - Democrats              324        332
Reg. Voters - Independents           264        246

So why does it appear that Republicans have been weighted down? Among total respondents, for example, the share of Republicans is higher in the unweighted sample (35%, or 377/1083) than in the weighted sample (31%, or 339/1083). The reason is that respondents tend to be less cooperative in urban and suburban areas and more cooperative in rural areas, where Republicans are more plentiful. Presumably, when CBS corrects the sample to match verifiable Census data, the weighting indirectly, but appropriately, alters the party balance.
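The arithmetic behind those percentages, and the kind of sampling-error calculation the unweighted counts make possible, can be sketched in a few lines of Python. The counts come from the CBS table; the margin-of-error formula is the textbook simple-random-sampling one, an assumption that ignores the design effect weighting adds, so it understates the true error somewhat.

```python
import math

TOTAL = 1083  # total respondents in the CBS table
# (unweighted, weighted) counts among all respondents, from the CBS table
party_counts = {
    "Republicans": (377, 339),
    "Democrats": (376, 381),
    "Independents": (330, 364),
}


def moe95(p, n):
    """95% margin of error for a proportion, assuming simple random sampling.

    Ignores the design effect that weighting adds, so it understates the true error.
    """
    return 1.96 * math.sqrt(p * (1 - p) / n)


for party, (unweighted, weighted) in party_counts.items():
    p_u = unweighted / TOTAL
    p_w = weighted / TOTAL
    print(f"{party:13s} unweighted {p_u:5.1%}  weighted {p_w:5.1%}  "
          f"shift {100 * (p_w - p_u):+.1f} pts  (MOE +/-{moe95(p_u, TOTAL):.1%})")
```

For Republicans this gives about 34.8% unweighted versus 31.3% weighted, a drop of roughly 3.5 points, alongside a simple-random-sampling margin of error of roughly plus or minus 2.8 points on the unweighted share.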

If I’m getting this wrong and anyone at CBS is reading this, please do not hesitate to email me with a correction.

Tomorrow, those who weight by exit polls.

9/29: Typo corrected above

[Continue with Why & How Pollsters Weight, Part II]


Posted by Mark Blumenthal on September 29, 2004 at 12:06 AM in Weighting by Party

Comments

"this review of the vexing issue by ABC Polling Director Gary Langer is one of the best available anywhere"

The problem with Langer's analysis is that it doesn't address the idea that the hard-core non-responsive population might be significantly different from the rest of the population.

He merely contents himself with data supporting the idea that those easy to reach are similar to those somewhat more difficult to reach.

Posted by: Petey | Sep 29, 2004 2:13:44 AM

You have your weighted and unweighted Republican numbers reversed in your text.

Posted by: Fran | Sep 29, 2004 8:55:40 AM

Mark:

The disclosure standards you reference require what exactly of pollsters? And these requirements would obviously apply to Gallup, correct?

Posted by: Steve Soto | Sep 29, 2004 9:42:29 AM

When matching education to the Census, there is inevitably a near doubling of the share who did not graduate from high school. I worked for PSRA, which does the Newsweek and Pew data collection and weighting, and the highest share of non-high-school graduates they would ever get in a phone survey was 9-10% among adults, with a similar number for registered voters. When they match to the CPS (the Census Bureau's Current Population Survey), they bring it up to 16%, which is a large weight. That group is one of the most pro-Democratic, but it is also the group that swings most wildly from poll to poll within a sample.

However, the demographic on which the exit polls and the CPS (both among adults and in its voter supplement sample) differ most is education. The exit polls put non-high-school graduates at 8-9%, about where most phone surveys come in, while the CPS puts them at 15-16%. So while CBS, Pew and others might not weight by party, how they weight education has a major bearing on the results. This less-educated group also fluctuates more wildly than other groups, and it is likely where non-response bias is largest, since it includes many people living in poverty and those least engaged with politics.

The problem with a lot of these weighting schemes that match region, age, sex and education is that they use the same formula every time and never check whether the demographics within a region or subgroup are consistent with previous surveys. If non-high-school graduates switch from 50% Dem/30% GOP to 30% Dem/50% GOP in a smaller sample that gets weighted by two, there is no systematic check of whether that group is balanced by region, age or gender. The weights force the overall demographics to be the same, and the iterative programs are complex, but since the entire sample is weighted to these variables, internal fluctuations vary wildly between surveys. It would be better if each region were weighted individually, but in my experience with those two organizations, this does not happen.

Sorry if this is rambling, but this is an area that drives me crazy with a lot of these polls. These news organizations have interviewed thousands of people in the last few months. They know in detail where voters in each demographic group stand politically and what the demographic balance within each subgroup looks like, and that would be a better target to weight to than blindly weighting to the Census.

Posted by: Stephen Clermont | Sep 29, 2004 10:20:33 AM
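Stephen's point about putting a large weight on a small, volatile group can be quantified with Kish's approximation for the effective sample size of a weighted sample, (sum of weights) squared divided by the sum of squared weights. A minimal Python sketch; the 1,000-interview sample is hypothetical, and the 9.5% and 16% figures are simply taken from the ranges quoted in the comment.

```python
def kish_effective_n(weights):
    """Kish's approximate effective sample size: (sum of w)^2 / sum of w^2."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


# Hypothetical 1,000-interview phone sample in which 9.5% did not finish high
# school, weighted up to a 16% target (figures in the range the comment describes).
n, n_low = 1000, 95
target_low = 0.16
w_low = target_low / (n_low / n)               # about 1.68 per non-graduate
w_rest = (1 - target_low) / ((n - n_low) / n)  # about 0.93 per everyone else
weights = [w_low] * n_low + [w_rest] * (n - n_low)

n_eff = kish_effective_n(weights)
print(f"weighted non-graduates: {w_low * n_low:.0f} of {sum(weights):.0f}")
print(f"effective sample size: {n_eff:.0f}  (design effect {n / n_eff:.3f})")
```

The overall cost is modest, roughly 950 effective interviews out of 1,000, but the subgroup itself is still only 95 interviews counted as 160, so each of them carries about 1.7 times the influence of an average respondent, and any chance swing inside that group is magnified in the topline accordingly.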

Great work here. Thanks for such informative postings. A few more questions to address in terms of weightings:

What about people who work at night when pollsters call?

What about people who go to school at night?

What about people who work two jobs?

What about people who work late?

What about people who work swing shifts?

What about people who are traveling? How many Americans are on the road at any given time for work or family? Lots. Do they skew Democratic or Republican?

Also, the number of people who rely only on cell phones could be a bigger problem than the statistics show. Why? Because many, many people have both a landline and a cell phone but rarely pick up the landline, since only strangers end up calling them there. So you can't simply go by the published statistics on cell-only households. The problem is bigger than that.

Posted by: Simka | Sep 29, 2004 11:48:34 AM

If the sampling is truly random, I would expect some polls to show over-sampling of Democrats as often as over-sampling of Republicans. Yet, Gallup seems to consistently over-sample Republicans, assuming a major shift in Party ID has not occurred. Doesn't this raise a red flag? Or maybe the headline story is really "Pollsters conclude a major shift in party ID has occurred." Reasons why this might have occurred over the last four years would then need to be supplied.

Posted by: Karl Vischer | Sep 29, 2004 5:23:34 PM
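Karl's intuition can be framed as a sign test: if each poll's party mix were equally likely to land above or below the true value, a long run of polls all leaning the same way would be unlikely by chance. A minimal Python sketch with hypothetical tallies, not actual Gallup results. It assumes the polls are independent and, more importantly, that the benchmark party split is the true one, which is exactly the assumption a genuine shift in party ID would undermine.

```python
from math import comb


def sign_test_p(k_over, n_polls):
    """Two-sided sign-test p-value.

    Probability, if each poll were equally likely to over- or under-represent
    Republicans, of a split at least as lopsided as k_over out of n_polls.
    """
    k = max(k_over, n_polls - k_over)
    tail = sum(comb(n_polls, i) for i in range(k, n_polls + 1)) / 2 ** n_polls
    return min(1.0, 2 * tail)


# Hypothetical tallies, not actual Gallup results:
print(sign_test_p(5, 10))   # an even split: no evidence of a lean
print(sign_test_p(8, 10))   # 8 of 10 leaning the same way
print(sign_test_p(10, 10))  # all 10 leaning the same way
```

A small p-value says only that the polls disagree with the benchmark more consistently than chance would predict; it cannot say whether the polls or the benchmark are wrong.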

I'm hoping you'll have time to write a little about *how* weighting for demographics is done. Is there some sort of model that's used that relates a number of different demographic categories (something like Principal Components Analysis), or is each category adjusted independently? I wonder, for example, if it's possible to make errors by dumb weighting of the data, such as a sample that has too few men getting overly weighted to be Republican if men are also more likely to be Republican.

This came to mind when I saw a post at Eschaton a short while back, where he claimed that a poll (I can't remember which) had also asked who the respondents had voted for in 2000, and the results were something like Bush over Gore by about 10 points.

Posted by: matt | Sep 29, 2004 6:20:23 PM
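A common answer to Matt's question is raking (iterative proportional fitting): each variable is adjusted in turn, and the cycle repeats until all of the targeted margins match at once. It is plausibly what Stephen means above by "iterative programs," though neither comment says which method any particular pollster uses. A minimal Python sketch with two hypothetical variables and made-up targets; the function names are illustrative only.

```python
from collections import defaultdict


def rake(respondents, targets, max_iter=100, tol=1e-10):
    """Raking (iterative proportional fitting) of respondent weights.

    respondents: list of dicts mapping variable name -> category
    targets: {variable: {category: population share}}; every category must
             appear in the sample and each variable's shares must sum to 1.
    Returns weights that sum to len(respondents) and reproduce every target
    margin (to within tol once the loop converges).
    """
    n = len(respondents)
    weights = [1.0] * n
    for _ in range(max_iter):
        largest_adjustment = 0.0
        for var, shares in targets.items():
            # Current weighted total for each category of this variable.
            totals = defaultdict(float)
            for r, w in zip(respondents, weights):
                totals[r[var]] += w
            # Rescale so this variable's weighted margin hits its target exactly.
            for i, r in enumerate(respondents):
                factor = shares[r[var]] * n / totals[r[var]]
                largest_adjustment = max(largest_adjustment, abs(factor - 1.0))
                weights[i] *= factor
        if largest_adjustment < tol:
            break
    return weights


def weighted_margin(respondents, weights, var):
    """Weighted share of each category of one variable."""
    total = sum(weights)
    shares = defaultdict(float)
    for r, w in zip(respondents, weights):
        shares[r[var]] += w / total
    return dict(shares)


# Hypothetical sample of 100 that is too rural and has too few non-graduates.
sample = (
    [{"region": "urban", "educ": "no_hs"}] * 5
    + [{"region": "urban", "educ": "hs_plus"}] * 35
    + [{"region": "rural", "educ": "no_hs"}] * 5
    + [{"region": "rural", "educ": "hs_plus"}] * 55
)
targets = {
    "region": {"urban": 0.50, "rural": 0.50},  # hypothetical targets
    "educ": {"no_hs": 0.16, "hs_plus": 0.84},
}

w = rake(sample, targets)
print(weighted_margin(sample, w, "region"))
print(weighted_margin(sample, w, "educ"))
```

Both margins come out on target, but raking keeps the sample's own association between region and education; nothing in the procedure checks how non-graduates are split between urban and rural areas, which is the gap Stephen describes.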

Matt - that isn't necessarily a sign of sample bias, in fact it's quite normal. There has been a lot of study of it in the UK, where some pollsters weight by past vote to get a politically balanced sample rather than by party ID.

In short people aren't very good at recalling their past vote - people who didn't vote pretend they voted to look socially responsible, people tend to claim they voted for whoever won the last election, people tend to project how they would vote today onto how they voted at the last election and people tend to forget if they voted for a third party.

The result is that even a properly representative sample's recall of its past vote will not match the reality of the last election (Hence UK pollsters weight to an adjusted past vote, not what actually happened).

The phenomenon is certainly not confined to the UK. To give an extreme example, JFK got about 49% of the vote in 1960, but polls taken after his assassination apparently found 66% of people claiming to have voted for him.

Posted by: Anthony | Sep 30, 2004 6:59:12 AM
