
February 25, 2005

Should Jeb Run...Or Not?

The Hotline (subscription required) caught an intriguing polling conflict yesterday.  A just-released Quinnipiac poll of 1,007 Florida "voters," conducted February 18-22, found that 25% said yes when asked, "would you like Jeb Bush to run for President in 2008?"  A survey of "registered voters" conducted February 16-20 by Strategic Vision showed 57% saying yes on a virtually identical question. 

How could that be?   Let's take a closer look.

First, Quinnipiac and Strategic Vision typically sample different target populations, what survey methodologists call the "sample frame."  In its 2004 pre-election surveys, Quinnipiac sampled all households with a working telephone using a random-digit-dial (RDD) methodology, starting with a sample of all adults and screening for "likely voters."  Strategic Vision is one of the few public polling organizations that draws its sample from lists of registered voters.   

(An aside:  I italicized "typically" above because neither organization makes the sample frame clear in its press release.  Quinnipiac says only that it surveyed "voters"; Strategic Vision says it surveyed "registered voters."  Quinnipiac responded to my requests for information back in October, when I put together a lengthy summary of likely voter selection procedures.  Were I a reporter, I would contact both organizations before posting this, and I am confident that both Quinnipiac and Strategic Vision would be responsive.  But I am a blogger, and I am approaching this issue from the perspective of the consumer.  We are in the dark.  I cannot understand why public pollsters cannot include such basic sampling information in press releases that span 10 pages.  I will contact both pollsters and pass along more information as they make it available.)   

Update/Clarification:  Strategic Vision is a public relations firm that polls for Republican candidates.  When I first drafted this post, I had them confused with another public polling organization that I did contact last October.  Nonetheless, David Johnson of Strategic Vision emailed to confirm that their Florida sample was in fact drawn from a list of registered voters.  I have copied his comments below.  Doug Schwartz of the Quinnipiac poll emailed from his vacation to confirm their survey was based on an RDD sample, but they also screened to include only self-identified registered voters in the final sample.  Thus, both surveys aimed for the same target population -- registered voters.

The debate about the use of voter lists is interesting and complex (and MP is reminded that he needs to weigh in on it).  The key point here is that RDD and registered voter list methodologies sample different target populations:  First, the population of registered voters is obviously smaller than the population of all adults.  Second, a list of voters will miss those with unlisted telephone numbers and those whose numbers cannot be appended from available databases. Are these differences enough to explain a 32 percentage point difference on the question of whether Jeb Bush should run?

Another question I had concerned the items that preceded the Should-Jeb-Run question.  Is it possible that these created a bias?

So, with the help of the releases (thanks to the folks at the Hotline) let's look at the questions asked by both organizations.  I am assuming that the order of questions in the release matches the order in which they were asked.  If I'm assuming correctly, the Should-Jeb-Run item was near the beginning of the questionnaire:

Quinnipiac:

  • Do you approve or disapprove of the way Jeb Bush is handling his job as Governor? (52% approve)
  • Do you approve or disapprove of the way the state legislature is handling its job? (46% approve)
  • Do you approve or disapprove of the way George W. Bush is handling his job as President? (44% approve)
  • Would you like Jeb Bush to run for President in 2008 or not?  (25% yes)

Strategic Vision:

  • Do you approve or disapprove of President Bush's overall job performance?  (56% approve)
  • Do you approve or disapprove of President Bush's handling of the economy? (55% approve)
  • Do you approve or disapprove of President Bush's handling of Iraq? (57% approve)
  • Do you support or oppose President Bush's Social Security reform? (42% support)
  • Do you approve or disapprove of Governor Jeb Bush's job performance? (62% approve)
  • Would you like to see Governor Jeb Bush run for President in 2008? (57% yes)

The questions that precede the Should-Jeb-Run item are thus very similar.  Quinnipiac asks about Jeb Bush and the Florida legislature before asking about the President's job rating.  Strategic Vision asks about the President's job rating and about three specific issue ratings before asking about Governor Bush's performance.  I cannot see any obvious reason why these subtle differences would explain the big difference on the Should-Jeb-Run item.

The overall job ratings are higher on the Strategic Vision survey - 12 points higher for the President (56% vs. 44%) and 10 points higher for his brother (62% vs. 52%).  It looks to me like the registered-voter-list sample frame explains the difference on both items; the Strategic Vision sample is probably more Republican than the Quinnipiac sample.

But the Should-Jeb-Run item differs by 32 points, so even a 10-12 point difference in partisanship between the two surveys would not come close to explaining it.  This is quite a puzzle.
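To see why, here is a rough back-of-the-envelope sketch in Python.  The party-level percentages and party mixes below are invented for illustration (neither release reports them); the point is only that shifting the partisan mix of a sample by about 10 points moves the topline result by a few points, not 30.

```python
# Illustrative only: how much can a more-Republican sample move the topline?
# None of these party-level numbers come from the Quinnipiac or Strategic Vision releases.

def topline(yes_by_party, party_mix):
    """Weighted percentage saying 'yes', given the rate within each party and the party mix."""
    return sum(yes_by_party[p] * party_mix[p] for p in party_mix)

# Suppose (hypothetically) Republicans are 40 points more likely than Democrats to say yes.
yes_by_party = {"Rep": 50, "Dem": 10, "Ind": 25}

mix_a = {"Rep": 0.37, "Dem": 0.38, "Ind": 0.25}   # hypothetical Quinnipiac-style mix
mix_b = {"Rep": 0.47, "Dem": 0.28, "Ind": 0.25}   # same mix, but 10 points more Republican

a, b = topline(yes_by_party, mix_a), topline(yes_by_party, mix_b)
print(f"Mix A: {a:.1f}% yes, Mix B: {b:.1f}% yes, difference: {b - a:.1f} points")
# Prints roughly 28.6% vs. 32.6% -- about a 4-point composition effect, nowhere near 32.
```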

Other than an unlikely tabulation or programming error, I have one other admittedly far-fetched theory:  The two Should-Jeb-Run questions are not exactly alike.  The Quinnipiac version includes two important words: "or not" (as in, "would you like Jeb Bush to run for President in 2008 or not?").  Those words are there for a reason.  Some questions are prone to a phenomenon that survey methodologists call "positivity bias," which is related to the concept of the "non-attitude" that I discussed a few weeks back.  When respondents do not have a strong opinion but do not want to admit it, they sometimes give a positive response.  Thus, Quinnipiac includes the words "or not" to tell respondents it is OK to say no. 

Frankly, I would not expect positivity bias alone to account for a 32 percentage point difference (or even the roughly 20 point difference that would remain if we assumed that the differences in sample frame contributed 10-12 points).   However, stranger things have happened.  One big clue is that while Strategic Vision reports that 57% of registered voters say they would like to see Jeb run, only 34% of Republicans are ready to support him.  That suggests that the 57% number, if correct, is really soft and might come out differently with subtle changes in wording. 

This theory invites a simple experimental test:  Another pollster could ask two versions of the question (one with "or not," the other without) of two half samples on a single poll.  Is anyone out there going into the field in Florida anytime soon? 
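For what it is worth, here is a minimal sketch of how the results of such a split-half experiment could be checked, assuming a hypothetical poll of 1,000 interviews split evenly between the two wordings.  The counts are made up; only the test is standard.

```python
# Sketch of analyzing a split-half wording experiment (all counts are hypothetical).
# Half sample A hears "...run for President in 2008 or not?"; half sample B omits "or not."
from math import sqrt

def two_prop_z(yes_a, n_a, yes_b, n_b):
    """Two-proportion z-statistic for the difference between the half samples."""
    p_a, p_b = yes_a / n_a, yes_b / n_b
    p_pool = (yes_a + yes_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

z = two_prop_z(yes_a=150, n_a=500, yes_b=210, n_b=500)   # 30% vs. 42% saying yes
print(f"z = {z:.2f}")   # |z| greater than about 1.96 would suggest a real wording effect
```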

One more thought:  If there were ever a time for pollsters to disclose more about the partisanship, demographics and regional composition of their samples, this is it.  In this case, the two sample frames appear to be different (although even that is not clear).  The pollsters owe it to the consumers of their data to describe their samples as fully as possible:  What was the demographic composition in terms of age, gender, race, Cuban or other Hispanic origin, geographic region and partisanship?  And with respect to partisanship, did they ask respondents to report their party registration or their party identification?  What was the text of the party question? 

These are basic questions that need to be more available to consumers of data if we are to make sense of polls that show such divergent results.

---
Corrected grammar and typos - 2/25

Update:  Doug Schwartz of the Quinnipiac Poll emailed from vacation.  He promises to try to answer some of the questions raised in this post next week.

David Johnson of Strategic Vision emailed with the following comments:

Strategic Vision interviews registered voters who have indicated that they voted in the 2002 and 2004 primaries and general elections.  It is weighted to reflect the voter registration numbers throughout the state according to region, and it actually had slightly more Democrats than Republicans; this is because in areas like North Florida, while they may vote Republican, the registration numbers indicate party ID as being Democrat.  We draw our samples from lists of registered voters.
 
I believe one aspect of the differences in the two polls may be the wording, as you stated, but that alone does not explain 32%.  Another aspect may also be the timing of when we did the two surveys; we were concluding as the Schiavo case was again re-entering the news, and in the past we have seen majorities oppose Governor Bush's involvement in the case.  While we had polled many before this, with the re-emergence of the Schiavo case negative feelings toward Governor Bush re-manifested.  That also would not explain it.  Finally, our numbers also diverge on the gubernatorial race but are more consistent with private polling that I have seen.  And also we polled about 100 more voters.

UPDATE II (2/28):

Late Friday afternoon, the folks at Quinnipiac sent over demographic results on every item I had asked about:  age, gender, race, Hispanic origin, geographic regions and partisanship.  They also sent the text of their party ID question.  I'm still waiting to receive the same from Strategic Vision and will post a comparison when that data arrives.   

David Johnson, are you still out there?

UPDATE III (3/1):  I posted Quinnipiac's numbers here.

Posted by Mark Blumenthal on February 25, 2005 at 11:17 AM in Divergent Polls, Measurement Issues, Sampling Issues | Permalink | Comments (3)

February 24, 2005

R. Chung's "Bias" Charts

Having linked last week to a graphical representation of the President's job approval rating, I want to point to another set of graphics that takes Dr. Pollkatz's graph a step further.  The esteemed "R. Chung," a blogger of sorts who has posted occasionally to MP's comments section, did some similar charts that label individual pollsters.  Chung finds that "some polls do show systematic bias" that he says "can be larger than sampling error."  I point these out because the charts are intriguing and because some of the differences that Chung observes could have more to do with question language and interviewer training than with sample bias.

Chung's charts label each result with the name of the pollster, though he does one thing differently from the Professor Pollkatz chart I pointed to last week:  He displays the "spread" (the arithmetic difference) between the approval and disapproval ratings.  In doing so, he identifies a few clear trends:  The polls by Fox News were consistently above average (better for Bush) and the polls by the Zogby organization consistently below average (worse for Bush). 

[Chart: R. Chung's approval-minus-disapproval "spread," labeled by pollster]

In other graphics and tables, R. Chung also shows that surveys by the Harris organization tend to show Bush's job approval spread lower than most other surveys. 

Let's take a closer look. 

First, consider the actual text of the job approval questions asked by Zogby and Harris (as reported by the Polling Report).  Most pollsters ask some version of this question:  "Do you approve or disapprove of the way George W. Bush is handling his job as president?"  Zogby and Harris ask a different question:  "How would you rate the overall job President George W. Bush is doing as president: excellent, pretty good, only fair, or poor?" (Zogby uses "fair" rather than "only fair").  To get an approval score, both pollsters combine the excellent and good/pretty good responses; to get a disapproval score, they combine fair/only fair and poor.

MP can say from experience using both forms of the question that the "excellent, good, fair or poor" categories tend to produce higher "disapproval" percentages.  We assume that some respondents hear "fair" as a neutral category indicating neither approval nor disapproval.  Look at the Harris and Zogby numbers during 2004 and you will see consistently higher disapproval, and much lower "don't know" responses, than other pollsters report.  Using the numbers reported by the Polling Report, I calculated some quick averages:  The average "don't know" response on surveys conducted during 2004 was less than 1% for Zogby, 1% for Harris and 6% for the other major pollsters (see the note on averaging below).
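In case it is useful, this is the kind of quick averaging I mean, sketched with made-up readings rather than the actual Polling Report figures: collect each organization's reported "don't know" percentage for every 2004 survey and average them by pollster.

```python
# Sketch of the averaging described above; the readings below are placeholders,
# not the actual 2004 figures from the Polling Report.
dont_know = {
    "Zogby":  [0, 1, 0, 1],
    "Harris": [1, 1, 2, 0],
    "Others": [5, 7, 6, 6, 8, 4],
}

for pollster, readings in dont_know.items():
    average = sum(readings) / len(readings)
    print(f"{pollster:7s} average don't-know response: {average:.1f}%")
```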

The lower "don't know" responses that Zogby and Harris get may result from differences in the questionnaire language or from the way they train their interviewers (all the Harris studies used were done by telephone).  The Zogby and Harris interviewers may simply push respondents harder for an answer. 

Now let's look at the Fox News surveys.  They do several things differently. 

First, there is a subtle difference in the language of the Fox job rating question.  While most other pollsters ask about "the way George W. Bush is handling his job as president," Fox asks about "the job George W. Bush is doing as president."   I have no theory as to why this difference would lead to a better score for Bush, but the hypothesis would be interesting to test with a controlled experiment. 

Second, most of the national pollsters ask the presidential job rating question of a sample frame of all adults.  The Fox survey typically interviews only registered voters, and it interviewed only "likely voters" on 5 of 25 surveys.  The conventional wisdom holds that the registered voter pool is slightly more Democratic (see clarification in the comments section), which, if true, would produce the opposite effect: If registered voters are more Democratic than all adults, Bush's ratings should be lower on the Fox surveys, not higher.

(Note:  MP presented data showing that samples of likely voters tend to be more Republican than registered voters, but could not find any analysis of samples of registered voters vs. all adults.)

Third, it is also worth noting that much of the difference between Fox and the other polls (excluding Zogby and Harris) during 2004 was in the percentage who expressed disapproval of President Bush.  The average approval rating for Fox surveys was 49%, exactly the same as the average of the other national surveys (again, see the explanation/caveats below).  The average disapproval for Fox was 43% compared to 46% for the other surveys, with the difference being a higher "don't know" response for the Fox surveys. 

Fourth, there is a possibility of some non-response bias.   Perhaps those who approve of President Bush were slightly more likely to participate in a survey sponsored by Fox News than by other organizations.    However, for what it's worth, Chung also charted the presidential trial heat results during 2004, and the Fox Kerry-Bush trial heat results were much closer to average.

Either way, some very elegant graphs raise some intriguing questions.

CLARIFICATION:  Lawrence Shiman of Opinion Dynamics Corporation -- the company that conducts the Fox News survey -- emails with the following helpful clarification:

You mentioned the possibility of a non-response bias based on the sponsor of the poll.  In fact, we do not identify the sponsor of the polls [in this case, Fox News] unless specifically requested, which very rarely occurs.  Even if requested, the name of the sponsor is provided only at the end of the survey, specifically to avoid non-response bias.  At the beginning of each survey, we indicate only that the survey is being conducted "on behalf of one of the major national television networks."  Therefore, while there may be valid reasons for the differences in job approval ratings for the president, it is unlikely to be a result of Fox News being the sponsor.

----

Note on averaging: I gathered job approval ratings from 2004 as reported by the Polling Report for: ABC/Washington Post, AP/IPSOS, CBS, Fox, Gallup, Harris (telephone surveys), NBC/Wall Street Journal, Newsweek, Zogby, Pew and Quinnipiac.  Although all of those organizations did surveys periodically during 2004, some polled more frequently than others.  Also, as noted in the text above, most surveyed all adults, but several surveyed only registered voters or likely voters for some or all surveys.  The differences in frequency and sample frames mean the averages are not strictly comparable.

Posted by Mark Blumenthal on February 24, 2005 at 10:21 AM in Measurement Issues | Permalink | Comments (4)

February 18, 2005

On Outliers and Party ID

Last week, the Gallup organization released a survey sponsored by CNN and USAToday, fielded February 4-6, that appeared to show a surge in President Bush's job approval rating from 51% to 57% since mid-January.  "The Iraqi elections...produced a bump in President Bush's approval rating," said CNN.  "Americans gave President Bush his highest job approval rating in more than a year," read USAToday.

Gallup immediately went into the field with a second poll, conducted February 7-10, that showed the Bush job rating back down at 49%, "slightly below the levels measured in three January polls, and well below the 57% measured in Gallup's Feb. 4-6 poll."  Unlike the first survey, this one was not co-sponsored by CNN and USAToday, and thus, as blogger Steve Soto put it, this poll did not get "bull-horned through the media" the same way as the first.

As such, I want to consider the question Soto raised Monday on TheLeftCoaster: "How often is there a 16% swing in a public opinion poll in one week?"

The short answer is: not very often. 

But then I never seem satisfied with short answers, do I?  Let's take this one step at a time.  First, a minor quibble:  "Shifts" in polling numbers always seem more dramatic when you compare the margins -- in this case the difference between the approval and disapproval ratings -- because doing so artificially doubles the rate of change.  The February 4-6 survey showed 57% approving of Bush's performance and 40% disapproving (a net of +17).   The second survey showed 49% approval and 48% disapproval (net +1).  Thus, 17 - 1 = a 16-point shift.  The problem - if this shift were real - is that it would have involved only about 8% of the population changing their opinion.   That number is still quite large, and would certainly be outside the reported sampling error for samples of 1,000 interviews, but it does not sound quite as astounding as a "sixteen point shift."  Better to focus on the change in the percentage expressing approval than on the margin of approval minus disapproval. 
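Here is the arithmetic, using the two Gallup readings cited above, showing why tracking the margin doubles the apparent change:

```python
# Why comparing margins doubles the apparent change (figures from the two Gallup polls).
approve_1, disapprove_1 = 57, 40   # February 4-6 survey
approve_2, disapprove_2 = 49, 48   # February 7-10 survey

margin_change = (approve_1 - disapprove_1) - (approve_2 - disapprove_2)
approval_change = approve_1 - approve_2

print("Change in the margin:   ", margin_change, "points")    # 16
print("Change in approval only:", approval_change, "points")  # 8
# Each respondent who moves from approve to disapprove subtracts one point from approval
# and adds one to disapproval, so the margin moves two points for every one-point shift.
```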

Second, I might rephrase Steve's question a bit:  "How often do we see real shifts in presidential job approval of this magnitude?" 

Rarely.  That answer is evident in the remarkable graphic maintained by Professor Stuart Eugene Thiel (a.k.a. Professor Pollkatz) and copied below.  The Pollkatz chart shows the approval percentage on every public poll released during George W. Bush's presidency.  It is obvious that the approval percentage may vary randomly at any point in time within a range of roughly 10 percentage points, but the trends evident over the long term tend to be slow and gradual.  The exceptions are a few very significant events:  9/11, the invasion of Iraq and the capture of Saddam Hussein. 

[Chart: Professor Pollkatz's compilation of Bush approval ratings from every public poll]

During 2004, the average Bush job rating did not vary much.  It dropped a few points in April and May during the 9/11 Commission hearings and the disclosure of prisoner abuse at the Abu Ghraib prison.  It rose a few points following the Republican convention and has held remarkably steady ever since.

The graph also shows that "big swings" do appear by chance alone for the occasional individual survey.  These are "outliers."  On the chart, a few polls fall outside the main band of points, and the February 4-6 Gallup survey is an obvious example. It shows up as the diamond-shaped pink point at the far right of the Pollkatz graphic (click on the image or here to see the full-size version at pollkatz.com).

How often does such an outlier occur?  Remember that the "margin of error" reported by most polls assumes a 95% confidence interval.  That means that if we drew repeated samples, 19 of 20 would produce results for any given question that fall within the stated margin of error.  It also means we should expect roughly 1 in 20 to fall outside of sampling error by chance alone. 
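As a rough sketch (using the standard textbook formula, not any particular pollster's exact calculation), the familiar "plus or minus 3 points" for a 1,000-interview sample and the 1-in-20 expectation look like this:

```python
# Back-of-the-envelope margin of error for a simple random sample of 1,000,
# using the standard 95% confidence formula (actual polls also adjust for design effects).
from math import sqrt

n = 1000
moe = 1.96 * sqrt(0.5 * 0.5 / n) * 100   # worst case, for a 50% result
print(f"Margin of error: +/- {moe:.1f} points")   # about +/- 3.1

# By construction, roughly 1 poll in 20 will fall outside that interval
# by chance alone, even when opinion has not moved at all.
```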

With the benefit of hindsight, it seems obvious that the February 4-6 Gallup survey was just such an outlier.  Other surveys done just before and after (see the always user-friendly compilation on RealClearPolitics.com) show no comparable surge and decline in Bush's job rating in early February. 

The bigger question then is what CNN, USAToday and Gallup should have done - without the benefit of hindsight - when they released the February 4-6 survey.  Steve Soto immediately requested the party identification numbers from Gallup and found that the survey also showed an unusual Republican advantage.  His commentary and that of Ruy Teixeira again raise the issue of whether surveys like Gallup's should weight by party ID. 

I wrote about this issue extensively in September and October.  Here's a quick review:

Most public survey organizations, including Gallup, do not weight by party identification (Gallup has restated its philosophy on this issue on its blog; see the February 10 entry). Unlike pure demographic items like age and gender, party ID is an attitude that can change, especially from year to year (although academics continue to debate just how much and under what conditions; see the recent reports by the National Annenberg Election Survey and the Pew Research Center for discussion of long-term trends in party identification).

The problem is that the partisan composition of any sample can also vary randomly -- outliers do happen.  Unfortunately, when they do, we get news stories about "trends" that are really nothing more than statistical noise.  To counter this problem, some pollsters, such as John Zogby, routinely weight their surveys to some arbitrary level of party identification.  The problem with this approach is deciding on the target and when, if ever, to change it.  Zogby often uses results from exit polls to determine his weight targets.  Raise your hand if you consider that approach sound given what we have learned recently about exit polls.

The conflict leads to some third-way approaches that some have dubbed "dynamic weighting."  I discussed these back in October. The simplest and least arbitrary method is for survey organizations to weight their polls by the average result for party identification on recent surveys conducted by that organization -- perhaps over the previous three to six months.  The evolving party identification target from the larger combined sample would smooth out random variation while allowing for gradual long-term change (see also Prof. Alan Reifman's web page for more commentary on this issue). 
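A minimal sketch of that idea, with invented monthly readings: the weight target is simply the organization's own recent average, so a single unusual reading gets smoothed rather than driving the weights.

```python
# Minimal sketch of "dynamic weighting": set the party-ID weight target to the average
# of the organization's own recent surveys. The monthly readings below are invented.

def rolling_target(readings, window=6):
    """Average party ID over the most recent `window` surveys."""
    recent = readings[-window:]
    return sum(recent) / len(recent)

pct_republican = [33, 34, 32, 35, 33, 34, 38]   # the last reading looks like an outlier

target = rolling_target(pct_republican)
print(f"Weight target for % Republican: {target:.1f}")   # about 34.3, smoothing out the 38
```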

I am not an absolutist about this, but I am less comfortable with dynamic weighting in the context of periodic national news media surveys than for pre-election tracking surveys.  There are times when dynamic party weighting would make a poll less accurate.  Consider the Pew party ID data, which showed a sharp increase in Republican identification after 9/11.  Dynamically weighting surveys done at the time would have artificially and unfairly reduced the share of Republican identifiers. 

With hindsight, it is easy to see those patterns in the data, just as it is easy to see that the February 4-6 Gallup numbers were likely a statistical aberration.  But without the benefit of hindsight, how does a news media pollster really know for certain? Media pollsters are right to strive for objective standards rather than ad hoc decisions on weighting by party. 

When Steve Soto looked at the unusual Republican tilt in Gallup's party ID numbers on their February 4-6 survey, he concluded that "Gallup appears to be firmly a propaganda arm of the White House and RNC."  I don't think that's fair.   Gallup did not "look at the electorate" and "somehow feel" that party ID should be a certain level, as Soto describes it.  Actually, Gallup did just the opposite.  They measured public opinion using their standard methodology and refused to arbitrarily tamper with the result.  We may not agree with that philosophy, but Gallup believes in letting the chips (or the interviews) fall where they may. 

I do agree with Soto on one important point:  The party ID numbers ought to be a standard part of the public release of any survey, along with cross-tabulations of key results by party identification.  Gallup should be commended for releasing data on request even to critics like Soto, but it really should not require a special request. 

Also, when a survey shows a sharp shift in party identification, news coverage of that survey should at least note the change -- something sorely lacking in the stories on CNN and in USAToday about the February 4-6 survey.  Consider this example from the Wall Street Journal's John Harwood on the recent NBC News/WSJ poll:

Public approval of Mr. Bush's job performance held steady at 50%, while 45% disapprove. Because the new WSJ/NBC poll surveyed slightly more Democrats than the January poll, that overall figure actually masks a slight strengthening in Mr. Bush's position. While the views of Democrats and Republicans remained essentially unchanged, independents were more positive toward Mr. Bush than a month ago.

Of course, this sort of analysis raises its own questions:  What was the sample size of independents? Was the change Harwood observed among independents statistically significant?  The story does not provide enough detail to say for sure. But at least it acknowledges the possibility that change (or lack of change) in the overall job rating may result from random differences in the composition of the sample.  That is an improvement worth noting.

Outliers  happen.   Truly objective coverage of polls needs to do a better job acknowledging that possibility.

minor typos corrected

Posted by Mark Blumenthal on February 18, 2005 at 09:57 AM in Sampling Error, Weighting by Party | Permalink | Comments (7)

February 11, 2005

When Respondents Lie

Yesterday, I talked to two junior high school students doing a school project on political polling.    One of their questions was, "Do people tell the truth when they answer poll questions?"  The answer is that they usually do, though there may be times when they do not, especially when the question asks about something that might create embarrassment or what social scientists call "social discomfort."   If you did not vote, for example, you might be reluctant to admit that to a stranger. 

Today's column from the Wall Street Journal's Carl Bialik (aka "The Numbers Guy") provides another highly pertinent example (link is free to all):  A survey by Gallup showed that 33% reported giving tsunami relief donations that averaged $279 per household.  Bialik did the math and found that would add up to $10 billion contributed by US households as of January 9.  He also cited official estimates putting the total donated by private sources at well under $1 billion. 

The culprit?  Social discomfort:

People tend to fudge when they feel social pressure to answer questions a certain way -- in this case, by saying they've given to a good cause.

"The interview is a social experience," says Jeffrey M. Jones, managing editor of the Gallup Poll. A USA Today/Gallup survey was the most widely cited source of the one-third statistic in news articles. "As would be the case in a cocktail party or at a job interview, you want to give a good impression of yourself. Even though you can't see the other person, and may never talk to them again, you want them to think well of you."...

Surveys about charity aren't the only ones skewed by social pressure. Negative pressure about drug use or sexual behavior can suppress responses on those topics. Conversely, questions about behavior deemed positive, such as voting, tend to elicit false positive responses. Mr. Jones cites a Gallup poll conducted between November 19 and 21 in which 84% of respondents said they had voted a few weeks before. Actual turnout among eligible (though not necessarily registered) voters was 60.7%.

Bialik also has a good review of some other methodological explanations for why an initial survey had 45% reporting tsunami relief donations while a second survey a week later put self-reported donations at 33%, even though both surveys asked the same question with identical language.      

Posted by Mark Blumenthal on February 11, 2005 at 12:59 PM in Measurement Issues | Permalink | Comments (17)

February 09, 2005

The Hotline's SurveyUSA Interview

Yesterday, the National Journal's Hotline (subscription required) took up a topic of great interest to MP:  "Whether the polling community will admit it publicly or not," The Hotline's editors wrote on their front page, "there's a crisis in their industry. From media pollsters to partisan pollsters, more and more consumers of these polls are expressing skepticism over the results, no matter how scientifically they are designed."

They went on to debut a new series of debates they are "hoping to spark" in the political community, the first on the topic of "Interactive Voice Response" (IVR) polling.  They kicked it off with a long interview with Jay Leve, editor of SurveyUSA.

Now, for those who are not familiar with it, The Hotline is a daily news summary that provides comprehensive coverage of politics at the national, state and congressional district level.  Unfortunately, it is only available through a pricey subscription that is out of reach for most individual readers, so I cannot link to it directly.   

However...the folks at the National Journal have kindly granted MP permission to reproduce the interview in full, as long as I cite it properly and include their copyright (and did I mention that this interview came from The Hotline, National Journal's Daily Briefing on Politics, a bargain at any price?  Good...didn't want to forget that).

Seriously, thanks to the Hotline for providing a forum for this important debate.  Whether you believe that political survey research is "in crisis" or not, there is no question that the challenges already facing random sample telephone surveys will increase significantly over the next decade.  Those challenges and the responses to them are worthy of debate, and not just among those who produce surveys but among our consumers as well. 

I'll chime in with some thoughts on Leve's interview later today or tomorrow.  Until then, here is the full interview.  The comments section is open as always. 

The following is an interview with SurveyUSA Editor Jay Leve. SurveyUSA has come under a great deal of criticism over the years for not using real people to conduct its polls, using instead the recorded voice of a professional announcer. However, in the '04 election cycle, SurveyUSA had an impressive track record, which can be viewed on their web site. In the wake of these results, we decided to ask the questions everyone wants to know the answers to and allow Leve to address his critics head on. We plan to invite many pollsters to respond to this interview; but any pollsters that would like to respond before we ask, email here.
      You have enjoyed a great deal of success in this past election cycle. How can you explain your accuracy to your critics?
      There are two ways to measure an election pollster's performance: "absolute" accuracy and "relative" accuracy. SurveyUSA keeps track of both, maintains up-to-date scorecards and, alone among pollsters, publishes the scorecards to our website for public inspection. By any measure, SurveyUSA is at the top or near the top of election pollsters - not just for 2004, but ever since SurveyUSA started polling in 1992.
       Pollsters talk and write a lot about reducing "Total Survey Error," but most obsess over the mathematical sources of error. I focus more on the how the questions get written, who asks them, how the questions sound to the respondent and how the questions get answered. SurveyUSA has re-thought from scratch exactly how polls can best be conducted, given what professional voices make possible. If the cost of each additional interview is expensive, which it is for others, you think about TSE one way. If the cost of each additional interview is relatively inexpensive, which it is for SurveyUSA, you can make other choices. The amount of intellectual horsepower that gets applied to exactly how a question is asked, and exactly what the respondent hears, as a percentage of total expended intellectual energy, is greater at SurveyUSA than at other firms.
      With this accuracy, what prevents any "Homer Simpson" from purchasing an auto-dialer and conducting polls from home?
       SurveyUSA has spent a fortune writing software and building hardware. But even if I gave our technology away for free, to Homer or Pythagoras, they would not know what to do with it. SurveyUSA's technology is neutral. It's just a tool, neither good nor evil.
      On your web site, you state that "Many media polls are ordered and completed same day. Many market research projects are ordered one day and delivered the next." How can this leave time to accurately develop the questionnaire, ensuring that the questions are unbiased?
      SurveyUSA researchers do not start from scratch when a poll is commissioned. Like most pollsters, SurveyUSA asks the same questions over and again. SurveyUSA's library has thousands of poll questions. Every so often, something truly new comes up, and our writers must wrestle with constructs, language, phrasing and the range of possible answer choices. In such cases we may test multiple ways of asking the same question. Ultimately a "keeper" goes into our library. Your question implies that questionnaires must be long and complex. Not true. For others, who go into the field infrequently, questionnaires take weeks to prepare, because both pollster and client know they may not get another chance for 3 months. SurveyUSA goes into the field every day. Our questionnaires are short, by design. Some see this as a limitation. We see it as an advantage. The more questions you put in a questionnaire, the more those questions interact with each other, and the more the early questions color the answers to later questions. Others ask (ballpark) 100 questions, which take 20 minutes to answer. SurveyUSA asks (ballpark) 10 questions, which take 2 minutes to answer.
      Do you feel the increased turnaround time has any negative effects (if not the above mentioned)?
      Every piece of research has a proper field period. Some SurveyUSA polls are in the field for minutes, some for weeks. SurveyUSA election polls are typically conducted over 3 consecutive days. Minutes after the first presidential debate last fall, ABC News completed one poll of 531 debate watchers. CNN completed one poll of 615 debate watchers. CBS News completed one poll of 655 debate watchers. NBC News did nothing. SurveyUSA completed 35 separate polls in 35 separate geographies, of 14,872 debate watchers. NBC affiliates in Seattle, Salt Lake and Denver had scientific SurveyUSA reaction in-hand minutes after the debate, while Tim Russert and Chris Matthews pondered how many DailyKos bloggers had stuffed the ballot box at the MSNBC website.
      On the day before DIA opened in Denver in 1995, SurveyUSA took a poll for KUSA-TV. We asked whether building the new airport was good or bad. The next day, after the airport opened, we re-took the same poll. Approval for the airport went up 20 points overnight. Should we have held the story for 3 days so we could do more callbacks? We had news. Our client led with it. The other stations had nothing. We owned the story.
      How do you formulate a representative sample? Do you use random digit dialing or voter lists? Which do you feel has the more accurate results? Why?
      SurveyUSA purchases RDD sample from Survey Sampling of Fairfield CT. We have conducted side-by-side testing using RBS (Registration Based Sample). In the testing we have done, RBS did not outperform RDD.
      On your website, you state that in order to end up with an accurate sample you use demographic breakdown to ensure you are portraying the population. However, since the questionnaire is being asked of the first person who answers the phone, how can you accurately establish a sample that is appropriate for a poll? Even with screening questions to establish the likelihood of a voter, how can you assure that a caller is actually over the age of 18? Or for that matter, how can you assure that they are citizens, or registered to vote? Is there any systematic way you can verify the accuracy of this once a poll has been completed?
      You've asked 5 questions here, the first of which contains a false premise. SurveyUSA can choose to talk to the person who answers the phone, or we can ask to speak to someone else. There is nothing hard about that. By your question, you create the impression that, a) SurveyUSA doesn't understand the importance of selecting a respondent from within a household and, b) even if we did understand it, our technology prevents us from doing it. Both are false. SurveyUSA has read all of the literature on intra-household selection, and SurveyUSA has done side-by-side testing on the different ways that one might do intra-household selection. We have tested the methods that are mathematically defensible in theory, such as asking for the respondent with the most recent birthday (which has problems in practice), and methods that are mathematically indefensible, such as asking for the youngest male over the age of 18. Intra-household selection, in practice, does not make the kind of polls that SurveyUSA conducts more accurate.
       2.4 percent of those who take a SurveyUSA poll tell us they are under the age of 18. We exclude them. There is no evidence that people lie to us more often than they lie to a headset operator. There is evidence to the contrary.
       Some SurveyUSA competitors want you to think SurveyUSA gets an occasional election right, the way Miss Cleo occasionally gets a psychic prediction right. The facts are published and available for inspection. The odds that chance alone can explain SurveyUSA's success relative to other pollsters is 1,000,000,000:1, by many measures. To those who would like me at this point to disclose that SurveyUSA got the Newark mayor's election wrong in 2002, the San Francisco mayor's runoff wrong in 2003, and that SurveyUSA overstated Dean in the 2004 Iowa caucuses, we did. When you have as many at-bats as SurveyUSA, you are going to strike out from time to time. The question is: how does our entire body of work stand-up? By multiple objective Mosteller measures, SurveyUSA's data need take a back seat to no one's.
      In 1999, a subsidiary of the research firm IPSOS wanted to see if interactive voice was a viable alternative to CATI. Senior IPSOS scientists put together a side-by-side test with 93,000 interviews. The test was deliberately designed to isolate and identify biases in interactive voice. As such, respondents were asked as diverse a collection of questions as possible. The testing was designed, carried out and paid for by the IPSOS subsidiary. After the 93,000 parallel interviews were conducted, IPSOS wrote a white paper, summarizing the research-on-research. Findings:

  • "IVR produces samples that more closely mirror US demographics than does CATI ... Three demographics stand out as being the reason for these differences: education, income and ethnicity. In all three cases, IVR was much closer to the census than CATI."
  • "IVR interviewing generally succeeds on all three fronts: sample projectability, accuracy and production rates. These findings suggest that IVR is a valid method for administering short questionnaires to RDD samples."
  • "In the few cases where differences are noted in the data, some can be resolved by the way we ask questions and some, we believe, are already more accurate in IVR."

After this white paper was written, this IPSOS subsidiary began using SurveyUSA for data collection.
      Due to the manner in which you obtain your sample, is there a differential in accuracy in general vs. primary elections?
      SurveyUSA has polled on 310 general candidate elections. Our average error on the candidate is 2.33 points. SurveyUSA has polled on 167 primary elections. Our average error is 4.13 points (1.8 times greater). We do not believe we are less accurate on primary elections because of the way we obtain sample. Because no pollster has ever been asked for, nor publicly made, this kind of disclosure before, I don't know whether a 1.8 factor deterioration on primary polls is above average or below average.
      Do you include "traps" in your screening process? If so, such as? Do they prove to be effective?
      We have experimented with as few as 3 and as many as 8 screens for likely voters over the years. In addition to asking the obvious question, "Are you registered?", we have experimented with many different variations on the direct, "How likely are you to vote" question, including running side-by-side testing for many of our 2004 polls comparing a 4-point likely scale to a 5-point scale. We have, in past years, but not in 2004, asked people where they vote. In 2004 we asked respondents whether and how they voted in 2000. We ask people their interest on a 1-to-10 scale. In 2004, we used fewer screening questions than in past years. Our results were superior. We find no simple relationship between the number of screening questions and the accuracy of our results. When SurveyUSA consistently produces a candidate error of 0.0 on pre-election polls, we'll assume we have solved this riddle, and will stop experimenting. Until then, it's a work in progress.
      Under what circumstances are your polls more beneficial than traditional telephone polls as conducted by Gallup? What makes automated polls more accurate?
       Have you been to Gallup's website lately? Have you watched Frank Newport deliver the Daily Briefing? Have you been to the Gallup Brain? Have you read Gallup's blog? Do you receive the occasional introspective from David Moore? What a tour de force. No other pollster is a close second to Gallup in these areas. I aspire to run my company as openly and transparently as does Gallup, and to provide interactive real-time access to our library of questions and answers. In this regard, I have the highest respect for Gallup. Further, Gallup has a 70-year track on many important questions, which gives Gallup a 60 year head-start on SurveyUSA. That said, I would not trade data with Gallup: 42% of Gallup's final statewide polls in 2004 produced a wrong winner (5 wrong winners out of 12 state polls), compared to 3.4% of SurveyUSA's final statewide polls (2 wrong winners out of 58 state polls).
      Professionally-voiced polls are not inherently superior to headset-operator polls, and I do not make that claim. I just rebut the assertion that professionally-voiced polls are inherently inferior. Used properly, SurveyUSA methodology can have advantages. In 1994, SurveyUSA polled California on Proposition 187 for TV stations in Los Angeles, San Francisco and Sacramento. Prop 187 was a plan to deny benefits to illegal immigrants. When others polled, some respondents heard the 187 question this way, "Are you a bigot?" They answered in the politically correct way. "No, I would never vote to deny benefits to illegal immigrants" (before going out and doing just that). It did not matter how much confidentiality Field or LA Times interviewers promised the respondent, or how well trained those interviewers were. Both pollsters understated support for this measure. When SurveyUSA polled 187, respondents did not have to confess anything, but rather, had only to press a button on their phone, paralleling the experience the respondent would later have in the voting booth, where no one speaks his/her choice aloud. SurveyUSA said Prop 187 would pass 60% to 40%. It passed 59% to 41%.
      If your only access to polling data is Hotline, you may think Arnold Schwarzenegger scored a remarkable come-from-behind win in the 2003 Gray Davis recall. The only polls Hotline reported showed Cruz Bustamante ahead early in the campaign. What SurveyUSA knows is that Cruz Bustamante never led in California. Californians may have been reluctant at first to tell other pollsters that they planned to vote for the body builder, but they had no problem telling KABC's Marc Brown this every time SurveyUSA was in the field, which was on 38 of the 59 nights of that campaign. Publications, such as Hotline, which abide by the Gentleman's Agreement not to publish SurveyUSA polls, do a terrible disservice to their subscribers on occasions such as this.
      In 1998, I received a call at my house from a well known Washington DC polling firm. The interviewer eventually zeroed-in on questions about Bill Parcells, then the coach of the New York Jets, and a Cadillac spokesman. I listened carefully. Why would the interviewer want to know if I thought Bill Parcells was honest? Then I connected the dots. This was not a poll about Bill Parcells, this was a poll about Bill Pascrell, who is my Congressman, and who was running for re-election in New Jersey 8th District. The interviewer was reading the name wrong. I said to the interviewer, "Ma'am, excuse me. Stop. You are mispronouncing the gentleman's last name. It is Pas-crell. Not Par-cells." "No," she said. "It says right here, 'Bill Parcells'." How many times a day do you think something like that happens with headset operators? How many different ways can you think of for an $8/hour employee doing monotonous work to make a mistake? Does it matter how many PhDs worked to draw the sample for that survey? Does it matter how many PhDs pored over the data to write the analysis that the candidate ultimately was handed? It doesn't. The data was worthless. And this - importantly - was one of the best outfits, an outfit that actually runs its own call center. Imagine how much worse it gets at firms that just outsource their calls to a 3rd party, and who have no direct control over who asks the questions.
      Now, about the word "automated." Almost all polling firms use purchased auto-dialers. The dialer automatically dials the phone, detects a connection and, once the dialer believes a human is on the line, automatically passes the call to an interviewer. In some cases, that interviewer is well-trained and articulate, sensitive without being intrusive, and in all things neutral. Perfect. But in other cases, that interviewer is an unpaid, untrained college student hoping to get a credit, or the interviewer is convicted criminal, calling from a call center located within a Canadian prison. The people who staff call centers know the dirty little secrets, and they know the kind of people they can attract to do this work. They can tell you about interviewers who come to work drunk, stoned, or hacking phlegm. They can tell you about interviewers who flirt with the respondents, deliberately, to coax answers, interviewers who coach respondents, leading them to the "right" answers, and interviewers who don't ask the questions at all, but who just make up the answers to save time. Not every headset operator is horrible, to be sure, and the majority are well-meaning, but every call center has horror stories.
      In SurveyUSA's case, when our proprietary dialer detects a human, the respondent immediately hears the voice of a TV news anchor. News anchors are not paid $8/hour. In some cases they are paid $800 an hour. No one is more acutely aware of the limitations of SurveyUSA methodology than I. But the choice is not between SurveyUSA and perfection. The choice is between a news anchor, who has been on the air 30 years in some cases, and a headset operator, who, if he/she lasts a year in the job, is exceptional. I'll take the news anchor. Were Winston Churchill alive, he might say: "Many forms of data collection have been tried, and will be tried in this world of sin and woe. No one pretends that using TV news anchors to ask the questions is perfect or all-wise. Indeed, it has been said that using TV news anchors is the worst form of data collection ... except all those others that have been tried from time to time."
      Because of the nature of your polling system, the types of questions that can be asked are limited. Without giving the respondent an opportunity to choose "other" and then specify what that is on more than one question, doesn't this prevent the client from being privy to the wants/needs of the sample?
      "Other" can be included in any question we ask. Structured probing can be done to whatever level is appropriate. Unstructured, open-ended, iterative probing cannot be done, but if you want unstructured, open-ended iterative probing, you need a focus group. In 1992, SurveyUSA identified an opportunity to serve TV newsrooms that were not being served by Gallup and Harris. We built a better mousetrap; the world beat a path to our door. Just as the TVA brought water to small-town America, and the REA brought electricity to small-town America, SurveyUSA brought true, random-sample, extrapolatable opinion research to Wichita, Roanoke and Spokane. Our clients are delighted with the work we do. Some have been customers for 12 years now. A number are under contract through 2008.
      How do you compare to Rasmussen? Do you feel you are more/less accurate when it comes to competitive races?
      SurveyUSA has competed with Scott William Rasmussen on 68 occasions. We have outperformed Rasmussen using any of 8 academic measures. Our mean error and standard deviation on those 68 contests, and Rasmussen's, are posted to SurveyUSA's website.
      What is your response to critics who state that while automated polls are fast, rendering them headline worthy for TV stations, they are not accurate enough to use within a campaign to determine strategy based on the reaction of the electorate to issues or events? In addition to The Hotline, a number of other news organizations have a policy of not running automated dialing polls, stating that it would be a disservice to readers to portray the results as accurate -- Roll Call and the AP to name a couple. How would you convince us of otherwise?
      Campaign managers scour SurveyUSA's data, then make media-buy decisions and change strategy. I know because campaign managers call me. They tell me how eerily similar SurveyUSA's data is to their own internal polling. By any objective criteria or honest measure, SurveyUSA years ago earned the right to be included in Hotline's "Poll Track." Yet we're still blacklisted. Evil triumphs when good men do nothing. Here's a chance to do something

© 2004 by National Journal Group Inc., 600 New Hampshire Avenue, NW, Washington DC 20037. Any reproduction or retransmission, in whole or in part, is a violation of federal law and is strictly prohibited without the consent of National Journal. All rights reserved.

Posted by Mark Blumenthal on February 9, 2005 at 02:58 PM in Innovations in Polling | Permalink | Comments (4)

February 08, 2005

Update

Just a quick update:  I am over the flu-bug, but it's left me behind the 8-ball a bit.  My apologies for the infrequent posts of late.    Thanks to all who sent kind wishes in comments or via email.

For daily readers:  I just updated my original post on what is (and is not) included in the "raw data" release from NEP.  Hoping to have more on Social Security polling tomorrow.

Posted by Mark Blumenthal on February 8, 2005 at 03:03 PM in MP Housekeeping | Permalink | Comments (0)

February 04, 2005

Recent Polling on Social Security

I had hoped to post a primer on recent surveys on the issue of Social Security earlier in the week, but my bout with a flu-bug got in the way.  Fortunately, I discovered that the heavy lifting had already been done in a concise report by the Pew Research Center that conveniently summarizes virtually all of the recent polling on the issue.  It is absolutely a must-read for those who want a quick overall review.

I'll say a bit more on the Pew report below, but first let me do a quick primer on their primer.  One of the great challenges for all surveys on public policy issues is the danger of measuring what social scientists sometimes call "non-attitudes."  The idea, first hypothesized by the legendary political scientist Phil Converse in the 1960s, is that the social pressure to appear opinionated during an interview induces respondents to report opinions on topics about which they have little knowledge or have not formed prior opinions. 

Many years of research and practical experience show that respondents in this situation do not generate attitudes at random, as Converse suggested, but rather draw cues from the language of the question and reason their way to an answer.  The answers of respondents without prior opinions have meaning, but it is highly dependent on the wording of the question.  Change that wording just slightly, and respondents may provide very different answers. 

President Bush's proposal for private Social Security Accounts provides a classic example.  The Pew report tells us that many know little or nothing about it:

In Pew's early January survey, only about a quarter of Americans (23%) said they heard a lot about the proposal; another 43% said they had heard a little, while 33% had heard nothing at all.

Yet while three quarters know little or nothing about the privatization proposals, Pew reports that only 3% to 16% offer a "don't know" response when asked about them on various polls. 

Of course, "non-attitude" may be a bit misleading in this case because the President's proposals stir up a variety of more general attitudes that are very real and strongly held.  Virtually every adult American comes into regular contact with the Social Security program, either as a payroll taxpayer or a beneficiary.  Thus, the questions about private accounts may draw upon attitudes involving: 

  • The Social Security program generally
  • The prospect of reducing Social Security benefits
  • The likelihood of future generations receiving full benefits
  • Taxes generally and Social Security payroll taxes in particular 
  • Investing generally and the stock market in particular
  • President Bush
  • Democrats in Congress

Now let's consider those ideas in the context of the numbers in the Pew Center report.  Social Security is certainly a popular program; 90% oppose "reducing Social Security benefits" (on a 1999 NPR/Kaiser/Harvard Study, though opposition is not quite as monolithic for less sweeping proposals to cut benefits). 

At the same time, 55% to 60% question whether they will receive Social Security benefits when they retire.  My hunch is that doubts about future benefits are driving the results in recent surveys (also cited in the Pew Report) that show 72% to 74% saying Social Security has at least "major problems," but only 18% to 24% agreeing that the program is "in crisis."   

However, one of the most striking tables in the Pew Research Center Report, copied below, shows results from questions asked by five different organizations about private accounts during mid-January, with support varying from 55% to 29%.

[Table: Pew Research Center comparison of mid-January private accounts questions from five organizations]

[click on image for full size version]

"Yes Virginia," as Ruy Teixeira put it in his summary of the report, "wording does matter" (and Ruy has been all over this topic lately).   In general, on the questions the Pew Report, the more that the question language mentions increased risk, the role of stock market investments, the potential for reduced benefits, the lack of guaranteed benefits or describes the proposal as an initiative of President Bush, the less support it receives.   Put another way, support for private  accounts seems lower when respondents learn more about it.

It is hard to argue with the Pew Report's conclusion that "public opinion on the various proposals being circulated is fluid and highly dependent on how the options are framed."  Expressed in the present tense, that is undeniable.    However, as time passes and Americans learn more about Bush's private account proposals, the results suggest that these currently fluid attitudes will solidify and move against private accounts in the same way public opinion moved against the Clinton health care proposals a dozen years ago. 

Posted by Mark Blumenthal on February 4, 2005 at 12:57 PM in Polls in the News | Permalink | Comments (9)

February 02, 2005

Flu

Just wanted to explain the lack of posts.  Apparently, my two-year-old has been hanging out with Hillary Clinton or her staff, because she brought home the flu that has been making the rounds in DC.  It disabled the Mystery Pollster yesterday and, as of this morning, the Mystery Spouse as well.  Ugh. 

They tell me this is just a 24 hour thing.  Hope so.... 

Posted by Mark Blumenthal on February 2, 2005 at 08:43 AM in MP Housekeeping | Permalink | Comments (5)