March 03, 2006
Zogby Troop Poll: The Random Probability Sample
On the Zogby poll of U.S. troops in Iraq, I need to make one point that was implicit in my comments on Wednesday a bit more explicit. While much is shrouded in secrecy, one aspect of the methodology is clear from the information that John Zogby has provided on-the-record: The survey did not involve a "random probability" sample of all American troops serving in Iraq.
The principle of random sampling is what makes a poll "scientific." To meet that standard in this case, every member of the U.S. armed services in Iraq should have had some chance of being selected (or to put it in statistical terms, the probability of selection had to be either equal or known for every member of the population). As I wrote yesterday, the constraints Zogby faced in gaining access to troops at "undisclosed locations throughout Iraq" made random selection of those locations impossible.
It is also unclear -- both from information in the public domain and from what Zogby shared with me in confidence -- whether his selection procedures amounted to random probability sampling even at the undisclosed locations. I did not press Zogby for details on that process, because under the terms of our agreement I could not report the details. While I could speculate about his procedures, unfortunately, doing so would require disclosure of information I promised not to disclose.
However, as this example provides an opportunity to learn something about the survey process, consider how the Gallup organization went about conducting a "strict probability based" random sample of ordinary Iraqis citizens in 2004. To grossly oversimplify the design, Gallup randomly selected 350 neighborhoods ("primary sampling units") from a list of over 116,314 in Iraq (the "sample frame"), using population statistics to make sure the probability of selection was proportional to the size of the neighborhood. In other words, every neighborhood had an equal probability of being selected.
Then, interviewers went to each neighborhood and compiled a list of every family living within every dwelling in that neighborhood and randomly selected families from that list. Within each selected household that agreed to participate, they took an inventory of all family members over 18 years of age and randomly selected one adult to be interviewed in a way that ensured that both genders had an equal chance of inclusion, with no one allowed to self-select into the sample.
Thus, Gallup used a standardized procedure that gave every Iraqi adult (excluding those in institutions and those in two small Kurdish "governates" - see the footnote) a chance of selection, and the selection procedure was random at every step. That, as Gallup puts it, is a "strict, probability based sample design." Note also how much detail Gallup was willing to disclose about their sampling methods, despite very real concerns about the safety of their interviewers.
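To make the logic of selection "proportional to size" concrete, here is a minimal Python sketch. It is not Gallup's actual procedure: the neighborhood sizes, the systematic selection method, and the fixed number of adults taken per neighborhood (`take`) are all assumptions chosen for illustration. The point it shows is that when larger units are more likely to be drawn at the first stage and a fixed number of people is then drawn within each selected unit, each individual's overall chance of selection comes out the same.

```python
import random


def pps_systematic_sample(sizes, n_units, rng):
    """Pick n_units indices with probability proportional to size.

    Uses systematic selection along the cumulative size totals. A unit
    larger than the sampling step could be picked more than once; the
    illustrative sizes below are all smaller than the step.
    """
    total = sum(sizes)
    step = total / n_units
    start = rng.random() * step  # random start within the first step
    chosen = []
    cumulative = 0.0
    idx = 0
    for k in range(n_units):
        target = start + k * step
        # Advance until the target falls inside the current unit's range.
        while idx < len(sizes) - 1 and cumulative + sizes[idx] <= target:
            cumulative += sizes[idx]
            idx += 1
        chosen.append(idx)
    return chosen


def overall_probability(size, total, n_units, take):
    """Chance that one given adult is selected in a two-stage design.

    Stage 1: the neighborhood is drawn with probability n_units * size / total.
    Stage 2: a fixed number (take) of its adults is drawn at random.
    """
    stage_one = n_units * size / total
    stage_two = take / size
    return stage_one * stage_two


# Hypothetical neighborhood populations, for illustration only.
SIZES = [120, 300, 80, 500, 200, 400, 150, 250]

if __name__ == "__main__":
    rng = random.Random(2004)
    print("selected neighborhoods:", pps_systematic_sample(SIZES, 2, rng))
    for size in SIZES:
        print(size, overall_probability(size, sum(SIZES), 2, 10))
```

The neighborhood size cancels out of the product of the two stages, which is why a design can give neighborhoods unequal chances and still give individuals equal ones.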
Now consider one of the examples I mentioned on Wednesday, the survey of Katrina evacuees conducted jointly last year by the Washington Post, the Kaiser Family Foundation and the Harvard School of Public Health. In that case, it was not feasible to use random techniques to try to survey the population of all of those who had to evacuate their homes because of Katrina. The full population had scattered widely, and the researchers lacked a public list of all evacuees or a count of evacuees by their new geographic locations. So instead they did a survey limited to some (but not all) of the shelters housing evacuees in the Houston area. The selection of these locations was neither random nor representative of all evacuees, but the pollsters made no such claims. Instead they were careful to report the results as projectable to "evacuees living in shelters" in Houston.
But review the procedures the researchers used and you will see the effort made to randomly select respondents at each location.
For areas where the evacuees either had limited mobility or were non-mobile -- for example, cot areas occupied largely by elderly or infirm evacuees, or TV lounge areas -- interviewers moved through the respondent population. Specifically, interviewers were given a random number and instructed to count off this number of people before beginning the first/next interview. After an interview was completed (or a refusal obtained), interviewers would again count off using the random interval before selecting the next respondent.
For areas where evacuees were mobile -- for example hallways and evacuee service areas -- interviewers stayed in one particular spot throughout the interviewing period. They then counted people who passed their defined location and chose the (randomly generated) nth person to interview. This selection criteria was duplicated at the conclusion of each contact attempt, whether it was a completed interview or a refusal.
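The count-off rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the interval is drawn once per interviewer and reused, the nth person counted is the one approached, and `max_interval` is a hypothetical upper bound that the published description does not specify.

```python
import random


def draw_interval(rng, max_interval):
    """Draw the random count-off number given to an interviewer.

    max_interval is a hypothetical bound, not taken from the survey.
    """
    return rng.randint(1, max_interval)


def count_off_selection(people, interval):
    """Return every interval-th person in the order encountered.

    The interviewer has no discretion over who is approached: the
    count decides, whether the previous contact was a completed
    interview or a refusal.
    """
    if interval < 1:
        raise ValueError("interval must be at least 1")
    return people[interval - 1::interval]


if __name__ == "__main__":
    rng = random.Random()
    passersby = list(range(1, 21))  # people numbered in order of passing
    n = draw_interval(rng, 5)
    print("interval:", n, "selected:", count_off_selection(passersby, n))
```

What matters for the sampling argument is that the choice of respondent is taken out of the interviewer's hands, not the particular interval used.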
So what can we say about the degree to which the Zogby survey used random probability sampling to survey U.S. troops? Again, as I wrote earlier in the week, the method Zogby used to gain access to the undisclosed locations constrained his ability to select them. The selection was not random, but since he will not disclose the locations, we cannot take their identity into account in evaluating the results. The release specifies a sampling error of 3.3% (a statistic that, given the sample size, is based on the assumption of simple random sampling), but that margin is a bit deceiving. Plus or minus 3.3% compared to what? All we know for certain is that the poll was not a random sample of the population of all U.S. troops in Iraq.
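For readers who want to see where a figure like that comes from, here is a minimal sketch of the textbook margin-of-error formula under simple random sampling. The sample size of 944 is the figure cited in the comments on this post; the 95% confidence level (z = 1.96), the worst-case proportion of 0.5, and the example design effect of 2.0 are assumptions for illustration, and I am not claiming this is the exact formula behind the release.

```python
import math


def margin_of_error(n, p=0.5, z=1.96, design_effect=1.0):
    """Half-width of the confidence interval for a proportion.

    With design_effect=1.0 this is the simple random sampling formula.
    A design effect above 1 widens the margin by its square root.
    """
    return z * math.sqrt(design_effect * p * (1 - p) / n)


if __name__ == "__main__":
    srs = margin_of_error(944)
    clustered = margin_of_error(944, design_effect=2.0)
    print(f"simple random sampling: +/- {srs * 100:.1f} points")
    print(f"design effect of 2.0:   +/- {clustered * 100:.1f} points")
```

With 944 interviews the simple random sampling formula gives a little over three percentage points, in the neighborhood of the reported figure. The formula only describes sampling error for a random sample of a defined population, which is exactly what is in question here.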
As to the selection of respondents at those unspecified locations, we also do not know the procedures used to select respondents. Again, I did not press Zogby on the details of those procedures in our conversation because I would not have been able to report them here. I believe MP's readers deserve more than "trust me" as an explanation. Obviously, for me to speculate now about what Zogby might have done would require getting into the details I promised I would not reveal.
Some news organizations (like ABC News) have adopted strict standards that require the use of probability sampling and bar reporting of surveys based on intercept selection techniques in the absence of a "credible sampling frame." Others are obviously less rigorous.
In the business world, commercial market researchers sometimes use non-random sampling (including many Internet-based "panel" surveys) when rigorous probability samples are impractical or prohibitively expensive. However, the most ethical of these market researchers do not attempt to dress up such "convenience" samples as more than they are. Their clients pay for such projects on the assumption that the information obtained, while imperfect, is the best available.
John Zogby insists it is enough that those of us who have heard more about his survey's methodology conclude that it was "honestly and objectively done." I think he misses an important point. Consumers of Zogby's Iraq troop poll data also need to understand where it fits on the continuum between strict probability-based sampling and non-random convenience sampling. Zogby certainly believes that "security concerns" prevent further disclosure, that we do not "need to know" more. Perhaps. But without knowing more, it is hard to decide whether to trust the results.
I have done two tours in Iraq. I stayed on the FOB most of the time. I never encountered a journalist, I heard of one visiting my post but that they had a PAO along, which is standard procedure. SO, how did these poll takers get access to 944 troops all over Iraq at places that can not be disclosed because of security concerns? You cannot just walk onto an FOB by flashing a driver's license at the gate guard, especially those in hot areas. Did they get permission to enter the base and conduct the polling from PAO or the base CDR? Something does not seem right with this. Just to be upfront I have posted this comment at other blogs, Smarsh's for instance.
Posted by: KJB43 | Mar 3, 2006 10:00:47 AM
It's very, very simple.
Zogby (or his pollsters) made the whole thing up.
Posted by: Eric Blair | Mar 3, 2006 2:27:10 PM
Thanks for this analysis on the Zogby poll. I've been trying to find out more information on it. It sounds as if he did his best to come up with a random sample given constraints, but probably wasn't successful?
Posted by: Cal | Mar 3, 2006 2:50:52 PM
You write: "in other words, every neighborhood had an equal probability of being selected."
I believe this is incorrect. The larger neighborhoods had a greater chance of being selected than did the smaller neighborhoods; the probabilities were unequal, being proportional to the size (population) of the neighborhood. This procedure was applied at all sampling levels (including family, presumably), so that in the end each adult had an equal chance of being selected.
Hence a family of two adults was assigned twice the probability of a single adult, but within that family a coin was flipped, so that all adults ended up with an equal chance of being surveyed.
Posted by: ryan b | Mar 3, 2006 4:21:37 PM
How do you determine the margin of error? Is this the same as a confidence interval, and if so, what is the industry standard for reporting (95%)?
Since the sample was not a SRS, is there a way to account for the design effect like you can with regressions? If so and this manifests itself in the margin of error, isn't the 3.3 percentage points awfully small for a multi-stage design?
Posted by: Shek | Mar 3, 2006 6:10:20 PM
Mark, is his justification for the secrecy surrounding his methodology based on trade secrets? That doesn't wash for me the way it would around election time.
I do evaluation research on Federal programs. Large-scale social-experimental designs, some quasiexperimental stuff, but I live and die as a social scientist on the replicability of my results. If people want to ask why they can trust what I do, I can show them how to do it themselves. Zogby's not remotely approaching this here.
Oh, and I like the scatterplot in the post at the top of the page. Are you using R?
Posted by: DrSteve | Mar 3, 2006 8:54:40 PM