Creative Destruction

October 27, 2006

Substantive Criticisms of the Lancet Report: Part 2

Filed under: Iraq,Science,Statistical Method — Robert @ 10:06 pm

Only a week later than promised (hey, I’m not getting paid), here is my review of the problems I see in the Lancet article on mortality in the Iraq war.

The article is much briefer than the study, which I examined here. So this review will also, theoretically, be briefer (cheers from the gallery). In fact, I only found three issues. However, one of them is potentially damaging to the study’s methodological choices (although I lack the mathematical skills to make a determination of that point), another casts direct doubt on the reliability of the authors’ reporting, and the third makes it clear that the study’s sampling method was not, in fact, random. These are major issues, in other words.

To repeat my disclaimer from last time:
I am not a trained statistician; any numerical analysis which crawls its way into this post should be viewed with a skeptical eye and read broadly and generally. I am skeptical towards this article’s conclusions on grounds of its consistency with the other things that I know, but this post is not about that inconsistency, and is instead a list of what valid critiques I can come up with against the study and the article. I have skimmed the IBC press release slamming the study, and have glimpsed other criticisms, but have not done any extensive reading in the “opposition research”.

Criticisms of the article which also apply to the first document I reviewed will not be repeated unless new information is noted.

1. The study authors selected a target survey size of 12,000 people in 50 clusters throughout the country. The sample size is adequate, but the small number of clusters raises a statistical concern. With each cluster contributing 2% of the total study data, any unusual cluster will have a disproportionately large effect on the total outcome of the study. The authors make the (legitimate) point that movement in Iraq is difficult and dangerous, and that word-of-mouth about the benign purpose of the interviewers propagating through the households of each cluster reduced this risk, an effect which would be greatly attenuated by a larger number of clusters. That is true, but immaterial to the degree of confidence we can have in the study result.

The mathematical statistics needed to figure out how many clusters you ought to use in a study are complex. An article in the International Journal of Epidemiology provides a nomogram (that there is fancy language for a “chart”) that tells you how many clusters you should use for a given prevalence rate (how often you expect to find what you’re trying to find), design effect (how much variation your methodology will create relative to an ordinary random sample), and cluster size (number of respondents per cluster). I do not know the design effect value, but we do know the prevalence rate (about 2.5%) and the cluster size (about 240). For middling values of design effect, the nomogram suggests between 125 and 1500 clusters be used.

It will take a better statistician than your humble correspondent to nail this one down, but it does seem plausible that the number of clusters selected is inadequately small.
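
For a rough feel for why the cluster count matters, one standard textbook approximation (Kish's design effect, DEFF = 1 + (m - 1) × ICC, for clusters of equal size m) can be sketched in a few lines. The intracluster correlation (ICC) values below are hypothetical placeholders, not figures from the study, and this is not the nomogram's method. It only illustrates how quickly the effective sample size can shrink when respondents within a cluster resemble one another.

```python
def design_effect(cluster_size, icc):
    """Kish approximation for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(total_n, cluster_size, icc):
    """Size of a simple random sample with roughly the same precision."""
    return total_n / design_effect(cluster_size, icc)


TOTAL_N = 12_000                      # target survey size from the post
CLUSTERS = 50                         # number of clusters from the post
CLUSTER_SIZE = TOTAL_N // CLUSTERS    # about 240 people per cluster
SHARE_PER_CLUSTER = 1 / CLUSTERS      # each cluster carries 2% of the data

# ICC values are assumptions chosen only for illustration.
for icc in (0.0, 0.01, 0.05):
    deff = design_effect(CLUSTER_SIZE, icc)
    n_eff = effective_sample_size(TOTAL_N, CLUSTER_SIZE, icc)
    print(f"ICC={icc:.2f}  design effect={deff:.2f}  effective n={n_eff:.0f}")
```

With an ICC of zero the 12,000 respondents count in full; any positive ICC reduces the effective sample, and the reduction grows with cluster size, which is one way to see why fewer, larger clusters are a concern.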

2. On page 2, the study authors detail their selection methodology. Each cluster’s origin point was selected from a province and then a town weighted by population (fair enough). The cluster’s starting household, however, was picked in this fashion: “The third stage consisted of random selection of a main street within the administrative unit from a list of all main streets. A residential street was then randomly selected from a list of residential streets crossing the main street. On the residential street, houses were numbered and a start household was randomly selected.”

This is hugely problematic. If you do not live on a residential street which adjoins a main street in your town, then your household is excluded from the statistical universe the study is measuring. The study did not sample Iraq; it sampled the subsection of Iraq that happens to adjoin a major road in town. This is a problem for a study attempting to measure anything, but in the case of a study measuring wartime fatalities, it is a critical flaw. Main streets are densely populated areas. Densely populated areas are the locales to which insurgents in an urban conflict flock. There’s no point in carbombing Farmer Ahmed’s cow; you go to the market. Which is on a main street.

The study authors could have at least partially corrected for this non-random element of their sample by assessing the proportion of the Iraqi population that could have been sampled by this method, and using that total population figure in their overall calculations. They did not do this, and in fact make no mention of the non-random element of their selection.
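
The correction described above is simple arithmetic, sketched here under loudly stated assumptions: the 2.5% figure is the prevalence rate quoted earlier in this post, treated as a death rate; the total population is a round placeholder; and the coverage fractions are invented, since nobody has measured what share of Iraqis live on streets the method could reach.

```python
def projected_deaths(death_rate, total_population, coverage_fraction=1.0):
    """Apply a sample death rate only to the population the design could reach."""
    if not 0.0 <= coverage_fraction <= 1.0:
        raise ValueError("coverage_fraction must be between 0 and 1")
    return death_rate * total_population * coverage_fraction


DEATH_RATE = 0.025              # "about 2.5%" from the post
TOTAL_POPULATION = 26_000_000   # placeholder round number, an assumption

# Coverage fractions are hypothetical; 1.0 reproduces the uncorrected figure.
for coverage in (1.0, 0.8, 0.6):
    deaths = projected_deaths(DEATH_RATE, TOTAL_POPULATION, coverage)
    print(f"coverage={coverage:.0%}  projected deaths={deaths:,.0f}")
```

A projection scaled this way says nothing about deaths among the people the sampling frame could not reach; it only avoids applying the sampled rate to them.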

This is a serious objection to the study’s validity; the most serious I have found.

3. Also on page 2, the study authors write “The survey purpose was explained to the head of household or spouse, and oral consent was obtained. Participants were assured that no unique identifiers would be gathered.”

This is problematic. Not intrinsically, but because it directly contradicts claims made by the study authors concerning their validation work on the study, specifically in the area of detecting and accounting for multiple accounts of the same death. Study author Burnham, in a media interview (h/t Amp), said “Double counting of deaths was a risk we were concerned with. We went through each record by hand to look for this, and did not find any double counting in this survey. The survey team were experienced in community surveys, so they knew to avoid this potential trap.”

If no unique identifiers were gathered, then it is not possible that they went through and checked for duplicates. Either they lied to the respondents, or they lied to the press, or their article inaccurately reflects the methodology that was in place.

Overview and Conclusion

When I completed the first half of this critique, my overall impression was that there were some issues with the study that I found troubling, specifically the strength of their claims regarding the study’s validity and the difficulty their methodology created for other researchers attempting to verify their work. However, I thought that on balance the authors had done an adequate job of a very difficult task, and that – while their numbers were probably a little bit high – they were on the right lines.

I am forced to reconsider that proposition. The exclusion of an indeterminate, but large, fraction of the Iraqi population from the study’s potential range of survey respondents – particularly in view of the fact that the excluded fraction is also the group most likely on common-sense grounds to have avoided mass fatalities – is extremely troubling. It isn’t a priori proof that the study authors are dishonest or incompetent; it is proof that the study does not measure what it purports to measure. What appears to be an attempt to cover over another flaw, the impossibility of avoiding duplicate reporting under the study’s purported methodology, amplifies my concerns about the study’s integrity.

What are the real civilian casualty figures in Iraq? “Depressingly high” is an unsatisfactory answer, but until someone conducts a proper population-based study, that’s the best we have to go on.

7 Comments »

  1. […] I will hopefully post Part 2 of this on Friday, covering the article itself, which contains some fairly serious problems. Thanks for reading thus far. Comments are welcome. (Update: Part 2 posted.)   […]

    Pingback by Substantive Criticisms of the Lancet Report: Part 1 « Creative Destruction — October 27, 2006 @ 10:34 pm | Reply

  2. First, I’m glad you found time to post part 2. However, I thoroughly agree with you that you are under no obligation to post this or anything else. Blog when you want to. Now, on to your points.

    1. I’m almost reluctant to disagree because in principle you are correct: more clusters might have resulted in more definitive results. However, I have three quibbles with your criticisms:
    A. This study, as with any human study, had to be approved by an ethics board (Johns Hopkins’ in this case) or IRB. The IRB almost certainly demanded that they compromise on the number of clusters in order to maximize the safety of the local researchers. Complaining that they should have used more clusters if they believed that adding clusters was unsafe is sort of like criticizing an epidemiologic study of the effects of radiation on people on the grounds that a double blinded, controlled study in which people were exposed to known doses of radiation would be more accurate. True, but not practical.
    B. The number of clusters used is not unusual for cluster surveys in areas of conflict. See, for example, this study of deaths in Kosovo. (Full text is available by following the link to Lancet and signing up.) None that I’ve seen in an admittedly brief Medline search used more than 125 clusters, and some of them were investigating conflicts with much lower per case fatality rates than the current Iraq war’s. If you conclude that the Roberts group is not conducting its studies correctly then you must conclude that essentially all conflict research is fatally flawed.
    C. Using too few clusters is far more likely to result in an undercount than an overcount. Suppose there were 10 deaths in an area with 10,000 households. If 5 of those deaths were in one household and the other 5 were spread out as one death per household, and 50 households are sampled, there is only a 50/10,000 (0.5%) chance that the sample will include the household with 5 deaths. It is far more likely that such a search would end up counting none or only one of the deaths in this group. Rare events (and deaths, even with death rates as high as found in the Roberts study, are relatively rare events) are more likely to be undercounted than overcounted in cluster surveys. You are supposing that the authors got unexpectedly lucky–or unlucky–many times if you suppose that this study somehow overcounted. Unless, of course, you are alleging fraud, in which case all these arguments are meaningless. They only work for good faith errors.

    I’m going to have to go to a second comment for point 2.

    Comment by Dianne — October 30, 2006 @ 3:38 pm | Reply
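
Point C in the comment above can be checked with a short calculation. This is a sketch under two assumptions that are not in the comment: the 10,000 is read as a count of households (which is what the 50/10,000 figure implies), and the 50 households are drawn as a simple random sample rather than as a cluster.

```python
from math import comb


def prob_no_death_household_sampled(total_households, death_households, sampled):
    """Chance that a simple random sample misses every household with a death."""
    return comb(total_households - death_households, sampled) / comb(total_households, sampled)


TOTAL_HOUSEHOLDS = 10_000
SAMPLED = 50
DEATH_HOUSEHOLDS = 6    # one household with 5 deaths plus five with 1 death each
TOTAL_DEATHS = 10

# Chance that the single 5-death household lands in the sample.
p_big_household = SAMPLED / TOTAL_HOUSEHOLDS

# Chance that the sample finds no deaths at all.
p_zero_found = prob_no_death_household_sampled(TOTAL_HOUSEHOLDS, DEATH_HOUSEHOLDS, SAMPLED)

# Average number of deaths found per sample, before scaling up.
expected_found = TOTAL_DEATHS * SAMPLED / TOTAL_HOUSEHOLDS

print(f"P(5-death household sampled) = {p_big_household:.3%}")
print(f"P(no deaths found)           = {p_zero_found:.1%}")
print(f"Expected deaths found        = {expected_found}")
```

Under these assumptions most samples find nothing and so undercount, while the rare samples that do hit a death household overcount heavily once scaled up. The average over all possible samples is still correct, which is why this supports "more likely to undercount" rather than "biased downward".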

  3. Continued…
    2. I don’t see the selection of streets as particularly troubling. Any little town is going to have a street identifiable as “main street”. And a cross street may well extend well out into the country. I don’t know anything about the layout of towns in Iraq, but if it is anything like towns in the US, streets intersecting the main street will tend to run from the center of town out to the country. So the randomly picked start house may be on the border of town or even outside the actual city limits. It is really only a fatal flaw if you assume that the experience of people who live on the main street or streets parallel to the main street will be substantially different from that of people who live on side streets. The main difference might be that, as you point out, the main street might be more of a target so that by eliminating the main street from consideration the researchers may have actually biased the data towards an undercount.

    You could make the argument that they unfairly excluded people who live in entirely rural settings, but those are going to be a relatively small percentage of the population. And many people who officially live in “rural” settings actually live in areas that are connected to the nearest village by a street that connects to “main street.” So unless you have some sort of evidence that the population of Iraq is distributed in a substantially different way from that of the US and that there has been essentially no conflict in rural areas (which is hard to believe given that oil wells, roads, and other resources that aren’t necessarily found near population centers have been attacked), then I’m not very impressed with the argument.

    3. “If no unique identifiers were gathered, then it is not possible that they went through and checked for duplicates.”

    Theoretically, this is correct. Practically, it is not. A good deal of information about the subjects was gathered, including age and gender of each household member, length of time at current address, cluster, and characteristics of any members of the household who died, including cause of death and, if violent, responsible party, if known. The CIA factbook gives the size of the average Iraqi family as 6. So it would be easy enough to go through the data and look for duplicates by circumstance, i.e., if you find two families with a father, age 43, killed by a car bomb of unknown origin, a mother, age 39, two sons, ages 15 and 13, one killed by a gunshot wound dealt by coalition forces, and two daughters, 14 and 12, one killed by stabbing by vigilantes, both in the same cluster, it would be reasonable enough to assume that they are, in fact, a single family which was double counted. There remains the theoretical possibility of a family being double counted but the data being different either because of happenstance (i.e., a birthday between the first and second count), lying (denying a death from fear the first time or declaring a death to be due to a particular enemy without evidence out of anger or other motive), or erroneous recall (they made a mistake the first or second time), but even those are easy enough to identify by the closeness of the match barring massive fraud or error. Finally, duplicates would most likely not come singly. If a household was double counted then, presumably, the whole street was double counted. This would make it easier to find duplicates by cross checking: if two families appear suspiciously similar but all the others in a cluster are clearly unique it is much less suspicious than if the second half of the cluster seems hauntingly similar to the first half.

    Comment by Dianne — October 30, 2006 @ 4:09 pm | Reply
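
The duplicate check described in the comment above, matching households within a cluster on composition rather than on names, might look something like the following. This is a hypothetical sketch, not the study team's procedure: the record layout, the field names, and the one-year age tolerance are all invented for illustration, and the sample family is the one from the comment (which son died is an arbitrary choice here).

```python
from collections import defaultdict
from itertools import combinations


def signature(household):
    """Members as (sex, age, status) tuples, ordered by sex, status, then age."""
    return sorted(household["members"], key=lambda m: (m[0], m[2], m[1]))


def likely_duplicates(households, age_tolerance=1):
    """Pairs of household ids in the same cluster whose members line up on sex
    and status, with ages differing by at most age_tolerance years."""
    by_cluster = defaultdict(list)
    for h in households:
        by_cluster[h["cluster"]].append(h)

    pairs = []
    for group in by_cluster.values():
        for a, b in combinations(group, 2):
            sa, sb = signature(a), signature(b)
            if len(sa) != len(sb):
                continue
            if all(x[0] == y[0] and x[2] == y[2] and abs(x[1] - y[1]) <= age_tolerance
                   for x, y in zip(sa, sb)):
                pairs.append((a["id"], b["id"]))
    return pairs


family = [("M", 43, "car bomb"), ("F", 39, "alive"), ("M", 15, "gunshot"),
          ("M", 13, "alive"), ("F", 14, "stabbing"), ("F", 12, "alive")]
# Same family recorded again after the younger son's birthday.
family_later = [("M", 43, "car bomb"), ("F", 39, "alive"), ("M", 15, "gunshot"),
                ("M", 14, "alive"), ("F", 14, "stabbing"), ("F", 12, "alive")]

households = [
    {"id": "A", "cluster": 7, "members": family},
    {"id": "B", "cluster": 7, "members": family_later},
    {"id": "C", "cluster": 7, "members": [("M", 50, "alive"), ("F", 45, "alive"), ("F", 20, "alive")]},
    {"id": "D", "cluster": 9, "members": family},   # same profile, different cluster
]

print(likely_duplicates(households))
```

Households A and B are flagged; D is not, because it sits in a different cluster. The pairing by sorted position is a heuristic and could misalign members when ages shift across one another, so anything it flags would still need a look by hand.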

  4. Oops, forgot one other point in the massive comment above: The use of anonymous sampling without unique identifiers is quite common when performing studies in areas of conflict. See this study for an example. Also note that the cluster size given is 30. Again, if you want to claim that the Roberts group’s study is poorly done you essentially have to throw out all data about mortality from any area of conflict.

    Finally, a point that you didn’t cover in any way but I’d be interested to hear your thoughts on: The numbers obtained are not inconsistent with estimates of what the true numbers should be given Iraq Body Count’s direct measurement. Passive counting methods, especially those that use media reports as their basis, are inevitably undercounts. As the references from the Roberts paper show, the best numbers obtained by this method counted about 20% of all deaths. 5-10% would be closer to average for a country in a major conflict. So, using the IBC numbers, one could expect a true value of 224,000 to 995,000–not so different from the figures Roberts and his group obtained.

    Comment by Dianne — October 30, 2006 @ 4:26 pm | Reply
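
The extrapolation in the comment above is a one-line division, shown here with the capture rates the comment gives (20% at best, 5% at the low end). The comment does not state which IBC counts it used, so the sketch works backward from the quoted range of 224,000 to 995,000 to the counts that range implies; those implied counts are derived here, not quoted from IBC.

```python
def implied_total(passive_count, percent_captured):
    """True total if a passive count captures the given percentage of deaths."""
    return passive_count * 100 / percent_captured


def implied_passive_count(total_estimate, percent_captured):
    """Passive count that would yield the given total at that capture rate."""
    return total_estimate * percent_captured / 100


# Work backward from the range quoted in the comment.
low_count = implied_passive_count(224_000, 20)    # best-case capture rate
high_count = implied_passive_count(995_000, 5)    # low-end capture rate

print(f"Implied passive counts: {low_count:,.0f} to {high_count:,.0f}")
print(f"Check: {implied_total(low_count, 20):,.0f} to {implied_total(high_count, 5):,.0f}")
```

The result is very sensitive to the assumed capture rate: moving from 20% to 5% quadruples the implied total for the same passive count.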

  5. Sample bias from only looking at people who live in towns probably isn’t as great as you might guess. This is because Iraq, unlike, say, Midwestern or Southern America, is a place where the vast majority of residents live in a relatively densely populated area. Iraq is more like Arizona or Nevada, in which the vast majority of people cluster around urban centers near water sources, with a handful of people strewn across the desert, than it is like Indiana or Alabama or Iowa, where population is more evenly distributed.

    Comment by ohwilleke — October 30, 2006 @ 4:46 pm | Reply

  6. According to UNICEF, 33% of the Iraqi population is rural.

    I think you underestimate the non-urbanized population of Iraq, ohwilleke. There are still a lot of nomads and hill/marsh dwellers out there – people without main streets to intersect, so to speak.

    Comment by Robert — October 30, 2006 @ 5:00 pm | Reply

  7. “See, for example, this study of deaths in Kosovo. (Full text is available by following the link to Lancet and signing up.)”

    Or just download a PDF

    Comment by Daran — October 30, 2006 @ 11:00 pm | Reply

