Only a week later than promised (hey, I’m not getting paid), my review of the problems I see in the Lancet article on mortality in the Iraq war.
The article is much briefer than the study, which I examined here. So this review will also, theoretically, be briefer (cheers from the gallery). In fact, I found only three issues. However, one of them is potentially damaging to the study’s methodological choices (although I lack the mathematical skills to settle the point), another casts direct doubt on the reliability of the authors’ reporting, and the third makes it clear that the study’s sampling method was not, in fact, random. These are major issues, in other words.
To repeat my disclaimer from last time:
I am not a trained statistician; any numerical analysis which crawls its way into this post should be viewed with a skeptical eye and read broadly and generally. I am skeptical towards this article’s conclusions on grounds of its inconsistency with the other things that I know, but this post is not about that inconsistency; it is instead a list of what valid critiques I can come up with against the study and the article. I have skimmed the IBC press release slamming the study, and have glanced at other criticisms, but have not done any extensive reading in the “opposition research”.
Criticisms of the article which also apply to the first document I reviewed will not be repeated unless new information is noted.
1. The study authors selected a target survey size of 12,000 people in 50 clusters throughout the country. The sample size is adequate; it is the small number of clusters that raises a statistical concern. With each cluster contributing 2% of the total study data, any unusual cluster will have a disproportionately large effect on the outcome of the study. The authors make the (legitimate) point that movement in Iraq is difficult and dangerous, and that word-of-mouth about the interviewers’ benign purpose, propagating through the households of each cluster, reduced this risk; that protective effect would be greatly attenuated by a larger number of clusters. True, but immaterial to the degree of confidence we can have in the study’s result.
The mathematical statistics needed to figure out how many clusters you ought to use in a study are complex. An article in the International Journal of Epidemiology provides a nomogram (that there is fancy language for a “chart”) that tells you how many clusters to use for a given prevalence rate (how often you expect to find what you’re looking for), design effect (how much variation your methodology will create relative to an ordinary random sample), and cluster size (number of respondents per cluster). I do not know the design effect value, but we do know the prevalence rate (about 2.5%) and the cluster size (about 240). For middling values of the design effect, the nomogram suggests that between 125 and 1,500 clusters be used.
It will take a better statistician than your humble correspondent to nail this one down, but it does seem plausible that 50 clusters is simply too few.
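To make the worry concrete, here is a minimal sketch of the standard design-effect arithmetic (Kish’s approximation, in which the design effect is 1 + (m − 1) × rho for clusters of m respondents). The intra-cluster correlation values below are my own assumptions, picked purely for illustration; I do not know the true figure for violent death in Iraq.

```python
# Minimal sketch: how few, large clusters shrink the effective sample size.
# Kish's approximation: DEFF = 1 + (m - 1) * rho, where m is the cluster
# size and rho is the intra-cluster correlation. The rho values below are
# hypothetical illustrations, not estimates for Iraq.

def effective_sample_size(n_total, cluster_size, rho):
    """Deflate the nominal sample size by the design effect."""
    deff = 1 + (cluster_size - 1) * rho
    return n_total / deff

n_total = 12_000     # target survey size from the study
cluster_size = 240   # roughly 12,000 respondents / 50 clusters

for rho in (0.01, 0.05, 0.10):   # assumed intra-cluster correlations
    deff = 1 + (cluster_size - 1) * rho
    n_eff = effective_sample_size(n_total, cluster_size, rho)
    print(f"rho = {rho:.2f}: DEFF = {deff:5.2f}, effective n = {n_eff:7.0f}")
```

Even a modest intra-cluster correlation of 0.05 would shrink the effective sample from 12,000 to under a thousand respondents, which is presumably why the nomogram pushes toward many more, smaller clusters.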
2. On page 2, the study authors detail their selection methodology. Each cluster’s origin point was selected from a province and then a town weighted by population (fair enough). The cluster’s starting household, however, was picked in this fashion: “The third stage consisted of random selection of a main street within the administrative unit from a list of all main streets. A residential street was then randomly selected from a list of residential streets crossing the main street. On the residential street, houses were numbered and a start household was randomly selected.”
This is hugely problematic. If you do not live on a residential street which adjoins a main street in your town, then your household is excluded from the statistical universe the study is measuring. The study did not sample Iraq; it sampled the subsection of Iraq that happens to adjoin a main street. This is a problem for a study attempting to measure anything, but in the case of a study measuring wartime fatalities, it is a critical flaw. Main streets are densely populated areas, and densely populated areas are the locales to which insurgents in an urban conflict flock. There’s no point in car-bombing Farmer Ahmed’s cow; you go to the market. Which is on a main street.
The study authors could have at least partially corrected for this non-random element of their sample by estimating the proportion of the Iraqi population actually reachable by their method, and extrapolating only to that population in their overall calculations. They did not do this, and in fact make no mention of the non-random element of their selection.
This is a serious objection to the study’s validity; the most serious I have found.
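To illustrate the mechanism (not to estimate its size), here is a toy simulation. Every number in it is invented: the fraction of households near a main street and the relative risks they face are assumptions chosen only to show the direction of the bias.

```python
import random

# Toy model of the main-street sampling bias. Every parameter is invented
# for illustration; none comes from the study or from real Iraqi data.
random.seed(1)

N = 100_000          # households in the toy population
FRAC_MAIN = 0.4      # assumed fraction of households adjoining a main street
RISK_MAIN = 0.04     # assumed death rate for main-street-adjacent households
RISK_OTHER = 0.01    # assumed death rate for everyone else

population = []
for _ in range(N):
    near_main = random.random() < FRAC_MAIN
    died = random.random() < (RISK_MAIN if near_main else RISK_OTHER)
    population.append((near_main, died))

true_rate = sum(died for _, died in population) / N
frame = [died for near_main, died in population if near_main]
frame_rate = sum(frame) / len(frame)

print(f"true population death rate:       {true_rate:.4f}")
print(f"rate in main-street-only frame:   {frame_rate:.4f}")
print(f"overestimate factor:              {frame_rate / true_rate:.2f}x")
```

With these made-up numbers, the restricted sampling frame overstates mortality by nearly a factor of two. The real bias could be larger or smaller; the point is that nobody can tell from the published article, because the authors never address it.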
3. Also on page 2, the study authors write “The survey purpose was explained to the head of household or spouse, and oral consent was obtained. Participants were assured that no unique identifiers would be gathered.”
This is problematic. Not intrinsically, but because it directly contradicts claims made by the study authors concerning their validation work on the study, specifically in the area of detecting and accounting for multiple reports of the same death. Study author Burnham, in a media interview (h/t Amp), said: “Double counting of deaths was a risk we were concerned with. We went through each record by hand to look for this, and did not find any double counting in this survey. The survey team were experience[d] in community surveys, so they knew to avoid this potential trap.”
If no unique identifiers were gathered, then it was not possible to go through the records and check for duplicates. Either they lied to the respondents, or they lied to the press, or their article inaccurately reflects the methodology that was actually in place.
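The logical point is simple enough to put in code. The records and field names below are hypothetical, but any real version of the check needs something like them: deduplication requires a key that distinguishes one person’s death from another’s.

```python
from collections import Counter

# Hypothetical death reports, stripped of unique identifiers as the
# study's consent script promised. The field names are my invention.
reports = [
    {"age": 34, "sex": "M", "month": "2006-07", "cause": "gunshot"},
    {"age": 34, "sex": "M", "month": "2006-07", "cause": "gunshot"},
    {"age": 61, "sex": "F", "month": "2006-07", "cause": "airstrike"},
]

# Without a unique key, the best "duplicate check" available is counting
# collisions on non-unique fields...
collisions = Counter(tuple(sorted(r.items())) for r in reports)
suspected = [dict(k) for k, n in collisions.items() if n > 1]
print("records that merely *might* be duplicates:", suspected)

# ...but a collision proves nothing: two 34-year-old men shot in the same
# month are entirely plausible. Confirming or ruling out a duplicate needs
# exactly the identifying information respondents were told was not kept.
```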
Overview and Conclusion
When I completed the first half of this critique, my overall impression was that there were some issues with the study that I found troubling, specifically the strength of the authors’ claims regarding the study’s validity and the difficulty their methodology created for other researchers attempting to verify their work. However, I thought that on balance the authors had done an adequate job of a very difficult task, and that, while their numbers were probably a little high, they were on the right lines.
I am forced to reconsider that proposition. The exclusion of an indeterminate, but large, fraction of the Iraqi population from the study’s potential range of survey respondents, particularly since the excluded fraction is also the group most likely on common-sense grounds to have avoided mass fatalities, is extremely troubling. It isn’t a priori proof that the study authors are dishonest or incompetent; it is proof that the study does not measure what it purports to measure. What appears to be an attempt to paper over another flaw, the impossibility of checking for duplicate reports under the study’s stated methodology, amplifies my concerns about the study’s integrity.
What are the real civilian casualty figures in Iraq? “Depressingly high” is an unsatisfactory answer, but until someone conducts a proper population-based study, that’s the best we have to go on.