Well, I should be going to bed, but I’m not tired. I can think of nothing better than statistical nitpicking to put me to sleep, so herewith is the first annual Lancet Skeptical Review and Somnolence Soliloquy.
There are two documents in play here. The first one is entitled “The Human Cost of the War in Iraq” and subtitled “A Mortality Study, 2002-2006”. That document can be viewed in the original here. The second document is a companion article which provides some more detail on the study and which can be viewed here. I shall refer to these documents as “the study” and “the article”, respectively.
Let me begin with a quick disclaimer. I am not a trained statistician; any numerical analysis which crawls its way into this post should be viewed with a skeptical eye and read broadly and generally. I am skeptical of this article’s conclusions on the grounds of their inconsistency with the other things that I know, but this post is not about that inconsistency; it is instead a list of what valid critiques I can come up with against the study and the article. I have skimmed the IBC press release slamming the study, and have glanced at other criticisms, but have not done any extensive reading in the “opposition research”.
Some of the following criticisms may seem trivial. I have not made an attempt to pick every possible nit, but I have listed each flaw or criticism that I can find in the interest of completeness and thoroughness.
1. My first criticism comes in the first sentence of the first paragraph of the study, which states that 600,000 people have been killed “in the violence of the war that began with the U.S. invasion in March 2003”. This criticism is not statistical but historical and editorial. The war did not begin in March 2003; it began in Kuwait on August 2, 1990, when Saddam Hussein invaded his neighbor. We do not speak of World War II as beginning on D-Day, or in 1942 when Operation Torch put Allied troops back ashore in North Africa. This may seem a minor quibble, but it is revelatory of an authorial mindset in which the war is blamed on the United States rather than on the original aggressor.
2. Later on the first page, the study states, “The survey also reflects growing sectarian violence, a steep rise in deaths by gunshots, and very high mortality among young men.” These are all facially plausible claims, but only the second and third are actually supported by the study. The study goes on to speak of “growing sectarian violence”, “sectarian violence”, “sectarian animosity” and “sectarian lines”, again as bare assertions. These assertions of sectarianism are plausible from what I know, but the authors appear to be attempting to rest the “fact” of sectarianism upon the study’s foundation. No such finding is supported, however.
3. In the Introduction (p. 4), the study authors assert, “Such methods [passive data collection such as morgue reports] can provide important information on the types of fatal injuries and trends. It is not possible, however, to use these methods to estimate the burden of conflict on an entire population. Only population-based survey methods can estimate deaths for an entire country.” This is flatly untrue. Survey methods are in most circumstances the best way to estimate a systemic variable like countrywide deaths, but it is entirely possible to reach reasonably strong conclusions about deaths using counting methods. Demographers do not do this very often, because survey methods are very powerful, but they could do so if they needed to, and in fact they did so quite extensively before the development of the statistical machinery that makes modern survey methods possible. The study authors here appear to be attempting to bolster the strength of their work by denying any validity to alternative methods. Those other methods, however, do work, and the study authors, if they are competent statisticians, know that they work.
4. In the Introduction, the authors claim that 2.5% of Iraq’s population has been killed since the invasion. The casualty figure they use, 654,965, would thus indicate a total Iraqi population of 26,198,600 people. However, the chart on page 5 detailing the population figures as they were used to assign clusters has a total Iraqi population of 27,072,200 people. With that population total, the percentage ought to be 2.4%. Either they are misreporting the figure, or they are using a different population total for their conclusions versus their starting point.
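The arithmetic is easy to reproduce from the study’s own figures. Here is a quick back-of-the-envelope sketch in Python (the variable names are my own, and the numbers are simply those quoted above):

```python
# Back-of-the-envelope check of the population totals implied by the study's figures.
excess_deaths = 654_965          # casualty figure cited in the Introduction
claimed_fraction = 0.025         # the Introduction's "2.5% of Iraq's population"
page5_population = 27_072_200    # population total from the cluster-assignment chart on page 5

implied_population = excess_deaths / claimed_fraction      # ~26,198,600
fraction_from_page5 = excess_deaths / page5_population     # ~0.024, i.e. 2.4%

print(f"Population implied by the 2.5% claim: {implied_population:,.0f}")
print(f"Share of the page-5 population total: {fraction_from_page5:.1%}")
```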
5. On page 5, the authors note that “For ethical reasons, no names were written down, and no incentives were provided to participate.” While it is indeed ethical to refrain from providing incentives, it is difficult to see the ethical merit of making it impossible to verify or check the study results. Names must of course remain confidential, but in order to validate a demographic study it must be possible for other researchers to recompile the data. This is a major lapse. It may be justified by the security situation, but given the Iraqi people’s apparent eagerness to participate in the study, it seems unlikely that cooperation could not have been obtained while still following standard demographic survey protocols. As it stands, the survey work is not reproducible.
The lack of name recording, even informally by the survey takers, also opens up a major area of uncertainty. Without recording names, it is impossible to reliably check for duplicate reporting. Household statuses in war zones are not always fixed and immutable. It is entirely possible that the death of a relative who lived in more than one household over the course of the occupation was reported twice or more. This is made even more likely considering that the surveyors went literally house to house in the cluster area; in Iraq, as in many places in the world, it is quite common to see brothers and cousins living in proximity. The magnitude of this effect could be quite small or it could be very substantial, and we will never know because the surveyors did not keep records of the names.
6. Also on page 5, it is noted that 92% of respondents who reported a death were able to produce a death certificate. This is not a priori impossible, but it does seem a high figure considering the condition of the country’s health and governmental infrastructure over the period in question. The central bureaucracy is reported by the study authors as failing to retain a miserable one-third of the death certificate information in peacetime, yet the local arms of that same bureaucracy managed an essentially 100% success rate in ensuring that every dead body went through the proper government protocol. This is again not impossible, but there does seem to be a disconnect between the two observations.
7. On page 7, the post-occupation non-violent death rate for the country, as indicated by the current survey reports, is calculated by the study authors as being essentially the same as during the pre-occupation period, with a deteriorating trend beginning to show itself. The authors hypothesize that “this may represent the beginning of a trend toward increasing deaths from deterioration in the health services and stagnation in efforts to improve environmental health in Iraq.” This seems unlikely; it would seem much more reasonable that those infrastructure components would deteriorate rapidly following the invasion and then either slowly recover as coalition troops and Iraqi government agencies restored capacity, or stay at a low level if insurgent activity was sufficient to eradicate any gains made. This is a small but potentially significant indicator that the survey sample used by the authors does not jibe with the overall population of the country.
8. On page 10, the authors compare this study with the 2004 study and find that the two surveys indicate similar results. The authors report, “That these two surveys were carried out in different locations and two years apart from each other yet yielded results that were very similar to each other, is strong validation of both surveys.” To describe it politely, this is wishful thinking. That the two surveys yielded similar results is strong validation that they have similar methodology, execution, and sampling, and nothing more than that. A smashed barometer will give the same wrong reading a hundred days in a row; this indicates nothing about the weather and everything about the barometer. Nor is this the only instance of the study authors hyping the strength and quality of their results without providing a foundation for the assertion.
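To make the barometer point concrete, here is a minimal simulation sketch in Python. The bias mechanism and every number in it are invented purely for illustration and have nothing to do with the study’s actual design; the only point is that two surveys sharing the same systematic selection bias will agree closely with each other while both missing the true value.

```python
import random

random.seed(0)

TRUE_RATE = 0.01  # hypothetical true death rate in an imaginary population
population = [random.random() < TRUE_RATE for _ in range(500_000)]

def biased_survey(pop, n=20_000, overweight=3):
    """Draw n respondents, with deaths 'overweight' times as likely to be sampled --
    a stand-in for any systematic selection bias shared by both surveys."""
    weights = [overweight if died else 1 for died in pop]
    sample = random.choices(pop, weights=weights, k=n)
    return sum(sample) / n

survey_a = biased_survey(population)
survey_b = biased_survey(population)  # same flawed design, independent draw

print(f"True rate: {TRUE_RATE:.2%}")
print(f"Survey A:  {survey_a:.2%}")  # A and B agree with each other...
print(f"Survey B:  {survey_b:.2%}")  # ...but neither agrees with the truth
```

Run as written, both surveys come out near 3% against a true rate of 1%; their mutual agreement validates only the shared design, which is exactly the barometer problem.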
I will hopefully post Part 2 of this on Friday, covering the article itself, which contains some fairly serious problems. Thanks for reading thus far. Comments are welcome. (Update: Part 2 posted.)