As I unintentionally walked into a debate on this issue, I thought I’d take the time to look at it by itself.
I must concede to Ampersand that Slate’s criticism seems mostly to rest on the assumption that such a large range (8,000 to 194,000) is akin to a dartboard that Lancet is using. Yet Roberts, the author of the article, rebutted this point well, discussing the nature of confidence intervals and how the probabilities can be calculated under a normal distribution. As he puts it:
1. There is a 2.5% chance that the number is lower than 8,000, and a 2.5% chance it’s higher than 194,000 (2.5% + 2.5% = 5%, thus the 95% chance the number is between 8,000 and 194,000).
2. There is a 10% chance that the number is lower than 45,000, and a 10% chance it’s higher than 167,000 (thus an 80% chance the number is between 45,000 and 167,000).
3. There is a 20% chance that the number is lower than 65,000, and a 20% chance it’s higher than 147,000 (thus a 60% chance the number is between 65,000 and 147,000).
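Roberts’ arithmetic can be sketched with Python’s standard library. This is only a back-of-the-envelope approximation: it treats the estimate as normally distributed, with a midpoint and spread derived from the published 95% interval. The study’s actual interval was not symmetric, so the 80% and 60% bounds here come out a few thousand lower than Roberts’ figures.

```python
from statistics import NormalDist

# Derive a rough mean and standard deviation from the published 95%
# interval (8,000 to 194,000); 1.96 is the usual two-sided 95% z-score.
lo95, hi95 = 8_000, 194_000
mu = (lo95 + hi95) / 2        # midpoint, ~101,000
sigma = (hi95 - mu) / 1.96    # half-width of the 95% interval over z

dist = NormalDist(mu, sigma)

for conf in (0.95, 0.80, 0.60):
    tail = (1 - conf) / 2     # probability mass left in each tail
    lo, hi = dist.inv_cdf(tail), dist.inv_cdf(1 - tail)
    print(f"{conf:.0%} interval: {lo:,.0f} to {hi:,.0f}")
```

By construction the 95% line reproduces the published bounds; the narrower intervals follow from trimming equal probability off each tail, which is exactly the logic of Roberts’ three points above.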
So that point is effectively put to rest. Yet several of the conclusions that the article draws still do not necessarily follow from the information gathered by the study.
To begin, the article itself can be read here. The first point that should be made is that one bias is immediately introduced into the sample: families that were killed entirely during the pre-invasion period–in the slaughters that usually involved Kurdish families, but not uncommonly Shia families as well–obviously would not be there to be interviewed. This is of course but one of the ways that a comparison with pre-invasion Iraq is troubling, but it is one that isn’t mentioned in the article.
It can also be seen on page 3 that six of Iraq’s provinces weren’t surveyed at all: Al-Basrah, Al-Muthanna, An-Najaf, Dahuk, Arbil, and Kirkuk. The estimated populations of these provinces, as of roughly 2003, are as follows:
- Al-Basrah: 2,600,000
- Al-Muthanna: Fewer than 1,000,000 (from an earlier, 1997 study)
- An-Najaf: 931,600
- Dahuk: 497,230
- Arbil: 1,134,300 (from 2001 estimate)
- Kirkuk: 949,000
These population figures are of course estimates, with Al-Muthanna’s and Arbil’s being older and perhaps less reliable–but for our purposes, I think it can be established that several million people were essentially removed from the list of households that could be chosen at random.
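For a sense of scale, the figures listed above can be tallied in a quick sketch (treating Al-Muthanna’s "fewer than 1,000,000" as an upper bound, so the total is an upper bound too):

```python
# Province estimates as listed above; Al-Muthanna's figure is an upper
# bound from a 1997 study, and Arbil's is a 2001 estimate.
provinces = {
    "Al-Basrah": 2_600_000,
    "Al-Muthanna": 1_000_000,  # upper bound
    "An-Najaf": 931_600,
    "Dahuk": 497_230,
    "Arbil": 1_134_300,
    "Kirkuk": 949_000,
}
total = sum(provinces.values())
print(f"Unsurveyed provinces held up to {total:,} people")
```

That is up to roughly seven million people outside the sampling frame.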
In a response in the Lancet itself, Stephen Apfelroth makes a similar point, focusing on the fact that Roberts’ approach was statistically accurate only if you assumed away local variance.
Although sampling of 988 households randomly selected from a list of all households in a country would be routinely acceptable for a survey, this was far from the method actually used—a point basically lost in the news releases such a report inevitably engenders. The survey actually only included 33 randomised selections, with 30 households interviewed surrounding each selected cluster point. Again, this technique would be adequate for rough estimates of variables expected to be fairly homogeneous within a geographic region, such as political opinion or even natural mortality, but it is wholly inadequate for variables (such as violent death) that can be expected to show extreme local variation within each geographic region. In such a situation, multiple random sample points are required within each geographic region, not one per 739 000 individuals.
Emphasis added by me.
Apfelroth makes a number of other valid criticisms as well, and I recommend the article.
I also recommend this article, which offers a detailed statistical analysis of the ways in which Roberts’ method could be improved, as well as how the information from the studies could be more accurately categorized (into combatants, collateral damage, etc.).
Finally, there is Roberts’ response.
He begins by defending the cluster-sampling method, which has similarly been applied to measuring starvation and other problems in countries around the world.
But he immediately acknowledges Apfelroth’s criticism:
Unfortunately, as Stephen Apfelroth rightly points out, our study and a similar one in Kosovo,3 suggest that in settings where most deaths are from bombing-type events, the standard 30-cluster approach might not produce a high level of precision in the death toll. But the key public-health findings of this study are robust despite this imprecision. These findings include: a higher death rate after the invasion; a 58-fold increase in death from violence, making it the main cause of death; and most violent deaths being caused by air-strikes from Coalition Forces. Whether the true death toll is 90 000 or 150 000, these three findings give ample guidance towards understanding what must happen to reduce civilian deaths.
In the end, he argues that the study is not without merit, while welcoming people such as Apfelroth to help improve the methodology.
However, I myself still believe that the margin of error in this projection is too great.
I will summarise my reasons:
- Six provinces were left out of the random sampling entirely
- The data on pre-invasion Iraq is insufficient to serve as a basis for comparison
- Without putting more resources into identifying local trends, taking roughly 30 household samples around each of just 33 cluster points seems like a poor basis for projecting trends at the national level
That’s where I stand now–though I am but an interested amateur in these matters, and I welcome any criticism.
UPDATE: Ampersand provided a link to this article in The Chronicle of Higher Education, on this very subject. Though it disagrees with my own assessment, it is a very even-handed, well-written take on the subject, and I recommend it.