Creative Destruction

January 16, 2007

Racism in the Electoral College: Not So Much

Filed under: Blogosphere,Debate,Race and Racism,Statistical Method — Robert @ 2:25 am

Rachel of Alas has a post about structural racism up for MLK Day. In the discussion section of that post, we get into it hot and heavy about the Electoral College and how it is, per Rachel, a “very good example of structural racism”. Why? Because more white people live in the small states, which are proportionally “whiter” than the rest of the country. In Rachel’s words, “It proves that whites votes count for more.”

Not really. Aside from the obvious logical flaw of assigning a weight based on skin color when it is in fact based on a geographic distinction (a black man who lives in Wyoming gets the same overweighted vote in the Electoral College as a white man), the numbers do not, in fact, support Rachel’s position. In the spirit of the “blue states give less” and “red states are dumber” statistical simplifications that go around the Web every time there’s an election (I’ve posted one or two myself), in a follow-up post she comes up with two tables purporting to show that all the small states are heavily white, and all the big states are less white, and thus the Electoral College deprecates the black vote enormously. (The actual quote from the first post is via a source who she cites approvingly, stating that “The Electoral College negates the votes of almost half of all people of color.”)

Again, it turns out, not really. In fact, not only not really – it’s pretty much a wash. Here is an exhaustive table of the states which have votes in the Electoral College. The first six columns are self-explanatory. “EV Weight” is an inverted factor showing the significance of a single person’s vote in that state, compared to the hypothetical “fair” number of people who should get 1 electoral vote if everything was even-steven. Numbers lower than one indicate that a person voting in that state has more than their “fair share” of input into the Electoral College; the winner here is Wyoming, at 0.31. The worst-off state is Texas, at 1.24. The “EV Over/Undercount” column indicates how many EC votes the state would gain or lose if everything were perfectly proportional (and if we could have fractional EC votes). The “White” and “Nonwhite Over/Undercount” columns indicate how many of those over or undervotes would be distributed among the racial balance of the state; if a state “should” have 10 more EC votes and is 80% white, then 8 of those votes are credited to the white column, and 2 to the non-white.
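
To make the column definitions concrete, here is a minimal Python sketch of the arithmetic described above, checked against the California and Wyoming rows. The helper name `ev_breakdown` and its argument names are illustrative assumptions, not anything from the original spreadsheet; the national totals are copied from the table's Total row.

```python
# National totals taken from the "Total" row of the table.
TOTAL_POPULATION = 298_816_954
TOTAL_EV = 535
# The hypothetical "fair" number of people per electoral vote.
PEOPLE_PER_EV = TOTAL_POPULATION / TOTAL_EV


def ev_breakdown(population, electoral_votes, pct_white):
    """Return (ev_weight, over_under, white_share, nonwhite_share) for one state.

    Hypothetical helper that follows the column definitions given in the post.
    """
    # People per electoral vote in this state, relative to the national figure.
    ev_weight = (population / electoral_votes) / PEOPLE_PER_EV
    # Electoral votes the state would gain (+) or lose (-) under pure proportionality.
    over_under = population / PEOPLE_PER_EV - electoral_votes
    # Split that gain or loss along the state's racial balance.
    white_share = over_under * pct_white / 100
    nonwhite_share = over_under - white_share
    return ev_weight, over_under, white_share, nonwhite_share


california = ev_breakdown(36_457_549, 55, 59.5)
wyoming = ev_breakdown(515_004, 3, 92.1)
print([round(value, 2) for value in california])
print([round(value, 2) for value in wyoming])
```

Rounded to two decimals, both results should agree with the corresponding rows of the table; summing the last value over all fifty states is what produces the headline figure.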

The point of all this was to come up with a picture of how the distribution of Electoral College votes would change if everything were proportional to population. That final number is damning for Rachel’s view of a world where the Electoral College is a huge structurally racist institution: 4.80 electoral votes would shift, relative to population. That’s about 0.89% of the EC vote total. Check out the figures for yourself below the break.

Electoral Vote Over and Undercounts, by State and Racial Designation, Weighted by Population

Ordered by Under-representation

State  Population  Electoral Votes  Population/EV  % White  % Nonwhite  EV Weight  EV Over/Undercount  White Over/Undercount  Nonwhite Over/Undercount
California 36,457,549 55 662,865 59.5 40.5 1.19 10.27 6.11 4.16
Texas 23,507,783 34 691,405 71 29 1.24 8.09 5.74 2.35
Florida 18,089,888 27 669,996 78 22 1.20 5.39 4.20 1.19
New York 19,306,183 31 622,780 67.9 32.1 1.12 3.57 2.42 1.14
Illinois 12,831,970 21 611,046 73.5 26.5 1.09 1.97 1.45 0.52
Georgia 9,363,941 15 624,263 65.1 34.9 1.12 1.77 1.15 0.62
Pennsylvania 12,440,621 21 592,411 85.4 14.6 1.06 1.27 1.09 0.19
Michigan 10,095,643 17 593,861 80.2 19.8 1.06 1.08 0.86 0.21
Arizona 6,166,318 10 616,632 75.5 24.5 1.10 1.04 0.79 0.25
North Carolina 8,856,505 15 590,434 72.1 27.9 1.06 0.86 0.62 0.24
Virginia 7,642,884 13 587,914 72.3 27.7 1.05 0.68 0.49 0.19
New Jersey 8,724,560 15 581,637 72.6 27.4 1.04 0.62 0.45 0.17
Ohio 11,478,006 20 573,900 85 15 1.03 0.55 0.47 0.08
Washington 6,395,798 11 581,436 81.8 18.2 1.04 0.45 0.37 0.08
Indiana 6,313,520 11 573,956 87.5 12.5 1.03 0.30 0.27 0.04
Maryland 5,615,727 10 561,573 64 36 1.01 0.05 0.03 0.02
Wisconsin 5,556,506 10 555,651 88.9 11.1 0.99 -0.05 -0.05 -0.01
Tennessee 6,038,803 11 548,982 80.2 19.8 0.98 -0.19 -0.15 -0.04
South Carolina 4,321,249 8 540,156 67.2 32.8 0.97 -0.26 -0.18 -0.09
Oregon 3,700,758 7 528,680 86.6 13.4 0.95 -0.37 -0.32 -0.05
Utah 2,550,063 5 510,013 89.2 10.8 0.91 -0.43 -0.39 -0.05
Kentucky 4,206,074 8 525,759 90.1 9.9 0.94 -0.47 -0.42 -0.05
Massachusetts 6,437,193 12 536,443 84.5 15.5 0.96 -0.47 -0.40 -0.07
Colorado 4,753,377 9 528,153 82.8 17.2 0.95 -0.49 -0.41 -0.08
Nevada 2,495,529 5 499,106 75.2 24.8 0.89 -0.53 -0.40 -0.13
Missouri 5,842,713 11 531,156 84.9 15.1 0.95 -0.54 -0.46 -0.08
Oklahoma 3,579,212 7 511,316 76.2 23.8 0.92 -0.59 -0.45 -0.14
Connecticut 3,504,809 7 500,687 81.6 18.4 0.90 -0.73 -0.59 -0.13
Minnesota 5,167,101 10 516,710 89.4 10.6 0.93 -0.75 -0.67 -0.08
Alabama 4,599,030 9 511,003 71.1 28.9 0.91 -0.77 -0.54 -0.22
Mississippi 2,910,540 6 485,090 61.4 38.6 0.87 -0.79 -0.48 -0.30
Arkansas 2,810,872 6 468,479 80 20 0.84 -0.97 -0.77 -0.19
Kansas 2,764,075 6 460,679 86.1 13.9 0.82 -1.05 -0.91 -0.15
Montana 944,632 3 314,877 90.6 9.4 0.56 -1.31 -1.19 -0.12
Louisiana 4,287,768 9 476,419 63.9 36.1 0.85 -1.32 -0.85 -0.48
Idaho 1,466,465 4 366,616 91 9 0.66 -1.37 -1.25 -0.12
Delaware 853,476 3 284,492 74.6 25.4 0.51 -1.47 -1.10 -0.37
New Mexico 1,954,599 5 390,920 66.8 33.2 0.70 -1.50 -1.00 -0.50
South Dakota 781,919 3 260,640 88.7 11.3 0.47 -1.60 -1.42 -0.18
Maine 1,321,574 4 330,394 96.9 3.1 0.59 -1.63 -1.58 -0.05
New Hampshire 1,314,895 4 328,724 96 4 0.59 -1.65 -1.58 -0.07
Iowa 2,982,085 7 426,012 93.9 6.1 0.76 -1.66 -1.56 -0.10
Hawaii 1,285,498 4 321,375 24.3 75.7 0.58 -1.70 -0.41 -1.29
West Virginia 1,818,470 5 363,694 95 5 0.65 -1.74 -1.66 -0.09
Alaska 670,053 3 223,351 69.3 30.7 0.40 -1.80 -1.25 -0.55
Nebraska 1,768,331 5 353,666 89.6 10.4 0.63 -1.83 -1.64 -0.19
North Dakota 635,867 3 211,956 92.4 7.6 0.38 -1.86 -1.72 -0.14
Vermont 623,908 3 207,969 96.8 3.2 0.37 -1.88 -1.82 -0.06
Wyoming 515,004 3 171,668 92.1 7.9 0.31 -2.08 -1.91 -0.16
Rhode Island 1,067,610 4 266,903 85 15 0.48 -2.09 -1.78 -0.31
Total 298,816,954 535 558,536       0.00 -4.80 4.80

Population and Electoral Vote data, Wikipedia

Racial Breakdown data, US Census 2001

Rachel uses the same population and electoral vote data in her partial tables; I do not know where she gets her racial breakdown numbers. She cites the 2000 Census but does not provide a document or link; her numbers don’t match with anything I found on the Census site. (This table as originally posted used data from the wrong year; I re-ran the calculation using the proper whites-only data from 2000. The difference was about 1 EC vote.)

The only election in US history, offhand, that this could have ever affected was the rancorous Hayes-Tilden race of 1876, which Hayes won by one vote. However, Hayes was the Republican in that race (it goes back), and in 1876, most any black who was able to vote, was voting Republican. So, as far as I am aware, no Presidential election would ever have been swayed or thrown into question by an adjustment such as this, even ignoring the fact that the demographic balances have changed over time.

I believe that Rachel is genuinely concerned about racial issues in this country, and I applaud the fact that she devotes a considerable fraction of her professional energy to raising awareness about racism and its many forms. However, that appreciation and respect does not extend to ignoring errors of analysis or of fact, and on this question, Rachel and the scholars/activists she cites are simply, and completely, wrong. And they and she cannot afford to be. As she herself notes, the students entering her classes (she is a sociology professor at a New York university) are resistant to education on the topic of racism, and defensive. That represents a challenge to the pedagogical process which can only be surmounted with unassailable data and sound analysis. Casually untrue assertions about structural racism are costly to the effort of raising awareness – already-defensive undergraduates whose instructors give them misinterpreted data of this type are not going to adopt an understanding attitude; they are going to assume they are being lied to by someone with an agenda. I do believe that Rachel has an agenda, but I do not believe she is lying – but she is way off base, and if she wants to be taken seriously as a social science scholar, she must improve the quality of the analytical work she is presenting as indisputable facts and final conclusions.

The spreadsheets I used to run this simple analysis are available to anyone who would like to play with the numbers; drop me an e-mail, or leave your e-mail address in the comments.



  1. Robert,
    You used the wrong numbers. Your figure includes everybody who checked white alone or in combination, which means that you threw the multiracial population into the white category.

    Comment by Rachel S. — January 16, 2007 @ 10:21 am | Reply

  2. Moreover, I didn’t use that table because it includes Latinos who self identify as white. However, the data you are using favors my point even more than my own data, but it still isn’t the right data. I’m searching for the correct table…

    Comment by Rachel S. — January 16, 2007 @ 12:33 pm | Reply

  3. Well, the data I’m using that over-favors you, shows your premise to be false. So…

    Looking at it, it looks like at 2 AM I pasted the wrong column in, too, using 1990 #s. I’ll update the post with the right numbers from the sheet, but it won’t change much, I don’t think.

    Let me know if you find another table of racial percentages, I’ll be glad to run them through the spreadsheet again.

    Comment by Robert — January 16, 2007 @ 1:03 pm | Reply

  4. Updated – it changed the EC total to 4.80. (I used the wrong column – them little numbers is hard to read, Cletus!). Still less than one percent of the EC; still pretty much diddly.

    Comment by Robert — January 16, 2007 @ 1:24 pm | Reply

  5. Robert, the proper statistic to use would be a regression equation, with the dependent variable being the number of people represented by each electoral vote and the independent variable being % White. I guarantee you that the result will be significant.

    Comment by Rachel S. — January 16, 2007 @ 4:33 pm | Reply

  6. Don’t mind my oar, but…

    Moreover, I didn’t use that table because it includes Latinos who self identify as white.

    Isn’t that the point? If they so choose to identify as a white Latino, then let them so choose. No good playing Calvinball with how other people view their own ethnic identities. If they are self-identified as Caucasian, then let them be counted as Caucasian.

    Comment by Off Colfax — January 16, 2007 @ 5:37 pm | Reply

  7. Actually, that would be an inappropriate statistic, because our data is segmented into 50 units of unequal size. An ordinary regression will thus grossly overstate the impact that small states have on the total. Since what we care about is “how much is the nonwhite vote being under or overvalued in the EC”, not “in how many arbitrary geographic units is the nonwhite vote being under or overvalued in the EC”, we would need to do a regression weighted by the population of each state.

    But I’m a sport; I’ll do both, and post them. Probably later this afternoon.

    Comment by Robert — January 16, 2007 @ 6:05 pm | Reply
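
For anyone who wants to try the population-weighted version described in comment 7, here is a minimal sketch in plain Python. It assumes each state's population is used directly as its weight, which is one reasonable reading of "weighted by the population of each state" and not necessarily what either commenter had in mind. The function name `weighted_linear_fit` is hypothetical, and the four rows in the demo are copied from the table only to show the call; four states are not a result.

```python
def weighted_linear_fit(x, y, weights):
    """Weighted least squares for y = intercept + slope * x.

    Returns (slope, intercept, r_squared), where r_squared is the
    weighted coefficient of determination.
    """
    total_weight = sum(weights)
    # Weighted means of x and y.
    mean_x = sum(w * xi for w, xi in zip(weights, x)) / total_weight
    mean_y = sum(w * yi for w, yi in zip(weights, y)) / total_weight
    # Weighted sums of squares and cross-products around those means.
    s_xx = sum(w * (xi - mean_x) ** 2 for w, xi in zip(weights, x))
    s_yy = sum(w * (yi - mean_y) ** 2 for w, yi in zip(weights, y))
    s_xy = sum(w * (xi - mean_x) * (yi - mean_y)
               for w, xi, yi in zip(weights, x, y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    r_squared = s_xy ** 2 / (s_xx * s_yy)
    return slope, intercept, r_squared


# Illustration only: California, Texas, Vermont, Wyoming from the table.
pct_white = [59.5, 71.0, 96.8, 92.1]
people_per_ev = [662_865, 691_405, 207_969, 171_668]
population = [36_457_549, 23_507_783, 623_908, 515_004]

slope, intercept, r_squared = weighted_linear_fit(pct_white, people_per_ev, population)
print(slope, intercept, r_squared)
```

Setting every weight to 1 reduces this to the ordinary regression, so the same function can produce both versions for comparison.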

  8. I hope that guarantee comes with a fabulous cash prize.

    Regression Statistics
    Multiple R 0.238520984
    R Square 0.05689226
    Adjusted R Square 0.036826138
    Standard Error 132183.1416

    An r-square of 0.05 is insignificant. Social scientists usually use 0.25 as their threshold, I’m told. In bidness school they told us to throw out anything less than 0.5.

    I won’t bother doing a weighted version, not least because I can’t off-hand figure out how to weight it properly. Doing a linear regression about hits my limits as a statistician, even with Excel doing the work.

    Rachel, I’ve now demonstrated that your claim about the Electoral College is false. You’ve critiqued the data, although without offering your own set. Then you’ve critiqued the methodology – but a change to the methodology that you now want to use does not demonstrate the conclusion you asserted; it doesn’t even come close to supporting it. I’ll look forward to re-running my analysis with whatever reasonable demographic data you come up with, but at the moment it appears unarguably true that there is no large racial bias in the Electoral College’s distribution of votes.

    Comment by Robert — January 16, 2007 @ 6:48 pm | Reply

  9. No, the test of significance is not the R-Square test. That should be an F test–interpret the F-Test.
    An R-Square of .05 means that 5% of the variance in # of people per electoral vote is explained by the percent White. Any statistician will tell you that 5% of the variance is quite high, assuming you calculated correctly.

    Comment by Rachel S. — January 16, 2007 @ 8:04 pm | Reply
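
The two statistics being argued over are linked by a standard identity: for a least-squares regression with an intercept, the overall F statistic can be computed from R-squared and the sample size. The sketch below applies it to the R-squared from comment 8, assuming 50 observations (one per state) and a single predictor. The function name is hypothetical.

```python
def f_statistic_from_r_squared(r_squared, n_observations, n_predictors=1):
    """Overall F statistic of a least-squares regression, from its R-squared."""
    df_model = n_predictors
    df_residual = n_observations - n_predictors - 1
    # Explained variance per model degree of freedom, divided by
    # unexplained variance per residual degree of freedom.
    return (r_squared / df_model) / ((1 - r_squared) / df_residual)


# R Square from comment 8; 50 states is an assumption about the sample.
f_value = f_statistic_from_r_squared(0.05689226, 50)
print(round(f_value, 2))
```

This gives an F of roughly 2.9 on 1 and 48 degrees of freedom. Standard F tables put the 5% critical value for those degrees of freedom at roughly 4, so under these assumptions the regression would fall short of significance at that level.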

  10. Robert’s first statement is the most important.

    Basically, the magnitude of even big problems in EV vote weight in small states doesn’t have much impact.

    The electoral college is also better than Congressional voting because DC has electoral votes.

    Comment by ohwilleke — January 16, 2007 @ 8:14 pm | Reply

  11. Rachel, I’ve taken basic statistics. A R-square of 0.05 is NOT high; it is very low.

    Comment by Robert — January 16, 2007 @ 8:27 pm | Reply

  12. If someone with more statistical knowledge than I possess would like to run an F-test on this data (I have no idea if such a test is even appropriate, since it’s usually used to check that two populations have the same mean), please drop me a note. Rachel keeps moving the goalposts, and so far the ball just insists on sailing through the uprights, but this is a kick I’m not equipped to make.

    Comment by Robert — January 16, 2007 @ 8:39 pm | Reply

  13. No Robert. I’m not moving the bar. You aren’t interpreting the stats properly. An R-Square is not a test of significance–look it up.

    If you used any standard statistical program, the significance test will be included. Unfortunately, I don’t have the time (or the formula) to hand calculate that regression equation, but I have a copy of SPSS at school, and I can run it next week.

    Having experience using regression, I can assure you that 5% is fairly high, especially in the social sciences. I rarely have a lone variable that can explain that much variance.

    Comment by Rachel S. — January 16, 2007 @ 8:55 pm | Reply

  14. If an r-square isn’t the test that would indicate something, then why is it the test that you asked for? You didn’t mention an F-test until the standard regression results were provided.

    Nor have you commented on the original analysis – the one showing that the actual result in the physical world of making the Electoral College purely proportional to the population would result in the shifting of a few EC votes, in contrast to your (endorsed) claim that the Electoral College’s institutional bias is stripping HALF of the non-white population’s representation.

    You are, in fact, moving the bar. You’ve made a VERY strong claim, and provided VERY weak evidence in support of it. Analyses which indicate your claim to be off-base, you’ve consistently failed to address – instead asking for more and different tests, twice now. The first analysis is sufficient to indicate that your truth claim is WRONG – answer that point, please.

    Comment by Robert — January 16, 2007 @ 9:27 pm | Reply

  15. Hmm, I think my comment got lost. Sorry if this is a double post.

    I agree with Robert’s analysis, for what it is worth. While the p from the F-test is the best measure of whether this r-squared is consistent with a random distribution of voting power among states on the basis of proportion of black voters, a 0.05 r-squared with a sample of 50 is not going to produce a p

    Comment by Charles S — January 17, 2007 @ 12:02 am | Reply

  16. (continued from last comment, where I stupidly used a bare &lt symbol)

    p&lt0.05 (I haven’t checked, but I’ll be truly shocked if I’m wrong). Anyway, I don’t think that is the aspect that is of interest. What is of interest is whether non-white voters are disadvantaged by the electoral college system. Robert’s original analysis answered this. The answer is yes. White people get an average of 1.01 votes for each vote that a non-white person gets (ignoring all of the other ways that non-white people are disenfranchised in this country). However, compared to all the other ways that non-white people are disenfranchised, this really looks like a non-issue to me.

    Comment by Charles S — January 17, 2007 @ 12:05 am | Reply

  17. Arg, &lt should be <

    That’s what I get for slacking off work.

    Comment by Charles S — January 17, 2007 @ 12:05 am | Reply

  18. Thanks for dropping by, Charles. Poke around. Kick a few cats while you’re here.

    At some point I may extend this analysis, because looking at the numbers it seems clear that a relatively small movement of nonwhite Americans into the high-value states could easily flip those states’ political alignments, and make a real sea change in the balance of power in the EC. The libertarians have tried this on a small scale, and it’d be interesting to see someone try to game the system on the macro level.

    Comment by Robert — January 17, 2007 @ 12:44 am | Reply

  19. An r-square of 0.05 is insignificant. Social scientists usually use 0.25 as their threshold, I’m told. In bidness school they told us to throw out anything less than 0.5.

    This is a digression, but is that really what they said in business school? I find that very strange.

    Here’s a hypothetical example: Let’s say that I own a company that manufactures unusually high-quality spray-on paint. I commission a statistician to determine what factors most account for consumers choosing to buy high-quality spray-on paint. The results of the multivariate analysis come back; it turns out that skateboard ownership explains high-quality spray-on paint purchasing with a r-square of .4. Put another way, after controlling for all the factors I can control for, 40% of propensity to buy my paints can be explained by skateboard ownership.

    Would business school graduates really decide to throw out that kind of information as irrelevant? That seems nuts to me.

    Comment by Ampersand — January 17, 2007 @ 2:03 am | Reply

  20. Yeah, that would be nuts. You wouldn’t throw out that one, because you have controlled for all these (unarticulated in your example but presumably real) factors, done a (presumably) sophisticated multivariate analysis, etc., know your own business well, have a desk piled high with semiliterate letters from skaters saying that your paint rulez, etc.

    But if you were a data miner looking for associations between a bunch of groups and a bunch of products, for example, with no particular knowledge and no ability to control for any other factors, then the .4 wouldn’t seem so impressive. Lots of people own skateboards; lots of people buy paint; it beat the hell out of the data miner whether there was a connection before the analysis, and it still beats the hell out of the data miner. Whether an r2 value is meaningful for you really depends on all these factors outside the analysis. If you know them well, a low r2 might well mean something. On the other hand, if you have absolute godlike knowledge of something, you might reject an r2 of .9 as being insufficient to show a relationship. For example, in some physics and engineering applications, where it either happens or it doesn’t happen, and if it doesn’t happen 100% of the time, then something else is the cause. Say there’s a correlation of .92 between bridge collapses and high winds – but high winds don’t make bridges collapse. Faulty steel does. The high winds just put the steel under the right stress for it to break. (I have no idea if that’s actually true, that’s just the kind of thing I mean.)

    In the case at hand, the r2 of 0.05 probably actually does show something – and we can say that because the statistical universe in question is so simple that we can actually figure out how the physical universe would change, and that observed change falls right in line, magnitude-wise, with the r2. If I had found that there would be a 60-vote shift in the EC, then that r2 ought to be a lot higher. But instead we find that there is a real effect, but a small one.

    Usually in business statistics, you aren’t trying to find something abstruse and tricky. It’s hard to make money on abstruse and tricky (although Microsoft has done OK, ba dum bump). In your example, you wouldn’t do a regression analysis to find out if skaters buy paint; you already know they do, from customer surveys and marketing outreach. You’d be a lot more likely to do a regression when you were looking for knowledge you didn’t already have, and for that kind of throwing the net out into the data ocean, you only pull it back in when you get a really big hit. Otherwise you waste all your time tracking down low-r2 correlations that mostly turn out to be spurious. Analyst time costs money and most spurious correlations won’t make any of it back.

    Or that’s what my stats prof said, anyway.

    Comment by Robert — January 17, 2007 @ 3:47 am | Reply

  21. Say there’s a correlation of .92 between bridge collapses and high winds – but high winds don’t make bridges collapse. Faulty steel does. The high winds just put the steel under the right stress for it to break.

    This example would only make sense if faulty steel isn’t included as a factor in your regression analysis. If high winds have very little or no independent bridge-collapsing causation after faulty steel is accounted for, then they won’t have an r2 of .92; instead, they’ll have a very low r2. Right?

    Comment by Ampersand — January 17, 2007 @ 7:41 am | Reply

  22. I have to say that this analysis is right on.

    Amp, an r squared of 0.05 is almost nothing. If the social sciences are “accepting” an r2 of 0.05 (they weren’t when I was taking statistics) then IMO that says something about the lack of statistical rigor in the programs–it doesn’t mean that the number magically becomes significant.

    Amp: An easy way to think of the R measure is this:
    1) set up an x-y graph for the two variables you’re interested in.
    2) graph your datapoints (each data point has an x and a y)
    3) Look at your graph. Does the cluster of datapoints “look circular”? if so, there’s no correlation. Does it “look linear”? If so, there’s correlation.

    All Pearson’s R does is to give an idea of “how linear” a cluster is–the closer to 1, the more linear it is. It only works for TWO DATASETS. So the accuracy of the r value is entirely dependent on what other factors exist.

    Comment by sailorman — January 17, 2007 @ 11:30 am | Reply
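
The "how linear is the cluster" idea in comment 22 can be sketched directly from the textbook definition of Pearson's r. The function name and the two toy datasets below are illustrative assumptions, not data from this thread.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means.
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return s_xy / sqrt(s_xx * s_yy)


# Points on a straight line: the cluster "looks linear".
line_r = pearson_r([1, 2, 3, 4], [3, 5, 7, 9])
# Four points spaced evenly around a circle: the cluster "looks circular".
circle_r = pearson_r([0, 1, 0, -1], [1, 0, -1, 0])
print(line_r, circle_r)
```

With a single predictor, squaring this r gives the R-squared reported by an ordinary regression of one variable on the other, which is why the two numbers tend to get discussed together.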

  23. If high winds have very little or no independent bridge-collapsing causation after faulty steel is accounted for, then they won’t have an r2 of .92; instead, they’ll have a very low r2. Right?

    So I understand.

    Comment by Robert — January 17, 2007 @ 12:07 pm | Reply

  24. sailorman–R-Square is not the same as Pearson’s R. Nevertheless, you are right; two variables could be related but not in a linear fashion.

    Comment by Rachel S. — January 17, 2007 @ 12:17 pm | Reply

  25. Ok, I have searched and searched and cannot find that chart I used. I have a feeling it included 2005 population projections, not 2000 Census data. Nevertheless, here is the appropriate data from the 2000 Census. It is not perfect because we should actually be counting the voting age population; I tried to get the over 18 only data, but I couldn’t get that chart to work.

    Robert’s data is skewed because it includes Latinos who identify as white. The Census uses two questions to measure race/spanish origin. Historically, most Latinos have identified their race as White on Census forms, although more recently many are checking “other.” This alters the data somewhat, but not drastically. You should use the data from the final column.

    Comment by Rachel S. — January 17, 2007 @ 1:45 pm | Reply

  26. There’s nothing in your link.

    Comment by Robert — January 17, 2007 @ 1:48 pm | Reply

  27. Just to further back up my point. Here is the first paragraph of the data that Robert uses:

    “Census 2000 showed that the United States population on April 1,2000 was 281.4 million. Of the total,
    216.9 million, or 77.1 percent, reported White. This number includes 211.5 million people, or 75.1 percent,
    who reported only White in addition to 5.5 million people, or 1.9 percent, who reported White as well as one
    or more other races. Census 2000 asked separate questions on race and Hispanic or Latino origin. Hispanics
    who reported their race as White, either alone or in combination with one or more other races, are included in the numbers for Whites.”

    Comment by Rachel S. — January 17, 2007 @ 1:50 pm | Reply

  28. I’ll join in saying that r squared of .05 is quite low in a social science application.

    The trouble is that social science data is almost always very noisy. It has all sorts of non-independent variables, data biases, data sensitive to minor definitional changes (like whether multi-racial is included or not) and small data sets.

    Honestly, the remarkable thing about the balance of power between the states is not that the electoral college and Senate have screwed it up, but the fortunate coincidence that the partisan balance in the Senate and among the states in Presidential elections has rarely differed dramatically from the partisan balance in the House, something that nothing in the system makes a particularly necessary or likely result.

    Comment by ohwilleke — January 17, 2007 @ 2:44 pm | Reply

  29. Let’s try it again. Here is the link.

    Comment by Rachel S. — January 17, 2007 @ 5:30 pm | Reply

  30. Thanks for the link, Rachel.

    Using those numbers for % white, the EC disparity shrinks, to 2.86 net EC votes that would shift hands. R-squared shrinks to 0.04.

    Comment by Robert — January 17, 2007 @ 5:51 pm | Reply

  31. Hmm, I reran it using those numbers, and got an increase in r-squared, so one of us is doing this wrong.

    Also, if you look at % of pop black or % of pop American Indians and Alaskan Natives (rather than % of pop white), r-squared goes up, and p goes down (for black % of pop regressed on EV, I get p of 0.0012 and r-squared of 0.19). White (non-latino) % of pop regressed on EV still gives rsquared of 0.12 and p of 0.013.

    Comment by Charles S — January 18, 2007 @ 12:02 am | Reply

  32. Okay, I went through it again more carefully because I really don’t want to get any real work done.

    The population numbers in the table at the top of this post don’t match the census numbers. Your EV numbers are right though. Using the census numbers from Rachel’s link, and recalculating the people/EV based on those numbers, I reran my calculations again, being careful that I wasn’t misaligning states in the merger of the data sets (I wasn’t, but it was the first source of error I thought of). Using 2000 population numbers, the r-squared goes up a little bit.

    rsquared F-test p
    0.0694 3.5821 0.0644 %whites
    0.2252 13.9509 0.0005 %blacks
    0.1766 10.2984 0.0024 %indian
    0.0014 0.0684 0.7948 %asian (I lose my bet from earlier up)
    0.0308 1.5264 0.2227 %hawaiian
    0.0628 3.2148 0.0793 %other race
    0.0184 0.8994 0.3477 %biracial
    0.0597 3.0475 0.0873 %hispanic
    0.1215 6.6416 0.0131 %white not hispanic

    Comment by Charles S — January 18, 2007 @ 12:57 am | Reply

  33. Yeah, I fouled up. I forgot that I had re-ordered the states, and pasted alphabetized state data into a numerically-sorted list. (The error you suspected yourself of making was actually the error I made.)

    With the data in the right order, and the population data from the same Census 2000 source, I show an EC shift of 6.52 votes, and an r-squared of 0.09. Still not huge, but bigger. I don’t know where the discrepancy is.

    Comment by Robert — January 18, 2007 @ 1:49 am | Reply

  34. Are you still using the same population and pop/ev numbers that are in the table in the OP? I used the population numbers from the census page that Rachel linked to, so that might be the difference.

    Comment by Charles S — January 18, 2007 @ 2:03 am | Reply

  35. Oh, on the break out by group, the p values should be treated with caution, because at that point 7 different groups are being tested, so the chance that one will have a low p goes up. However, I think p

    Comment by Charles S — January 18, 2007 @ 2:06 am | Reply

  36. Off-topic: Does anyone else read “sadistics” whenever you see the word “statistics”? Or is that a Fraudian slip that only I have?

    Comment by Off Colfax — January 18, 2007 @ 3:56 am | Reply

  37. ACk.

    … However, I think p < 0.0005 would still be significant even with the data mining aspect taken into account.

    This is the error that made that one study show that prayer had a positive effect on medical outcomes. They were using something like 20 different measures of outcome and came up with a weakly significant result in 1 of them, which they trumpeted to the news. But they had forgotten to adjust for the fact that if you use 20 different measurements of outcomes, the chance that one of them will have an outcome that only has a 1 in 20 likelihood (p = 0.05) is actually quite high. Total tangent, but that is what always reminds me of that rule (or vice versa).

    I don’t read statistics as sadistics, but I do read discipline as de-splining. A typo that would always bring cries of “safeword! safeword!” from the engineers.

    Comment by Charles S — January 18, 2007 @ 5:29 am | Reply
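
The multiple-testing point in comment 37 can be put in numbers. The sketch below computes the chance of at least one false positive across several tests, assuming the tests are independent, and then applies a Bonferroni correction to the p-values listed in comment 32. Bonferroni is one common adjustment chosen here for illustration; nobody in the thread names a specific method, and the function names are hypothetical.

```python
def chance_of_any_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n independent tests passes by chance."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Twenty outcome measures, as in the prayer-study example.
print(round(chance_of_any_false_positive(20), 2))

# p-values copied from the table in comment 32.
p_values = {
    "%whites": 0.0644,
    "%blacks": 0.0005,
    "%indian": 0.0024,
    "%asian": 0.7948,
    "%hawaiian": 0.2227,
    "%other race": 0.0793,
    "%biracial": 0.3477,
    "%hispanic": 0.0873,
    "%white not hispanic": 0.0131,
}
threshold = bonferroni_threshold(len(p_values))
survivors = [group for group, p in p_values.items() if p < threshold]
print(survivors)
```

With twenty independent tests at p = 0.05, the chance of at least one spurious hit comes out to about 64%. Under the Bonferroni threshold for nine tests, only the %blacks and %indian rows remain significant.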

  38. I used the pop numbers and % white from the 2000 census page that Rachel linked to. This afternoon I’ll put the revised table into HTML and repost it, maybe the error will pop out at me.

    Comment by Robert — January 18, 2007 @ 1:34 pm | Reply

  39. heh. charles, you reminded me of my intro statistics class where–after having learned about t-tests–we all would gleefully start running comparative tests on multiple sets of data until we “found something”. It was a good introduction to ANOVAs if I recall correctly;)

    Comment by sailorman — January 18, 2007 @ 5:02 pm | Reply

  40. Although I have to say in defense of my black % result, that when I first read Robert’s description of what he was doing, my response was, shouldn’t you do regression on % black, not % white? Our country has a much more significant history of trying to keep black people out of particular states than it does for any other racial or ethnic group (Asians were kept out of everywhere, until they weren’t). The only other group with a similar history of state-based exclusion would be Native Americans, who were progressively driven out of parts of the country. I think the slope of the regression for Native Americans is actually the opposite of the slope for Blacks. Like Whites (but even more so), Native Americans are heavily overrepresented (relative to % of total US population) in the less populous states (particularly Alaska, Idaho, Oklahoma, the Dakotas, New Mexico, Arizona, Montana).

    Comment by Charles S — January 18, 2007 @ 11:53 pm | Reply

  41. Well, if I’d been doing this on my own, I would have done it differently. It was Rachel’s point, so I used Rachel’s choice of variables and sources.

    Comment by Robert — January 19, 2007 @ 12:21 am | Reply

  42. That’s fair. As I said before, I think your EV disparity count is a more meaningful number than the rsquared or p of the regression as it is a measure of the size of the effect. I’d be interested to see what the disparity is if you use the % black vs. % white non-latino in the EV disparity (not interested enough to set it up myself, of course). It is interesting, though, how nicely the rsquared numbers line up with my little just so story of the history of population distribution.

    Comment by Charles S — January 19, 2007 @ 1:17 am | Reply
