August 28, 2006

Polling statistics

As long as polls are done this badly, no one should be surprised if they turn out wrong. The NPR poll linked from the title seems to predict a possible victory for Democrats in the House. The poll looked at the top 50 "competitive" districts. The label "competitive" is based on the opinions of pundits, which is not a great start. Ten of these districts are Democratic, and in those the incumbents seem to have a 2-to-1 advantage. Hardly competitive. But that's not the core point.

The core point is that the poll sampled 1000 likely voters at random across these 50 districts. That's only 20 per district, on average, which is simply too few to say anything about any single district. And per district is what matters. If in some district the incumbent is an incompetent crook, the fact that his or her constituents are dead set on dumping him or her means nothing for the other districts. That's a crucial problem.
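To put a rough number on "too few", here is a minimal sketch of the usual margin-of-error formula for a proportion. It assumes a simple random sample, a 95% confidence level (z = 1.96) and the worst case of an even split (p = 0.5). None of these assumptions come from the poll itself, and the function name is just for illustration.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion p estimated from
    a simple random sample of size n (z = 1.96 gives about 95%)."""
    return z * math.sqrt(p * (1 - p) / n)

# 20 respondents in an average district versus 1000 in the whole poll
per_district = margin_of_error(20)
whole_poll = margin_of_error(1000)

print(f"n=20:   +/- {per_district:.1%}")
print(f"n=1000: +/- {whole_poll:.1%}")
```

With 20 respondents the uncertainty comes out at roughly 22 points either way, which is why a single district's numbers say nothing useful. For 1000 respondents it is about 3 points, close to the error the poll quotes.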

NPR presumably knows that, and instead focused on the total numbers over all the districts. The poll claims that 49% of the voters are likely to vote Democratic, and 43% are likely to vote Republican. With a statistical error of 3.2%, that 6-point lead is marginal at best: the ranges 49 ± 3.2 and 43 ± 3.2 still overlap slightly. In other words, even if there are no systematic errors, the result is a statistical dead heat, but only barely.
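The "only barely" part can be checked with a few lines of arithmetic. This sketch takes the poll's quoted 3.2% error at face value and simply asks whether the two ranges overlap. The helper name is made up for the example, and comparing ranges like this is a quick rule of thumb, not a formal significance test.

```python
def interval(share, moe):
    """Return the (low, high) range for a share, given a margin of error in points."""
    return (share - moe, share + moe)

MOE = 3.2  # statistical error quoted for the poll, in points

dem_low, dem_high = interval(49.0, MOE)
rep_low, rep_high = interval(43.0, MOE)

# A positive value means the top of the Republican range
# lies above the bottom of the Democratic range.
overlap = rep_high - dem_low

print(f"Democratic range: {dem_low:.1f} to {dem_high:.1f}")
print(f"Republican range: {rep_low:.1f} to {rep_high:.1f}")
print(f"Overlap: {overlap:.1f} points")
```

The ranges overlap by less than half a point, so the lead sits right at the edge of what the sample size can resolve.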

But there are serious systematic problems in the poll. A poll is only reliable if its respondents are representative of the "average" voter. The respondents in the poll said that they had voted for Bush by a 49-to-46 margin in 2004. However, the districts actually went for Bush 58-42. That means that the poll SIGNIFICANTLY undercounts those who voted for Bush in 2004.
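A rough check of how far off the sample is, as a sketch only. It assumes the 58-42 figure is a two-party split, that about 950 of the 1000 respondents (49% plus 46%) reported a Bush or Kerry vote, and that the sample behaves like a simple random one. It also takes people's recollection of their 2004 vote at face value, which is itself not fully reliable.

```python
import math

n_total = 1000
recalled_bush, recalled_kerry = 0.49, 0.46  # what respondents said they did in 2004
actual_bush = 0.58                          # how the districts voted (assumed two-party share)

# Respondents who reported a Bush or Kerry vote (assumed: about 950)
two_party_n = n_total * (recalled_bush + recalled_kerry)

# Bush's share among those respondents
sample_bush_share = recalled_bush / (recalled_bush + recalled_kerry)

# Standard error expected if the sample really reflected the districts
se = math.sqrt(actual_bush * (1 - actual_bush) / two_party_n)

# Number of standard errors between the sample and the actual result
z = (sample_bush_share - actual_bush) / se

print(f"Bush share in sample: {sample_bush_share:.1%}")
print(f"Bush share in districts: {actual_bush:.1%}")
print(f"Difference in standard errors: {z:.1f}")
```

Under these assumptions the sample is about four standard errors short on Bush voters, far more than chance alone would normally produce.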

So, in addition to the poor statistics, add in a large systematic error. Don't get your hopes up yet...

