Did I ever tell you that I used to be a “database specialist” in the marketing department of an HMO?
Well, I did, and it was a very enlightening opportunity to observe the behind-the-scenes maneuvering of corporate America and our so-called “health care” system.
My job was to parse data. By parse, I mean to find a way to make the business look as good as possible.
For example, when we needed to address an RFI (request for information) concerning the rate of mammograms for our members, I might be able to increase the percentage by slightly shifting the age range used. “Women over age 45” might yield one number, while “women over age 50” another, and “women aged 45-60” yet another. Then the marketing person responsible for answering the question could pick the number that looked best and simply respond, “based on the data available to us…” or some such phrasing.
Because of this background, I quickly notice data that seems to have been manipulated to support a particular point of view, and such is the case with the Internet Movie Database (IMDb) website’s user ratings for the AIDS documentary House of Numbers.
What is “weighting”?
Weighting is the name of a process often used by pollsters to compensate for shortcomings in their polling data due to methodology. It is a complicated subject and prone to errors, because it inevitably requires that assumptions be made.
A very simple example of the need for weighting is a telephone survey of consumer purchasing habits: weighting might be required to compensate for those consumers who do not have telephones and therefore can’t participate in the survey as it is designed. It is also common practice to use weighting in political polling to compensate for gender imbalances.
Here is how the National Council on Public Polls puts it: “For example, men and women vote differently. Gender is correlated with vote. If we weight the sample to reflect the correct proportions of men and women in the population we will improve the results.”
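To make the gender example concrete, here is a minimal sketch in Python of re-weighting a sample so its gender mix matches the population. Every number in it (the sample mix, the population mix, and each group’s candidate support) is hypothetical and exists only to show the mechanics:

```python
# Minimal sketch of weighting a poll sample by gender.
# All numbers are hypothetical, chosen only to illustrate the idea.

sample_share = {"men": 0.40, "women": 0.60}       # how the sample actually broke down
population_share = {"men": 0.48, "women": 0.52}   # assumed make-up of the voting population
support = {"men": 0.45, "women": 0.55}            # candidate support within each group (made up)

# Unweighted estimate simply mirrors whoever happened to answer the phone.
unweighted = sum(sample_share[g] * support[g] for g in support)

# Each group gets a weight that scales its sample share to its population share,
# then the estimate is recomputed with those weights applied.
weights = {g: population_share[g] / sample_share[g] for g in support}
weighted = sum(sample_share[g] * weights[g] * support[g] for g in support)

print(f"unweighted estimate: {unweighted:.3f}")  # 0.510
print(f"weighted estimate:   {weighted:.3f}")    # 0.502
```

The point is simply that the weights, and therefore the published number, depend entirely on the assumptions fed into them.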
Weighting can be good, but it is not without risks.
Unbalanced weighting skews results
I recently came across a poll result that was so astoundingly affected by weighting that it raised some red flags for me about the methodology used.
House of Numbers is a very controversial documentary that challenges the conventional theory about HIV and AIDS by talking directly to the discoverers of the alleged virus, as well as top AIDS researchers and policy makers around the world. Needless to say, it has thrown the AIDS establishment into a damage-control tizzy that includes a coordinated campaign to discredit the film and its producer, Brent Leung.
It is obvious from the raw voting data below that there is no middle ground when it comes to supporters or detractors who find their way to IMDb to cast their vote. More than 93% of the voters rated the movie either a 10 or a 1, and there are zero middle-of-the-road votes (data as of 02/12/2010).
It is understandable that IMDb has felt compelled to devise a weighting system to prevent “vote stuffing” that might unfairly skew the results of new movies:
IMDb publishes weighted vote averages rather than raw data averages. Various filters are applied to the raw data in order to eliminate and reduce attempts at ‘vote stuffing’ by individuals more interested in changing the current rating of a movie than giving their true opinion of it.
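IMDb does not disclose the filters it applies to ordinary user ratings, but it has publicly described a Bayesian-style “damped” mean for its Top 250 chart, which pulls a title’s raw average toward the site-wide average until enough votes accumulate. The sketch below uses that published idea with assumed values for the site-wide mean and the vote threshold; it is not the formula behind the film’s 2.8 rating, only an illustration of how vote-count damping works:

```python
# Sketch of a Bayesian ("damped") average, the kind of formula IMDb has
# publicly described for its Top 250 chart. The weighting applied to
# ordinary user ratings is secret, so this is illustrative only.

def bayesian_average(raw_mean: float, num_votes: int,
                     site_mean: float = 6.7, min_votes: int = 1000) -> float:
    """Pull a title's raw mean toward the site-wide mean until it has
    enough votes. site_mean and min_votes are assumed values."""
    v, m = num_votes, min_votes
    return (v / (v + m)) * raw_mean + (m / (v + m)) * site_mean

# A title with few votes stays close to the assumed site-wide mean,
# no matter how lopsided its raw votes are.
print(bayesian_average(raw_mean=9.37, num_votes=135))    # ~7.02, dominated by the site mean
print(bayesian_average(raw_mean=8.5, num_votes=200000))  # ~8.49, dominated by the raw mean
```

If damping of this kind were the only mechanism in play, a low-volume title could only be pulled toward the site-wide average, so whatever produced the 2.8 presumably involves additional, undisclosed filtering of the votes themselves.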
I propose that the “ballot stuffing” IMDb is trying to weight for usually comes from a movie’s backers, such as directors, cast members, and fans trying to boost their movie’s rating. The weighting may be effective for that, and it is even possible that this is happening with some of the favorable votes for HON.
What IMDb’s secret algorithm does not seem to take into account, however, are organized detractors opposed to the message of this controversial movie. Vested interests that are the focus of unflattering documentaries have plenty of resources to try to stack the deck and sink such movies.
Similarly controversial films, such as Al Gore’s An Inconvenient Truth or Michael Moore’s Sicko, may have been such targets as well.
Unlike HON, these movies do not have that glaring donut hole of zero votes for any intermediate rating. There is also the matter of vote volume: HON has 135 votes to date, compared to tens of thousands of votes for Sicko and hundreds of thousands for An Inconvenient Truth. These differences alone probably help the IMDb algorithm work better.
There is no way a casual observer can know how, or whether, IMDb weights for this kind of anti-movie reverse vote stuffing:
The exact methods we use will not be disclosed. This should ensure that the policy remains effective. The result is a more accurate vote average.
I would argue that the secret algorithm used by IMDb probably does not take into account organized attempts to stifle a controversial documentary that negatively impacts those naysayers’ own industry, prestige and livelihood.
The actual average rating for House of Numbers, based on raw votes without weighting, would be 9.37, not the 2.8 stated on IMDb. That is a big difference, and one that begs to be re-evaluated. Even IMDb acknowledges that the arithmetic mean is 7.0, which is also a far cry from the published user rating of 2.8.
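For readers who want to check figures like these themselves, the raw (unweighted) mean is simply the vote-weighted average of the rating histogram. The counts below are hypothetical stand-ins chosen only for illustration; they are not the film’s actual vote table:

```python
# Raw (unweighted) arithmetic mean of a rating histogram.
# The counts are hypothetical stand-ins, not House of Numbers' actual votes.

votes = {10: 120, 9: 2, 3: 1, 1: 12}   # rating -> number of votes (made up)

total_votes = sum(votes.values())
raw_mean = sum(rating * count for rating, count in votes.items()) / total_votes

print(total_votes)          # 135
print(round(raw_mean, 2))   # 9.13 with these made-up counts
```

No weighting is involved in that calculation; the gap between a number in that neighborhood and the published 2.8 is entirely the product of IMDb’s undisclosed filters.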
Appropriate weighting requires that analysts consider all of the factors that might skew the numbers. There is no doubt a need to account for supporters trying to promote this film, but it is equally important to account for the passionate group of detractors who are working just as hard to discredit it.