low-throughput: i use the eighty/twenty rule

The Pareto principle, widely known as the 80-20 rule, has been used, abused and ridiculed innumerable times. Perhaps the funniest joke involving the rule is this one:

Chicago Driving 80/20 rule: 80% of your waving will be done with 20% of your fingers.

To quote Dilbert and the Way of the Weasel:

Another good utility phrase is “I use the eighty/twenty rule”. Toss it into conversation at any time. This will generate strong agreement because it fits any situation where you have no data. It doesn’t even matter what part is eighty and what is the twenty. It just always sounds right. For example, if you’re waiting for people to arrive for a meeting you could say, “For any business meeting, eighty percent of the people come on time and twenty percent are late”. That sounds totally reasonable. But if you say it the other way around it sounds just as reasonable: “For any business meeting, twenty percent of the people come on time and the other eighty percent are late”. It’s like magic.

Granted, it may sound reasonable to those who have never heard of the Pareto principle, but the business meeting example is not an illustration of the 80-20 rule.

The Pareto principle is a special case of the Pareto distribution. The two numbers do not have to add up to 100, because they apply to entirely different things. In the example graph below, “20” corresponds to 20% of whatever the horizontal axis stands for, while “80” corresponds to the green area above it, which takes up 80% of the total area.
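To make the "do not have to add up to 100" point concrete, here is a minimal sketch. It assumes an idealised Pareto distribution with shape parameter alpha > 1, for which the top fraction p of the population holds p^(1 - 1/alpha) of the total. The function names `top_share` and `shape_for` are mine, made up for illustration only.

```python
import math


def top_share(p: float, alpha: float) -> float:
    """Share of the total held by the top fraction p of the population,
    assuming an ideal Pareto distribution with shape alpha > 1."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if alpha <= 1.0:
        raise ValueError("alpha must exceed 1, otherwise the mean is infinite")
    return p ** (1.0 - 1.0 / alpha)


def shape_for(p: float, share: float) -> float:
    """Shape alpha for which the top fraction p holds exactly `share`.
    Solves p ** (1 - 1/alpha) == share for alpha."""
    if not 0.0 < p < share < 1.0:
        raise ValueError("need 0 < p < share < 1")
    return 1.0 / (1.0 - math.log(share) / math.log(p))


# The classic 80-20 split pins down one particular shape parameter.
alpha_80_20 = shape_for(0.2, 0.8)
print(f"alpha for 80-20: {alpha_80_20:.4f}")

# Any other shape gives a pair of numbers that need not sum to 100.
for alpha in (1.1, alpha_80_20, 1.5, 2.0):
    share = top_share(0.2, alpha)
    print(f"alpha = {alpha:.3f}: top 20% hold {100 * share:.1f}% of the total")
```

Under these assumptions the 80-20 split corresponds to alpha = log 5 / log 4, roughly 1.16. With alpha = 2 the top 20% hold about 45% of the total, which is just as much a Pareto split as 80-20 is.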

Back in 1998, Sidney Redner analysed the “popularity” of scientific papers in terms of citations. It turns out that there is no such thing as a typical number of citations received by a published paper. He counted the citations for the ISI list of papers published in 1981 (783,339 papers) and found that

most publications are minimally recognized, with ≈47% of the papers in the ISI data set uncited, more than 80% cited 10 times or less, and ≈.01% cited more than 1000 times. The distribution of citations is a rapidly decreasing function of citation count but does not appear to be described by a single function over the entire range of this variable.

A couple of years later, two Brazilian scientists, Constantino Tsallis and Marcio de Albuquerque, analysed the same data set and, contrary to Redner’s conclusion, found that a single power-law-type function N(x) describes the entire range of the citation number x.

What, if anything, does it tell us? If scientific papers exhibit universality, that is, behave in the same fashion as sand piles or earthquakes or stock markets (and it looks like they do), then there is no way to predict whether a paper will be popular or not. Moreover, this kind of distribution has nothing to do with scientific qualities of a paper. The authors should stop worrying — or boasting — about impact factors and concentrate on important things, like getting a life.

Sunday, 31 January 2010
