Zipf's Law

From P2P Foundation


Zipf's Law is a particular formulation of the Power Law, in which the size or frequency of an item is inversely proportional to its rank.


Definition

"Zipf's Law is one of those empirical rules that characterize a surprising range of real-world phenomena remarkably well. It says that if we order some large collection by size or popularity, the second element in the collection will be about half the measure of the first one, the third one will be about one-third the measure of the first one, and so on. In general, in other words, the kth-ranked item will measure about 1/k of the first one." (http://www.spectrum.ieee.org/jul06/4109/4)
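Stated as a formula, the rule in the quotation can be sketched as follows. The symbols f(k), for the measure of the item at rank k, and s, for the exponent, are notation assumed here for illustration; they do not appear in the source.

```latex
f(k) \approx \frac{f(1)}{k}, \qquad k = 1, 2, 3, \dots

% More generally, a power law over ranks has the form
f(k) \propto k^{-s}, \qquad \text{with } s = 1 \text{ giving Zipf's Law.}
```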


Examples

"To take one example, in a typical large body of English-language text, the most popular word, "the," usually accounts for nearly 7 percent of all word occurrences. The second-place word, "of," makes up 3.5 percent of such occurrences, and the third-place word, "and," accounts for 2.8 percent. In other words, the sequence of percentages (7.0, 3.5, 2.8, and so on) corresponds closely with the 1/k sequence (1/1, 1/2, 1/3…). Although Zipf originally formulated his law to apply just to this phenomenon of word frequencies, scientists find that it describes a surprisingly wide range of statistical distributions, such as individual wealth and income, populations of cities, and even the readership of blogs." (http://www.spectrum.ieee.org/jul06/4109/4)
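The comparison in the quotation can be checked with a few lines of Python. This is a minimal sketch: the three percentages are the ones quoted above, while the function name zipf_prediction and the variable names are hypothetical and chosen only for illustration.

```python
# Observed shares (percent of all word occurrences) as quoted in the passage.
observed = {"the": 7.0, "of": 3.5, "and": 2.8}


def zipf_prediction(top_share, rank):
    """Share predicted by the 1/k rule for the item at `rank`,
    given the share of the first-ranked item."""
    return top_share / rank


top_share = observed["the"]
rows = []
for rank, (word, share) in enumerate(observed.items(), start=1):
    predicted = zipf_prediction(top_share, rank)
    rows.append((rank, word, share, round(predicted, 2)))
    print(f"{rank}  {word:<4} observed {share:.1f}%  predicted {predicted:.2f}%")
```

The second-ranked word matches the 1/k prediction exactly, while the third (2.8 percent observed against about 2.33 percent predicted) shows that the law is an approximation rather than an exact rule.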


P2P Application

"Zipf's Law can also describe in quantitative terms a currently popular thesis called The Long Tail. Consider the items in a collection, such as the books for sale at Amazon, ranked by popularity. A popularity graph would slope downward, with the few dozen most popular books in the upper left-hand corner. The graph would trail off to the lower right, and the long tail would list the hundreds of thousands of books that sell only one or two copies each year. The long tail of the English language—the original application of Zipf's Law—would be the several hundred thousand words that you hardly ever encounter, such as "floriferous" or "refulgent."

Taking popularity as a rough measure of value (at least to booksellers like Amazon), then the value of each individual item is given by Zipf's Law. That is, if we have a million items, then the most popular 100 will contribute a third of the total value, the next 10 000 another third, and the remaining 989 900 the final third. The value of the collection of n items is proportional to log(n)." (http://www.spectrum.ieee.org/jul06/4109/4)
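The "thirds" claim in the quotation can be checked numerically. The sketch below assumes that the value of the item at rank k is exactly 1/k, so that the value of the top n items is the harmonic sum 1 + 1/2 + ... + 1/n; the function and variable names are hypothetical.

```python
def harmonic(n):
    """Sum of 1/k for k = 1..n: the total value of the n most popular
    items when the item at rank k is worth 1/k."""
    return sum(1.0 / k for k in range(1, n + 1))


N = 1_000_000
total = harmonic(N)
top_100 = harmonic(100)
top_10_100 = harmonic(10_100)  # the top 100 plus the next 10 000

shares = [
    top_100 / total,                  # most popular 100 items
    (top_10_100 - top_100) / total,   # next 10 000 items
    (total - top_10_100) / total,     # remaining 989 900 items
]

labels = ("top 100", "next 10 000", "remaining 989 900")
for label, share in zip(labels, shares):
    print(f"{label:>18}: {share:.1%}")
```

Under this assumption the three groups come out at roughly 36, 32 and 32 percent of the total, which agrees with the approximate thirds in the quotation. The harmonic sum itself grows like the natural logarithm of n (plus a constant of about 0.577), which is the sense in which the value of the collection is proportional to log(n).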


Critique

An empirical study by Philippe Aigrain in First Monday shows that free music and free text communities do considerably better than Zipf's Law would predict in guaranteeing access to the middle-layer material in their collections. In other words, the Power Law does not apply to them.


More Information

See the related entries on Reed's Law and on Metcalfe's Law, as well as the entries on the Long Tail and on Group Forming Networks.