[tied] Re: *kap-

From: Daniel J. Milton
Message: 40983
Date: 2005-10-03

I side with Patrick rather than Grzegorz in the dispute about
Zipf's Law (see message 40980 -- it's getting too unwieldy to
reproduce). The law deals with the frequency of occurrence of items
(such as words) and has nothing to do with their character (such as
length).
An authoritative statement may be found at
http://www.nist.gov/dads/HTML/zipfslaw.html:

Zipf's law

Definition: The probability of occurrence of words or other items
starts high and tapers off. Thus, a few occur very often while many
others occur rarely.

Formal Definition: P_n ~ 1/n^a, where P_n is the frequency of
occurrence of the nth ranked item and a is close to 1.

See also Zipfian distribution, Lotka's law, Benford's law, Bradford's
law.

Note: In the English language words like "and," "the," "to," and "of"
occur often while words like "undeniable" are rare. This law applies
to words in human or computer languages, operating system calls,
colors in images, etc., and is the basis of many (if not, all!)
compression approaches.

Named for George Kingsley Zipf.
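
To make the formal definition concrete, here is a rough Python sketch (mine, not from the NIST page) that ranks items by how often they occur and compares the observed frequencies with P_n ~ 1/n^a. The function names, the toy sentence, and the choice a = 1 are assumptions for illustration only. Note that nothing in the calculation looks at the length of a word; only counts and ranks enter.

```python
from collections import Counter


def rank_frequencies(words):
    """Relative frequency of each distinct item, most common first."""
    counts = Counter(words)
    total = sum(counts.values())
    return [count / total for _, count in counts.most_common()]


def zipf_prediction(num_items, a=1.0):
    """P_n proportional to 1/n^a for ranks n = 1..num_items, normalized to sum to 1.

    The exponent a = 1.0 is an assumed default ("a is close to 1").
    """
    weights = [1.0 / n ** a for n in range(1, num_items + 1)]
    norm = sum(weights)
    return [w / norm for w in weights]


# Toy input, far too small to show a real Zipfian curve; illustration only.
words = "the cat and the dog and the bird".split()
observed = rank_frequencies(words)
predicted = zipf_prediction(len(observed))

for rank, (obs, pred) in enumerate(zip(observed, predicted), start=1):
    print(f"rank {rank}: observed {obs:.3f}, 1/n^a prediction {pred:.3f}")
```

On a real corpus one would fit a rather than assume it, but the inputs stay the same: ranks and counts, nothing else.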


Grzegorz clearly has read Zipf's book and I have not, but I very
much doubt that Zipf wrote anything like "This is my Law"; rather some
reader picked out a striking finding and named it "Zipf's Law".
I strongly suspect Zipf did discuss word length and abbreviation,
but if some statement thereon is taken as "Zipf's Law", then that's an
idiosyncrasy of Grzegorz, or perhaps of the linguistic community as
distinct from the broader range of sciences.
Dan Milton