# Zipf's Law

## The unreasonable effectiveness of math

When we look at the natural world we easily see physical regularities like the seasons or waves crashing on a shore. I wonder how long ago the human mind first noticed the FACT of gravity; i.e. became aware that things stick to the ground and fall from trees, as opposed to just acting on that knowledge. Newton took that awareness to another level when he described gravity with math like calculus, and Einstein extended it further by showing that gravity is the result of the shape of spacetime. Brian Greene tells me that string theory extends that further - the very form of the law of gravity (it's an inverse square law) means that gravity only works in 3D spaces. I like this sort of thing - from raw FACTs we can eventually learn very fundamental things about the reality we inhabit.

I recently encountered a very interesting fact - Zipf's Law.

It turns out that if you rank the words in English by frequency of usage, you find that the second most frequent word occurs half as often as the first, and the third most frequent occurs a third as often as the first.
George Zipf discovered this while doing a concordance of Ulysses as a young man.
(Ahh - youth :-)
Anyway, the most common English word is 'the' and it occurs about 7% of the time.
Next most common is 'of' at about 3.5%. 'And' comes next at a little under 3%, and so on down the line.
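
To make the arithmetic concrete, here's a tiny sketch of what the ideal form of the law predicts if you take the 7% figure for 'the' as the starting point. The function name `zipf_share` and the `top_share` parameter are just names I made up for illustration, and real word counts only roughly follow these numbers.

```python
def zipf_share(rank, top_share=0.07):
    """Ideal Zipf share for the word at `rank`, given the top word's share.

    Assumes the simplest form of the law: share = top_share / rank.
    """
    return top_share / rank


# Ideal predictions for the three most common words.
for rank, word in enumerate(["the", "of", "and"], start=1):
    print(f"{rank}. {word!r}: {zipf_share(rank):.1%}")
```

So the ideal law puts the third word at about 2.3%, one third of the top word's share.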

A quirk of English? Not at all.

The same distribution applies to Sanskrit, Etruscan and hieroglyphics.

Even when people make up languages Zipf's Law emerges.

Even when you make up random words from a set of letters Zipf's Law emerges.
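
You can try the random words experiment yourself. Here's a minimal sketch, assuming a made-up setup of my own: type random characters from a four-letter alphabet, hit the space bar 20% of the time, then count the "words" that come out. The alphabet, the space probability and the function name are all arbitrary choices for illustration, not anybody's official method.

```python
import random
from collections import Counter


def random_text_word_counts(n_chars=200_000, letters="abcd",
                            space_prob=0.2, seed=0):
    """Type random characters, split on spaces, return (word, count) by rank."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    chars = [
        " " if rng.random() < space_prob else rng.choice(letters)
        for _ in range(n_chars)
    ]
    words = "".join(chars).split()
    # most_common() sorts from the most frequent word to the least frequent.
    return Counter(words).most_common()


ranked = random_text_word_counts()
for rank in (1, 10, 100, 1000):
    if rank <= len(ranked):
        word, count = ranked[rank - 1]
        print(rank, word, count)
```

The counts drop off steeply as the rank goes up, in a stepped, roughly Zipf-like way. Short "words" are common, long ones are rare, and nobody had to mean anything by any of them.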

OK - a quirk of language in general? Perhaps it's something to do with the way people happen to think.

It's been found in music, city population ranks, income distributions, mass extinctions, earthquake magnitudes and in the ratios of colors in pictures. And it's been found to apply to the frequencies with which genes occur in DNA.

Think about that - a regularity is discovered in word frequencies in Ulysses - in itself that's interesting but no big deal. But then the same regularity is discovered in DNA.

I try to imagine what's going on here.

Let's note some aspects of this law. It is a regularity that is found but it's not the sort of regularity that has predictive value.

Knowing the law of gravity has predictive value. Knowing it, you can predict the location of Saturn in a hundred years. Zipf's law predicts that the second most frequent word will occur half as often as the first, but it doesn't say that if the first is 'the' the second will be 'of'.

Another thing to note is that Zipf's Law is one of many 'power' laws.
A power law is pretty simple - y equals x squared is a power law where the squared is the power.
Here's a graph of the power law.

The thing to note is that the graph has a pretty distinctive shape.
So you can do things like take a bunch of data and plot a graph of it and then just look visually for the power curve that fits it best.
And of course not all data sets fit onto smooth curves and in fact no data set fits any curve exactly.
But you can get a pretty good kind of hint about the underlying mathematical structure that you're dealing with.
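
If you'd rather not eyeball it, one common rough trick is to take logs of both axes: a power law y = c * x^k turns into a straight line whose slope is the power k. Here's a minimal sketch of that using a plain least-squares line fit. The function name `fit_power_law` is mine, and the data is just the ideal Zipf shares starting from 7%, so treat it as an illustration rather than a rigorous way to test real data.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = c * x**k by a straight-line fit on log-log axes.

    Returns (k, c). All xs and ys must be positive.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    k = sxy / sxx                          # slope of the line = the power
    c = math.exp(mean_y - k * mean_x)      # intercept, converted back from logs
    return k, c


# Ideal Zipf data: share of the word at each rank, starting from 7%.
ranks = [1, 2, 3, 4, 5]
shares = [0.07 / r for r in ranks]
k, c = fit_power_law(ranks, shares)
print(k, c)  # k comes out close to -1, c close to 0.07
```

For Zipf's Law the power is -1, which is just another way of saying "frequency goes as one over rank". Real data will give you something near that, not exactly that.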

Which brings us around to the issue of the amazing effectiveness of math in helping understand reality. Just what is going on when so many frequency ranking situations follow something like Zipf's Law? Is this pointing to some sort of natural constraint imposed on reality by logic?

But think about it - is that really so surprising? Logic is stuff like not having two different things in the same place at the same time. And that's just the way reality is as far as we can tell (ghost stories notwithstanding). But what is it about reality that makes weird regularities like Zipf's Law hold?

What do you think?