Benford vs. Bash usage

I was on IRC today listening to (well, reading) some people complain about stored procedures. Reading between the lines, they were falling into the "keep it simple" camp of SQL/RDBMS. Wanting to find saner pastures, I googled to see whether Joe Celko had a blog. I couldn't find one, but I did find a collection of his articles.

The first reminded me of a phenomenon known as Benford's Law. Quickly explained, it states that in many real-world sets of data (especially ones that span several orders of magnitude), the digit one will be the leading digit approximately 30% of the time, out of the 9 possible leading digits 1 through 9. In fact, the probability P(d) that d is the leading digit is approximately,

 

P(d) = log10 (1 + 1/d)
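
As a quick check of the formula, here is a minimal Python sketch that evaluates P(d) for each leading digit. The names `benford_probability` and `benford` are my own, chosen for illustration; nothing here comes from Celko's articles.

```python
import math


def benford_probability(d: int) -> float:
    """Expected share of values whose leading digit is d (1 through 9)."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


# Probability for every possible leading digit.
benford = {d: benford_probability(d) for d in range(1, 10)}

for d, p in benford.items():
    print(f"{d}  {p:.4f}")
```

The nine probabilities sum to 1, because the logarithms telescope to log10(10).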

It's the sort of thing you can't really believe until you test it, so I ran yesterday's bash usage data against it. Here are the results (and in my opinion they are pretty darn close):

 

Digit  Benford Probability  bash Data
1      0.301029995663981    0.347826086956522
2      0.176091259055681    0.217391304347826
3      0.1249387366083      0.0869565217391304
4      0.0969100130080564   0.0434782608695652
5      0.0791812460476248   0.0652173913043478
6      0.0669467896306132   0.0434782608695652
7      0.0579919469776867   0.0652173913043478
8      0.0511525224473813   0.0869565217391304
9      0.0457574905606751   0.0434782608695652
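
A comparison like this can be reproduced with a few lines of Python. This is only a sketch: it assumes the data is a list of positive integer counts, and `sample_counts` is made up for illustration (it is not the bash data behind the table). The helper names `leading_digit` and `leading_digit_shares` are hypothetical as well.

```python
import math
from collections import Counter


def leading_digit(n: int) -> int:
    """First digit of a positive integer, e.g. 230 -> 2."""
    if n <= 0:
        raise ValueError("expected a positive integer")
    return int(str(n)[0])


def leading_digit_shares(values):
    """Fraction of values starting with each digit 1 through 9."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no values to analyse")
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


# Made-up sample, NOT the data from the table above.
sample_counts = [120, 17, 3, 45, 1, 19, 230, 8, 11, 2]
observed = leading_digit_shares(sample_counts)

print("Digit  Benford  Observed")
for d in range(1, 10):
    expected = math.log10(1 + 1 / d)
    print(f"{d}      {expected:.4f}   {observed[d]:.4f}")
```

With only a handful of values the observed shares will bounce around quite a bit, so a small sample like this one should not be expected to track the Benford column closely.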
