Benford vs. Bash usage
I was on IRC today listening to (reading) some people complain about stored procedures; reading between the lines, they were falling into the keep-it-simple camp of SQL/RDBMS. Wanting to find saner pastures, I googled to see if Joe Celko had a blog. I couldn't find one, but I did find a collection of his articles.
The first reminded me of a phenomenon known as Benford's Law. Quickly explained, it states that if you examine many naturally occurring sets of data, the digit one will be the leading digit approximately 30% of the time (out of the 9 possible leading digits, 1 through 9). In fact P(d), the probability that d is the leading digit, is approximately,
P(d) = log10 (1 + 1/d)
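To make the formula concrete, here is a minimal Python sketch that evaluates it for each digit. The names `benford_probability` and `benford` are my own for illustration; nothing here is from the original articles.

```python
import math

def benford_probability(d):
    """Expected share of values whose leading digit is d (1 through 9)."""
    return math.log10(1 + 1 / d)

# Map each possible leading digit to its Benford probability.
benford = {d: benford_probability(d) for d in range(1, 10)}

for d, p in benford.items():
    print(f"{d}: {p:.4f}")
```

The nine probabilities sum to 1, because the product of (1 + 1/d) for d from 1 to 9 telescopes to 10.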
It's the sort of thing you can't really believe until you test it, so I ran the data on bash usage from yesterday against it. Here are the results (and in my opinion they are pretty darn close) ...
| Digit | Benford Probability | bash Data |
| 1 | 0.301029995663981 | 0.347826086956522 |
| 2 | 0.176091259055681 | 0.217391304347826 |
| 3 | 0.1249387366083 | 0.0869565217391304 |
| 4 | 0.0969100130080564 | 0.0434782608695652 |
| 5 | 0.0791812460476248 | 0.0652173913043478 |
| 6 | 0.0669467896306132 | 0.0434782608695652 |
| 7 | 0.0579919469776867 | 0.0652173913043478 |
| 8 | 0.0511525224473813 | 0.0869565217391304 |
| 9 | 0.0457574905606751 | 0.0434782608695652 |
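For anyone who wants to repeat the comparison on their own numbers, the "bash Data" column can be produced roughly like this. This is only a sketch: `sample_counts` is made-up illustrative input, not my actual bash usage data, and the helper names are hypothetical.

```python
from collections import Counter

def leading_digit(n):
    """Return the first digit of a non-zero integer."""
    return int(str(abs(n))[0])

def leading_digit_shares(values):
    """Fraction of non-zero values that start with each digit 1..9."""
    counts = Counter(leading_digit(v) for v in values if v != 0)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

# Hypothetical per-command usage counts, for illustration only.
sample_counts = [120, 17, 9, 231, 1045, 3, 18, 64, 2, 11]
shares = leading_digit_shares(sample_counts)

for d, share in shares.items():
    print(f"{d}: {share:.3f}")
```

With a sample this small the shares will jump around a lot, so a rough match like the one in the table is about as much as can be expected.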
