Blog powered by TypePad
Member since 06/2005

What is interesting?

I was chatting this afternoon on one of those great channels with a small membership that seems to effortlessly mix computing, philosophy, politics, physics and just about anything else that comes along. I was thanking a speaker at a recent event for a great talk on SVMs, but I complained that I didn't want to have to concern myself with the kernels used in an SVM.

In some ways I wanted a system that I could just throw data at and that would work out the relationships -- I cited my recently failed attempt to do so with the Perl Survey data. My major problem with what that attempt spat out was that it just wasn't interesting. And he asked me: what is "interesting"?

The problem was that my attempt had identified that people who contributed to Perl 6 were pretty much the same people who had contributed to Perl 5. Not a very interesting fact.

So I suggested a way to figure out whether something is interesting that could be executed by a computer, and I wondered if anyone reading this could come up with something better.

Assume you have a set of conditionals of the form if X->Y. Then for each permutation of these (apart from where X = Y) you can evaluate the following...

    gh(X AND Y)/(min(gh(X),gh(Y)))

You look for the lowest score for each combination and compare it to the percentage of cases in which if X->Y holds for a set of data. Here gh() is a function that measures the number of Google hits, and gh(X AND Y) is the number of Google hits that mention both X and Y. This should find the strongest correlations that are not mentioned by people on the internet.
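A minimal sketch of that scoring in Python might look like the following. The `hit_count` argument stands in for gh() and is hypothetical (no real search API is called here), and combining the two numbers by simple subtraction is only one assumed way of comparing them.

```python
from typing import Callable


def co_mention_ratio(x: str, y: str, hit_count: Callable[[str], int]) -> float:
    """gh(X AND Y) / min(gh(X), gh(Y)); 0.0 when either term has no hits."""
    smallest = min(hit_count(x), hit_count(y))
    if smallest == 0:
        return 0.0
    # Assumption: searching for both terms together approximates "X AND Y".
    return hit_count(f"{x} {y}") / smallest


def interestingness(confidence: float, ratio: float) -> float:
    """Assumed scoring: high when the data supports the rule strongly
    (confidence near 1) but the web rarely mentions both terms together."""
    return confidence - min(ratio, 1.0)
```

Rules would then be sorted by `interestingness`, highest first, with the hit counts cached so each query is only sent once.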

Now I just need a set of data that's wider than the Perl survey, and Google not to cut me off for hammering them -- but you guys can take it, right? ;-) And I might well have my computer find out that people who buy nappies buy beer.

Charlie Brown Plays With Stats

I'm a sucker for data and automated data analysis. I don't care if it's a huge unruly database, CVS/SVN commit info, a code base, stock market feeds, site performance data, whatever – I always believe that if you can just write a clever enough program you can discover something useful from it.

However, my quest for programs to discover this “useful thing” is like, to borrow an image from The West Wing, Charlie Brown kicking the football while Lucy holds it – it always looks promising right up to the last minute, when the inevitable happens.

The thing is, just like Charlie Brown, I'm sure I'll be able to do it next time.

This time was the Perl Survey 2007 – I was sure that I could construct a program that on its own would automatically discover something truly startling about the Perl community if I just approached it from the right direction. I decided to limit myself to the boolean fields in the survey, making a field goal certain.

So I wrote a program that identified all the boolean fields, or at least those that looked like them. Then it went about constructing conditional propositions for permutations of the fields, such as ...

    P->Q
    !P->Q
    P->!Q
    !P->!Q

Where P and Q are both boolean fields in the data. These conditionals were then evaluated against the response data and a measure of truthiness (you've got to love hacks that combine formal logic, Perl and Stephen Colbert) was calculated.

I'm pretty sure there is a better or maybe more formal algorithmic/mathematical approach to this; however, a quick google/trip to the bookcase didn't reveal anything immediately useful – although half of the time I find, in algorithms and maths especially, you need to solve a problem the naïve way to properly appreciate and understand the clever way, or even to recognise it as the solution you need.

In the first version of the program I discovered two things. First, a lot of very truthy things were really, really boring, e.g. if you haven't helped with Perl 5 there is a good chance you haven't helped with Perl 6. Second, there is a certain amount of psychology in how we value rules: I believe we value the if P then whatever type, value the if !P then whatever type less, and especially do not value if !P then !Q.

Also, my first measure of truthiness, which averaged over all the responses, just didn't feel right, and I moved to averaging it over the times when P, or rather the left hand side (LHS), was true. For reasons I'm not sure I can explain, this approach just felt better and gave better results.
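The version 2 measure, averaging only over responses where the LHS holds, can be sketched roughly as below. This is not the original Perl program; the row format (one dict of booleans per response) and the field names are assumptions. Association-rule mining calls the same quantity the "confidence" of a rule.

```python
from itertools import permutations


def truthiness(rows, p, q, p_val=True, q_val=True):
    """Fraction of rows with row[p] == p_val in which row[q] == q_val.

    Returns None when the left-hand side never holds.
    """
    lhs = [row for row in rows if row[p] == p_val]
    if not lhs:
        return None
    return sum(row[q] == q_val for row in lhs) / len(lhs)


def all_rules(rows, fields):
    """Score P->Q, !P->Q, P->!Q and !P->!Q for every ordered pair of fields."""
    rules = []
    for p, q in permutations(fields, 2):
        for p_val in (True, False):
            for q_val in (True, False):
                score = truthiness(rows, p, q, p_val, q_val)
                if score is not None:
                    rules.append((score, p, p_val, q, q_val))
    # Most "truthy" rules first.
    return sorted(rules, key=lambda rule: rule[0], reverse=True)
```

One possible reason the LHS-only average feels better: averaged over all responses, a rule such as P->Q also gets credit from every response where P is false, which says nothing about Q.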

So I cracked on with version 2, despite seeing Lucy's hand on the ball starting to twitch on version 1.

So with the improvements in mind I managed to adjust it in just a few minutes. And the results were astonishing!....-ly boring. Here's the top 5...

  If 'Posted to Perl Mongers list      ' Then 'Subscribed to Perl Mongers list  ' (With 0.995 probability)
  If 'Posted to other list             ' Then 'Subscribed to other list         ' (With 0.990 probability)
  If 'Attended conference (non-local)  ' Then 'Attended conference              ' (With 0.970 probability)
  If 'Attended Perl Mongers (non-local)' Then 'Attended Perl Mongers            ' (With 0.965 probability)
  If 'Contributed to Perl 5            ' Then 'Subscribed to other list         ' (With 0.882 probability)

Not exactly breathtaking stuff, and there will be no prizes for guessing which other list they are likely to be subscribed to if they have contributed to Perl 5.

Anyway to get something useful out of the list I ended up hand editing it myself, so here is my clumsy/poor attempt at interesting things from the Perl Survey 2007 (pay attention to the probabilities as the lower ones are more interesting).

  If 'Contributed to Perl 5            ' Then 'Contributed to CPAN              ' (With 0.780 probability)
  If 'Presented at conference          ' Then 'Subscribed to other list         ' (With 0.778 probability)
  If 'Contributed to Perl 6            ' Then 'Provided feedback                ' (With 0.701 probability)
  If 'Contributed to Perl 5            ' Then 'Attended Perl Mongers            ' (With 0.630 probability)
  If 'Attended conference              ' Then 'Attended Perl Mongers            ' (With 0.629 probability)
  If 'Subscribed to other list         ' Then 'Attended conference (non-local)  ' (With 0.112 probability)
  If 'Led other projects               ' Then 'Contributed to Perl 5            ' (With 0.111 probability)
  If 'Contributed to CPAN              ' Then 'Contributed to Perl 6            ' (With 0.101 probability)
  If 'Attended conference              ' Then 'Contributed to Perl 5            ' (With 0.099 probability)
  If 'Perlmonks                        ' Then 'Contributed to Perl 6            ' (With 0.089 probability)
  If 'Attended Perl Mongers            ' Then 'Contributed to Perl 6            ' (With 0.077 probability)
  If 'Posted to Perl Mongers list      ' Then 'Contributed to Perl 6            ' (With 0.072 probability)

To be honest I think Lucy has won once again. However, if you are interested in the raw output have a look at v1_out.txt or v2_out.txt. And do take some heart from the performance figures: even with version 1's brute force/clumsy technique the program still managed to analyse over 5 million conditional proposition/data points in 20 seconds - maybe some day we'll kick that football.

Set the Hack Free #3 : AutOcado

I've always wanted one of those intelligent fridges (see: the Register (1998) or the Guardian) that reorders food for me. But there are a few problems with this:

  1. As Jack Schofield points out in the Guardian there is no way I'm going to scan barcodes. Putting away the shopping is already enough hassle for me.

  2. I don't want some fancy new RFID fridge. If I'm going to spend the money on RFID kit I can do cooler things with it, and as soon as I had a fancy RFID fridge I'd need fancy RFID cupboards as well.

However I figured out a cheaper approach while ordering from Ocado recently – quite simply, do some basic analysis of the time between when you last ordered an item and the next time you order it, and also the quantity in the first order.

All you need to do is get the computer to watch you, or rather your e-mail receipts from Ocado, and assume that over time you aren't going to hoard or go without toilet roll for a few months. In fact a 0 stock level in your home isn't that important: if the computer watches you, it will reorder at the point where you should be getting concerned about your supplies based on your previous ordering history rather than at 0 stock, as after all you wouldn't wait to order at 0 stock yourself.

Now, I don't think you want to do this for everything you order from your online supermarket; it's really for "store cupboard items". So you need to make a list of what you want to manage this way.

However brands change, etc. So how do you determine, from plain text like the email Ocado sends you listing the things you've ordered, what's toilet roll and what's washing-up liquid from your list? Well, you simply do a Google search for the product name together with a list item, e.g. "puppy dog soft tissues" and "toilet roll". In fact you do all permutations and then count the number of hits (long-term caching would be nice here) - the pairing that comes back with the most hits is probably the correct association (some degree of calibration/threshold is probably required).
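The matching step could be sketched like this. The `hit_count` function is hypothetical (it stands in for whatever search service is used), the quoted query format is an assumption, and the default threshold is arbitrary and would need the calibration mentioned above.

```python
from functools import lru_cache


def best_category(product, categories, hit_count, threshold=1):
    """Pick the category whose joint search with the product gets most hits.

    Returns None when there are no categories or nothing reaches threshold.
    """
    if not categories:
        return None
    scored = [(hit_count(f'"{product}" "{cat}"'), cat) for cat in categories]
    hits, category = max(scored)
    return category if hits >= threshold else None


def cached(hit_count):
    """In-memory cache only; long-term caching would need a file or database."""
    return lru_cache(maxsize=None)(hit_count)
```

Usage would be along the lines of `best_category("puppy dog soft tissues", ["toilet roll", "washing up liquid"], cached(my_hit_count))`, where `my_hit_count` is whatever function actually performs the search.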

Of course multipacks make things difficult - the best I can think of to solve this is to use price as an indicator of unit. So if "super coke 24" costs 4.80 and "super coke mega 48" costs 9.00 you can establish some basic relationship, with the latter being just under 2x the base unit of the first.
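As a rough sketch of the price-as-unit idea (the function names are made up for illustration, and it assumes price scales roughly linearly with quantity, which bulk discounts will distort):

```python
def relative_units(base_price: float, other_price: float) -> float:
    """How many base packs the other pack is worth, judged by price alone."""
    if base_price <= 0:
        raise ValueError("base price must be positive")
    return other_price / base_price


def estimated_units(base_units: float, base_price: float, other_price: float) -> float:
    """Estimate the unit count of the other pack from the base pack."""
    return base_units * relative_units(base_price, other_price)
```

With the prices above, `relative_units(4.80, 9.00)` comes out at about 1.875, so the 48 pack would be estimated at roughly 45 units: close enough to establish the relationship, but a reminder that the estimate needs rounding or a sanity check.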

Anyway, once you have the cleaned data, say ...

    week 2 order    toilet roll    4 units
    week 4 order    toilet roll    8 units

 

You know that you will have to reorder toilet roll in around week 8 (((wk 4 - wk 2) * (8 units / 4 units)) + wk 4).
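The same arithmetic as a small Python function, as a sketch only: it assumes the previous order was used up at a steady rate by the time of the latest order, and the parameter names are illustrative.

```python
def next_reorder_week(prev_week, prev_units, last_week, last_units):
    """Estimate the week in which the latest order will run out.

    Assumes prev_units lasted from prev_week until last_week and that
    last_units will be consumed at the same rate.
    """
    if prev_units <= 0 or last_week <= prev_week:
        raise ValueError("need a positive quantity and increasing weeks")
    weeks_per_unit = (last_week - prev_week) / prev_units
    return last_week + weeks_per_unit * last_units
```

For the toilet roll example, `next_reorder_week(2, 4, 4, 8)` gives 8.0, matching the hand calculation.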

Of course, firing up the report in, say, week 7 means you also have to do a little more maths to take the elapsed time into account.

You will also have to track how often you normally place an order so you know when the target date should be for your stock levels.

One last point is that you don't want to order blindly, you do want to adjust the suggested order because when you do this the computer gets more human generated data to refine its model from. Oh, and I'm not even going to consider seasonal adjustments on things like bottled water.

Anyway I doubt I'll be implementing this soon, but if someone even more obsessive about this sort of thing than me does so, I'd like to hear about it.

As a side note, years ago I worked for a retail logistics company called Radius Retail in Edinburgh. Two concepts I'll always remember from this experience were ...

  • The way via OO that large supermarkets were treated like warehouses (they could effectively be a warehouse for the smaller "metro" sized stores).
  • The sheer amount of logic/analysis that goes into retail/warehousing.

So it's with the second point in mind that I'm guessing that there are probably companies already doing this sort of trend analysis, but instead of your fridge they are predicting stock levels of different stores/warehouses along the food chain.

Actually – WARNING MADNESS AHEAD! – I know of some groups of friends who live very close together but in different houses. So if they wanted, they could apply the first point as well. They could have a distributed warehousing concept between the group in order to smooth trends/increase likelihood of availability. I did warn you of the madness.

Set the Hack Free #2 : Optimal Hacking Music

Another hack idea I once had and never finished was to rate music by how much it helped with coding. I tend to like fairly high-tempo tunes that I can get into a typing rhythm with for some things. This isn't really dance/electro for me – a good example is what's playing just now, Mrs. Robinson by Simon & Garfunkel; it's got a good beat[1]. Although I also like classical music if I know it well, and of course there is my special West Wing thing, but explaining that is not going to happen in public.

Basically, to figure out what music (not shows about people with a great attitude to their jobs) you work best to, you need two data sets – how in the zone of coding you are and what you are listening to at that time (you should probably also test silence/white noise, but that's too scientific for this blog).

The first is, on the surface, easy: lines of code (I know, I know) counts from version control check-ins. The second I thought would be more difficult and involve me either looking at a music player to see if its database had the data, seeing if I could tell from the process (e.g. open files) what it was playing, or maybe even writing a basic controller and logging the data.

Like all the best hacks I had a eureka moment. The music data is already available if you use a service like last.fm (if you go to my page, ignore the 3 hour Christina Aguilera marathon – that was the cat running across the keyboard).

So now it's just a case of matching up the LoC count to the music that was playing, but the problem is there is no hard start to when you started working on that code – unless you really want to go back to using rcs/sccs.

So I never settled on a truly good solution to this – it gets simpler if you check in often like me, or you check in a lot because of using svk. My best solution was to disregard the data gathered from the first check-in of the day and measure between that and the next ones. The problem of meetings/lunch is dealt with by the fact that you probably turn off your music while away.
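That "best solution" might be sketched as follows. The input shapes are assumptions: `checkins` as time-sorted (datetime, lines changed) pairs pulled from version control, and `plays` as (datetime, track) pairs from something like a last.fm history. Sharing the credit equally between tracks is also just one assumed choice.

```python
from collections import defaultdict


def loc_per_track(checkins, plays):
    """Credit each check-in's lines to the tracks heard since the previous one.

    The first check-in of each day only marks a start time; its own line
    count is discarded, as suggested above.
    """
    scores = defaultdict(float)
    for (start, _), (end, loc) in zip(checkins, checkins[1:]):
        if start.date() != end.date():
            continue  # `end` is the first check-in of a new day
        heard = [track for when, track in plays if start <= when < end]
        for track in heard:
            scores[track] += loc / len(heard)  # share the credit equally
    return dict(scores)
```

Intervals with no music playing contribute nothing, which is also how silence during meetings or lunch drops out.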

It's also one of those annoying hacks that you can't just crank out in an hour because it needs a lot of testing hours to get enough data to refine it.

Anyway, there's the hack idea; if someone implements it, let me know – and please feel free to hardcode a couple of rules in so that it doesn't appear that I do my best hacking to Christina Aguilera.

[1] The bit before the guitar solo (the one Dave Grohl plays in the video), and in fact also the guitar solo, in Tenacious D – Tribute is pretty damn good for hacking as well.

Set the Hack Free #1 : Lazy Regression Testing for Perl.

I've got a couple of notebooks and many more loose papers with ideas for hacks that I never get around to implementing. This seems like a waste. So I've decided to start taking some of them to write up here, include any skeleton/stub code I've written and then basically purge myself (and my svn sandbox) of the burden of trying to hope that I'll get around to doing a decent implementation one day. This is the first such hack idea, so be free little hack idea, fly, fly like the wind!

I'm lazy (Larry says that's OK, although he also says that you should be diligent – but let's ignore this) and because I'm so lazy I like the computer to do my work. Although I'd still like to get the salary/rewards - the computer would just blow it on more SLI graphics cards (bizarrely Nvidia now sell clothes) and those fancy neon case mods.

Anyway, I was thinking about regression testing one day, especially the problem of using Perl in big organisations. Imagine a programmer, let's call him Dave. Dave has just knocked up a great new program that needs to go into production, and it uses one module – Acme::Shiny. The good news is that Acme::Shiny has already been installed on the production servers and LOTS of programs use it; the bad news is that Dave, being uber-keen, needs the latest, shiniest version of Acme::Shiny to use the new method do_super_shiny_thing(). The problem is: how do you know that the author of Acme::Shiny hasn't decided to break the interface for all the other programs that already use it?

And did I mention that Dave is the CEO's nephew and, for entirely unrelated reasons, the CEO is really keen to see this program go live?

Now you could go and look at the interface by hand (and at some point you should), but this becomes difficult, especially if it's not just Acme::Shiny that you are worried about – imagine that it also uses the bleeding edge versions of Catalyst and Plagger and their one or two dependencies.

So what is my hare-brained solution for this? Whenever you install new software you keep its test suite, so somewhere you have the tarball for your installed Acme::Shiny (or you could grab it off CPAN). When it comes to the new version you run its tests as normal, but then you delete them and copy in the tests from the old, installed module. If you are lucky, when you run these old tests they check the old/expected interface reasonably well – and any change of interface will be spotted by the old tests failing.
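A minimal sketch of the swap, written in Python rather than Perl purely for illustration. It assumes both versions are unpacked into directories with their tests under `t/`, and that `make test` is the right command for the distribution (that depends on how it is built, so treat it as a placeholder).

```python
import shutil
import subprocess
from pathlib import Path


def swap_in_old_tests(new_dist: Path, old_dist: Path, test_dir: str = "t") -> None:
    """Replace the new distribution's tests with the installed version's."""
    target = new_dist / test_dir
    if target.exists():
        shutil.rmtree(target)  # delete the new version's own tests
    shutil.copytree(old_dist / test_dir, target)


def old_tests_pass(new_dist: Path, command=("make", "test")) -> bool:
    """Run the (assumed) test command inside new_dist; True on exit status 0."""
    return subprocess.run(list(command), cwd=new_dist).returncode == 0
```

The new version's own tests should be run first, before the swap, so that a failure afterwards points at an interface change rather than a broken build.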

Of course this is far from foolproof, but like most tools it's simply there to help you do your job a little better.

Anyway hopefully this idea is more useful here rather than in my notebook.

Syntax Highlighting / Code Colouring

(shameless abuse of current reddit karma - if you are in London and interested in Perl you might like to check out this year's London Perl Workshop, it's free and it should be a pretty good day - although I'm biased as I'm the organiser - Greg)

I normally don't like syntax highlighting. I find it more irritating than useful - probably because I don't need a crutch to understand the syntax as well as a computer can interpret it.

But I've just been reading an old email by Schwern that talked about traffic lights (a subject dear to my heart) and Perl 6 - and while he was intending to discuss the non-colour aspects of traffic lights it sparked a thought in my head about Perl 6 and colouring.

And I realised I had at last a use for code colouring, in fact I had lots of uses.

  • When Perl 6 arrives I'd like a two colour system that indicates what's a new Perl 6 feature and what's an old Perl 5 feature. (This was the one that sparked the train of thought.)
  • When I'm being a maintenance programmer I'd like a colour system that indicates the age of the code sections when I'm tracking down a newly found bug.
  • When I'm being a code reviewer I'd like to be able to tell what is covered by unit tests and what is not.
  • When I'm optimising code I'd like frequency of execution mapped to the colour so I can see the hot spots in a nice hot red.
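For the last of these, the mapping from execution counts to colour is the easy part. A sketch (in Python rather than elisp, and assuming per-line counts are already available from a profiler) could be:

```python
def heat_colour(count: int, max_count: int) -> str:
    """Map an execution count to a hex colour from white (cold) to red (hot)."""
    if max_count <= 0:
        return "#ffffff"
    heat = min(max(count / max_count, 0.0), 1.0)
    fade = round(255 * (1.0 - heat))  # green and blue fade out as heat rises
    return f"#ff{fade:02x}{fade:02x}"
```

A line that never runs stays white (`#ffffff`) and the hottest line comes out pure red (`#ff0000`). The same function would work for code age or test coverage by feeding it a different number.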

The question now is: how much can I be bothered to hack around with elisp to get any of the above working?

Greylisting

I recently attended Jesse Vincent's training day on RT. (Gosh that was a lot of links).

What was great was that the attendees were all administering significant RT deployments, and because of their knowledge of the application, the specific focus on it for the day and also the nature of RT, we could often talk about wider architecture concepts such as database optimisation and mail configuration without losing focus.

The latter (mail configuration) spawned what I want to talk about in this post, but first I simply have to recall one anecdote from the day that I loved.

The problem being discussed was people replying to the email they receive telling them that a ticket has been closed. The two typical responses are ...

a) No it's not bloody well closed, some kid came down and told me to reboot my PC and then wandered off before the machine had even restarted but not before he smirked at my troll doll collection sitting on top of my monitor. Section 21 of the Staff Handbook entitles me to decorate my workspace you know?!

b) Thanks.

Now in case (a) you want to reopen the ticket and in case (b) you don't, and of course you want a computer to do this. But computers still aren't that good at understanding text – I'm sure I can find some crazy MIT'er who will put this down to the industry rejecting LISP Machines, but then again I'm sure I can find another crazy MIT'er who thinks they can produce a perpetual motion machine. Anyway I digress.

So the question came to Jesse of how you handle it. And I really dug his answer; maybe it's my background in sales or maybe it's just my love for hacking. He proposed a social engineering solution where you set the closed-ticket message to be something like:

“Your ticket is now closed. If you are happy with the level of service, could we please ask you to mail our boss (boss@bigorganisation.com) to let him know instead of replying to this email. Blah blah blah”

Anyway, maybe it's just me, but I loved this solution. On to the main, and probably much briefer, topic of this post.

Spam

Anyway, Jesse talked about grey listing. Well, he probably talked about gray listing, but you have to cut him some slack; he was the speaker after all.

Basically this 'grey listing' is the strategy of doing a temporary reject on all email if you haven't white listed the address. Good, well-behaved mail servers will wait and retry a few times, and once they retry after your threshold for keeping them waiting outside, you can not only accept their message but also white list them so they don't have to wait in the future.
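The decision logic described above fits in a few lines. This is only an illustration of the idea, not how greylistd is implemented or configured: the 300 second delay is an arbitrary assumption, and real implementations typically key on more than the sender address (for example the sending IP, sender and recipient together).

```python
import time


class Greylist:
    """Defer unknown senders until they retry after a minimum delay."""

    def __init__(self, delay_seconds=300):
        self.delay = delay_seconds
        self.first_seen = {}    # sender key -> time of first attempt
        self.whitelist = set()  # senders that have proved they retry

    def check(self, sender, now=None):
        """Return 'accept' or 'defer' for a delivery attempt from sender."""
        now = time.time() if now is None else now
        if sender in self.whitelist:
            return "accept"
        first = self.first_seen.setdefault(sender, now)
        if now - first >= self.delay:
            self.whitelist.add(sender)  # no waiting next time
            return "accept"
        return "defer"  # the mail server would answer with a temporary reject
```

The first attempt and any early retries get "defer"; a retry after the delay is accepted and the sender is white listed from then on.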

So I implemented this using greylistd and exim. It took about 30 minutes.

Now previously my daily spam levels were something like this (and I should point out I hadn't tweaked SpamAssassin (SA)) ...

Caught by SA ~160
In my mbox ~60

I measured it again recently ...

Caught by grey listing ~225
Caught by SA ~5
In my mbox ~5

And to say I'm pleased is an understatement. The only inconvenience so far was asking a friend I was talking to on IRC to make a document available via HTTP, as it was time critical: he wasn't currently white listed, and it was simply easier than manually white listing him and getting him to kick his mail server.

Leaving Venda

I've resigned from Venda and am on the job hunt again (CV here). It was clear to me that you really need at least a year's experience of the platform to deal with the short term challenges of being head of support, and while I could deal with the long term strategy for support, you can't do that without managing the day to day issues.

The good thing is I've submitted a set of recommendations, which have been accepted, about how to make support work well under someone with the platform experience and still get the long term improvements. And I'm now working with the next (and former) head of 2nd line support to see them through, including working with him as a mentor on line management/business issues (my coaching mad HR expert of a wife is very pleased to see a person and an organisation taking mentoring seriously).

I thought about staying on and doing a more technical role, but I decided to take advantage of what I can only best describe as an employment iteration (think Newton-Raphson, where you should get closer to the root/ideal) and take another look at what was out there and what was right for me. Having said this, I'd highly recommend Venda to any Perl hackers who want to be given a chance to make a difference in a rapidly growing company which has a great new office for the techies and loads of opportunities for advancement and travel (I believe that while they've completed the bulk of their recent recruitment, they are still interested in direct (i.e. non-recruiter) candidates, so for more info please go here or here for the US).

I'll be staying on until the 2nd of November working four days a week on tools to help manage large code bases (I got the OK yesterday to open source the first very simple one of these), visualisation tools to extract information from RT, etc., and mentoring/coaching. And I'm sure I'll be staying in touch to see how support pans out over the coming year.

Finally I'd like to thank everyone at Venda for their professionalism, thoughtful discussion and friendship – Good luck for the future!

 

RFID Hacks

Someone mentioned RFID in passing recently on a work mailing list and I'm afraid it has reignited my interest in just how many cool hacks you could do with RFID on items. Unfortunately, checking out kit again on a few websites shows that you can have either cheap and short range or expensive and long range, neither of which suits the sort of hacks I want to do with RFID (or the budget I have).

However, partly because I want to try and avoid keeping hack ideas inside me when someone else could do them, and partly because I've just finished another Coupland novel and making lists seems like a good idea, I thought it would be fun to brainstorm out the hacks I'd do if I had a cheap long range (10m) reader.

The following, unless otherwise stated, assume scanners/readers near my home workstation, home front door and office workstation.

  • Tags on both myself and my keys/phone/other essential item – set off a buzzer/alarm if only one of the tags comes into the area near the front door.
  • Tag on my boss so that if he comes near my workstation it automatically fires up a boss screen as seen in old 80s/90s computer games ;-).
  • Tag on co-workers which is compared to people I want to talk to so that I don't have to go to their workstation and potentially interrupt them; instead, when they are over talking to me I can bring up the topic I wanted to discuss. I could also do this with my wife to remind me of the things we want to talk about when she comes to the study etc. or tasks we need to do together. Actually the more I think about it, a system to do “conversations I want to have with this person” would be really useful; it could even just link in with dopplr instead.
  • Tag on me to track my commute time, by looking at when I leave the house and when I get to work, to find the best time to leave for the shortest commute.
  • Tag on Hobbes (and additional scanner on cat flap) just to find when he goes in and out during the day and night – although this (definitely worth following for cat owners) is cooler.
  • This is a bit of a weird one – iconic book sized coloured items with tags that represent task groups. They would have to be very visual and able to sit on a desk. Their purpose would be to identify the sort of task I sat down to do, for instance:
    • Some work hacking for Venda.
    • Household stuff.
    • Law homework.
    • etc.

They would bring up the appropriate folders etc. but would also be a reminder of what I sat down to give my time to. I think this idea comes from an old show that I doubt anyone remembers called chock-a-block, where some sort of computer/production machine was programmed by huge blocks.

  • Tag on myself and other co-workers (with a portable reader) to measure how much time I spend talking to other people to ensure communication is kept up with key individuals.
  • Tagged (and laminated) favorite recipe cards that I could hold up to the computer when ordering from Ocado and have some sort of hack add to my shopping basket.
  • Tag on key law books and myself, with maybe a few extra scanners, to check that I spend enough time in the proximity of those books and so get a feel for whether or not I am keeping up to date with the reading.
  • Tag on myself and my wife (with a portable reader) just to make sure I prioritise what is really the most important thing and we spend time together.
  • Tag on either of us linked into some sort of power consumption device so we can ensure we spend time in the living room with the TV off.
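Most of these hacks boil down to comparing the set of tags a reader can currently see against a set you expect. Taking the first idea (me versus my keys/phone at the front door) as an example, a sketch with made-up tag names and no real reader API might be:

```python
def forgotten_items(seen_tags, me, essentials):
    """Essentials missing from the door reader's view when `me` is present.

    seen_tags: tag IDs currently in range of the front-door reader.
    Returns an empty set if `me` is not there or nothing is missing.
    """
    seen = set(seen_tags)
    if me not in seen:
        return set()
    return set(essentials) - seen
```

A non-empty result is the cue to sound the buzzer. The boss screen and "conversations I want to have" ideas are the same check with a different expected set and a different action.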

OK, I'm probably getting obsessive now, but hopefully some ideas will spark someone's imagination.

Software Patent Quotes

For fairness I should note that not all anti-software patent people are Slashdot crackpots; here is a nice selection of quotes originally compiled by "softwarevisualization" in a comment section over at PatentlyO.

"Congress wisely decided long ago that mathematical things cannot be patented. Surely nobody could apply mathematics if it were necessary to pay a license fee whenever the theorem of Pythagoras is employed. The basic algorithmic ideas that people are now rushing to patent are so fundamental, the result threatens to be like what would happen if we allowed authors to have patents on individual words and concepts. Novelists or journalists would be unable to write stories unless their publishers had permission from the owners of the words. Algorithms are exactly as basic to software as words are to writers, because they are the fundamental building blocks needed to make interesting products. What would happen if individual lawyers could patent their methods of defense, or if Supreme Court justices could patent their precedents?"

-Donald Knuth

"...There is absolutely no evidence, whatsoever—not a single iota—that software patents have promoted or will promote progress..."

-Jim Warren (Autodesk) 1994

"In the majority of cases in software, patents [affect] independent invention. Get a dozen sharp programmers together, give them all a hard problem to work on, and a bunch of them will come up with solutions that would probably be patentable, and be similar enough that the first programmer to file the patent could sue the others for patent infringement. Why should society reward that? ... The programmer that filed the patent didn't work any harder because a patent might be available, solving the problem was his job and he had to do it anyway. ... Yes, it is a legal tool that may help you against your competitors, but I'll have no part of it. It's basically mugging someone"

-John Carmack (id Software) 2005

"...SAP would not need patents to protect its investments and is collecting them only as a defensive weapon to prepare for litigation in the U.S..."

-Prof. Hasso Plattner when Chair of SAP Board

Peer to Patent

I'm a bit of a fence sitter when it comes to patents. On the one hand I think that inventors and companies should be encouraged to carry out true R&D and on the other hand I hate stupidity. And there are a lot of stupid patents, from the weird crazy to the crazy by triviality. Yet I can accept that patent offices have limited resources.

Also the anti-patent/anti-software patent movement isn't exactly helped by the crackpots on Slashdot (apologies gentle reader, but I don't have the inclination to wade through Slashdot comments for an example), so it's especially nice when some geeks come together and do something useful about it, as they have done with Peer to Patent. I just hope it can be sustained and that similar efforts start in the UK and EU.

Tolkien themed Spectrum games.

This is just quite simply awesome. Someone has made Tolkien themed versions of Manic Miner (The Hobbit) and Jet Set Willy (The Lord of the Rings). You can find them for download here.

And if this makes you feel nostalgic for the speccy, then the Your Sinclair Top 100 will compound that.

Feature creep

I'm not sure this is that interesting; however, Dave H. asked me to write a blog post, so here it is.

This morning (Tuesday) at 3am I woke up worrying in a good way about work, so I got up and checked email etc. and decided to finish a hack to get me back to sleep.

I've recently set up a 2 system (blackbook + generic linux workstation) setup with the laptop screen + 3 other screens (+synergy) and wanted to apply a very wide wallpaper/background. So I started on wsplit, a program that could take a single large image and split it up into a series of smaller images appropriate to the height (and width) of the displays.

So I give you wsplit - it is crappy code, but it was 3am.

Observe,   

    http://www.flickr.com/photo_zoom.gne?id=1117896163&size=o     
    http://www.flickr.com/photo_zoom.gne?id=1118743696&size=o

The key feature is best seen if you look at the join between the blackbook and the next monitor. Both on the left.

Anyway the topic is feature creep...

No sooner had I written this than I realised I needed a new feature that would also take into account the screen casing widths between the display areas, so that I could split things even more nicely.
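The core of such a splitter, including the casing-width feature, is just working out one crop rectangle per display. The sketch below is not the wsplit code: it assumes the displays sit left to right, are top-aligned, and share a single casing gap (`bezel_px`, a made-up parameter) between neighbours.

```python
def crop_boxes(displays, bezel_px=0):
    """Return one (left, top, right, bottom) crop box per display.

    displays: list of (width, height) in pixels, ordered left to right.
    bezel_px: image pixels skipped between neighbouring displays, so the
    picture appears to continue behind the screen casings.
    """
    boxes = []
    left = 0
    for width, height in displays:
        boxes.append((left, 0, left + width, height))
        left += width + bezel_px
    return boxes
```

The boxes use the (left, upper, right, lower) order that image libraries such as Pillow accept for cropping, so each one can be cut from the wide image and saved as that display's wallpaper.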

And so, in conclusion, feature creep hits every program, even if it's a 3am exercise to get back to sleep.

Another day brought more tidying of subversion. This time I came across a tiny script I wrote for Simon one day. It took in data, counted the occurrences of a string and made a tag cloud from the data using Leon's module.

My problem is that I find it hard to throw this sort of thing away. It's trivial and I can recreate it in under a minute (in theory), but I hate throwing away a file because 'I might need it' or 'I might crib from it' or for some other made up reason that excuses me from throwing away stuff.

So in order to avoid the dreaded svn rm I added a little value to it and created module_cloud, which scanned various dirs, looked for text files that might be Perl, and analysed which modules I used.
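The counting step can be sketched roughly as follows, in Python rather than the original Perl. This is not the module_cloud script: it takes file contents as strings instead of scanning directories, it assumes that a module is anything named on a `use` line that starts with a capital letter (so pragmas such as `strict` are skipped), and the `cloud_sizes` scaling is a made-up stand-in for whatever the tag cloud module does.

```python
import re
from collections import Counter

# Matches "use Some::Module" at the start of a line; lower-case names
# (pragmas such as strict, warnings, lib) are deliberately not matched.
USE_LINE = re.compile(r"^\s*use\s+([A-Z]\w*(?:::\w+)*)", re.MULTILINE)


def count_modules(sources):
    """Count how often each module is named on a use line across all sources."""
    counts = Counter()
    for text in sources:
        counts.update(USE_LINE.findall(text))
    return counts


def cloud_sizes(counts, smallest=1, largest=5):
    """Scale raw counts linearly onto font-size steps smallest..largest."""
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = (high - low) or 1  # avoid dividing by zero when all counts match
    return {
        name: smallest + (count - low) * (largest - smallest) // span
        for name, count in counts.items()
    }
```

A module used in many files ends up at the largest step and one used once at the smallest, which is all a cloud really needs.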

(Interestingly this for once helped me with my initial task of sorting out subversion as I realised DBI was missing, and I have a lot of personal code that uses DBI that I now have to hunt down and import to svn)

Anyway, here's my 'Module Cloud' (the script ignores some files such as .t files, which explains the absence of the very excellent Test::More).

AlarmClock::LDB AlarmClock::Mouse AlarmClock::Music CGI CPANPLUS::Backend Carp Config Cwd Data::Dumper DateTime DateTime::Duration DateTime::Format::Duration Devel::Symdump Device::USB Email::Send Email::Simple File::Basename File::Find::Rule File::MMagic File::Spec::Functions File::stat Geo::Google Geo::Google::Location HTML::Template IOU IPC::Run Katamari Kevin LWP LWP::Simple List::Util Log::Log4perl Math::Matrix Module::Build Net::Jabber RTT Shiny::Bank Shiny::Edge Shiny::Evidence Shiny::Node Shiny::Player Shiny::Protocol::Move Shiny::Protocol::Register Shiny::Protocol::World Shiny::Protocol::WorldSkeleton Shiny::Skeleton Shiny::World Siesta Siesta::Plugin Storable Tk URI Underscore VCS VCS::Cvs VCS::Cvs::Dir VCS::Cvs::File VCS::Cvs::Version VCS::Dir VCS::File VCS::Hms::Dir VCS::Hms::File VCS::Hms::Version VCS::Rcs::Dir VCS::Rcs::File VCS::Rcs::Version VCS::Version WWW::LiveDepartureBoards WorkGrid

Desktop Traffic Lights

I've been cleaning up my personal subversion repositories and I've come across some files that are little more than an idea quickly noted down. Yesterday I came across an idea I had for a feature for a desktop file/folder environment.

Basically, because I generally don't keep a very tidy desktop I often lose files I've just downloaded or moved to the desktop, so I wanted a way to make recent files flash when I hit some hotkey combination. I was just about to blog about this to ask if this has turned up in any window manager (e.g. I suspected Beryl would be a likely candidate) or as an OSX hack when I realised there was a way I could script something similar using the color labels in OSX.

Basically I could use 4 of the colours (red, orange, yellow and green) to indicate if a file was >28 days old, 7-28 days old, 1-7 days old or <1 day old.
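The colour rule is simple enough to sketch in a few lines, shown here in Python rather than the Perl and AppleScript of the actual hack. The function name `label_colour` is made up, and which side the exact boundaries (exactly 1, 7 or 28 days) fall on is an assumption, since the ranges above don't say.

```python
from datetime import datetime, timedelta


def label_colour(modified, now):
    """Map a file's age to a traffic-light label colour.

    Boundary handling (which bucket gets exactly 1, 7 or 28 days)
    is an assumption made for this sketch.
    """
    age = now - modified
    if age < timedelta(days=1):
        return "green"   # less than a day old
    if age < timedelta(days=7):
        return "yellow"  # 1-7 days old
    if age <= timedelta(days=28):
        return "orange"  # 7-28 days old
    return "red"         # more than 28 days old
```

Passing `now` in, rather than calling `datetime.now()` inside the function, keeps every file in one run judged against the same moment.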

So I started hacking and ended up with a nasty hack in Perl that generated the appropriate AppleScript. It's really ugly (it currently opens a pipe to osascript), and once I resolve issues with the upgrade of my environment to OSX on Intel I'll take care of some of this. I should, but probably won't, translate it all to AppleScript. I also need to hook it in as a folder action (mmmm, AppleScript calling Perl calling AppleScript ... nice) and port it to my Linux desktop (I note my gnome window manager has emblems that could do the same job).

Anyway, you can see a "traffic lighted" screenshot to the right. Files are currently sorted by modification time and are in order apart from scully_fowler_screenshots.tar, which appears to be out of order because the hack also takes account of 'first seen time': the time a file is first seen on the desktop.

Sticking with Perl

I wanted to make a backup this morning of some files hosted on Zimki, and quite conveniently it offers a backup facility. Unfortunately it downloads the files as unique_id.extension; however, it also gives you a copy of zimki_file.json, which contains the mapping information to turn the unique_id.extension files into human readable names.

Armed with this JSON data I wrote a quick and nasty tool in Perl to fix this; nothing new here. But what I did notice after the tool was finished was just how many modules I used to make the job easy and quick. Ignoring the pragma lines, the use list was as follows.

use File::Slurp qw(read_file);
use JSON;
use File::Copy;
use File::Basename;
use File::Path;
use Data::Dumper;
use Log::Log4perl qw(:easy);

With these (ignoring the two used for debugging) I managed to read in a file, parse JSON, move files, parse filenames and create complex directory structures, meaning the meat of the program was only 6 lines. In fact the craziness I had to do to get File::Basename to work how I wanted (which I suspect is wrong) was longer.
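For a feel of how little glue is involved, here is a rough equivalent in Python rather than the original Perl. It is not the tool described above, and the layout of zimki_file.json is a guess: the sketch assumes a flat JSON object mapping each unique_id.extension name to a human readable relative path, which may not match the real file. The function name `restore_names` is made up.

```python
import json
import shutil
from pathlib import Path


def restore_names(backup_dir, mapping_file, target_dir):
    """Copy backed-up files to their human readable names.

    Assumes (hypothetically) that mapping_file holds a flat JSON object:
        {"unique_id.extension": "some/readable/name.extension", ...}
    """
    backup_dir, target_dir = Path(backup_dir), Path(target_dir)
    mapping = json.loads(Path(mapping_file).read_text())
    restored = []
    for stored_name, readable_name in mapping.items():
        destination = target_dir / readable_name
        # Create any intermediate directories the readable path needs.
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(backup_dir / stored_name, destination)
        restored.append(destination)
    return restored
```

The sketch copies rather than moves, so the downloaded backup stays untouched if the mapping turns out to be wrong.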

It only strengthens what I think Dave Cross said at the London PM Teach-in: that Perl is the ultimate glue language these days, and most programs basically just glue together bits of CPAN to get the job done.

#london.pm recommends ...

Once I finally got around to watching The Shawshank Redemption, after having it sitting in my DVD collection for a year or more, I realised that even though I thought I had watched most of the great films there were still ones I hadn't seen or maybe didn't know of.

So I asked #london.pm for one or two recommendations. Naturally they took to this with excessive enthusiasm and listed far more. Anyway, I promised I'd write up the list for them, so here it is. (Please feel free to ignore films that have Elisabeth Shue in them; that seemed to be a bit of a tangent that the channel went down.)

Sharing is good for you.

I've just spent the last 4 hours trying to get nvidia drivers working well with Linux, and I don't mean a casual stop for a coffee 4 hours - I mean the absolute attention, my will against the machine, food/bodily functions on hold sort of hours. Everything was as it should be: the kernel module was there, the driver was there, but the damn X server just wasn't picking up the driver.

darla:/usr/lib/xorg/modules/drivers# ls
apm_drv.so        cirrus_drv.so     i128_drv.so      newport_drv.so  rendition_drv.so      
ark_drv.so        cirrus_laguna.so  i740_drv.so      nsc_drv.so      s3_drv.so            
ati_drv.so        cyrix_drv.so      i810_drv.so      nv_drv.so       s3virge_drv.so      
atimisc_drv.so    dummy_drv.so      imstt_drv.so     nvidia_drv.o    savage_drv.so         
chips_drv.so      fbdev_drv.so      mga_drv.so       r128_drv.so     siliconmotion_drv.so 
cirrus_alpine.so  glint_drv.so      neomagic_drv.so  radeon_drv.so   sis_drv.so          

It's there, see it! 4 columns across, 4 rows down.

Anyway, what I missed the first time I checked this directory was how all its friends had little s's in their extensions and it only had a lonely little o.

ARRRRRRRRRRRGHHHHHHHHHHHH!

gcc -shared -o nvidia_drv.so nvidia_drv.o 

..... fixed it.
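In hindsight, spotting the lonely little o is the sort of check a machine does better than tired eyes. A minimal sketch in Python, where the function name `odd_extensions` is made up and "odd" simply means any extension other than the most common one in the list:

```python
from collections import Counter
from pathlib import PurePath


def odd_extensions(filenames):
    """Return the names whose extension differs from the most common one."""
    if not filenames:
        return []
    suffixes = Counter(PurePath(name).suffix for name in filenames)
    common, _ = suffixes.most_common(1)[0]
    return [name for name in filenames if PurePath(name).suffix != common]
```

Fed the directory listing above, this would single out nvidia_drv.o among all the .so files.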

If something's worth doing ...

... it's worth doing completely and utterly obsessively.

First off, please open the following link in the background, trust me it'll take a while to load, and I'm including those of you that work at ISPs here.

   

http://board.spawn.com/forums/showthread.php?t=209328

Now, think about action figures, the sort of things you see on sale at Forbidden Planet, you know, Spawn, Simpsons, Alien v. Predator - that sort of thing. Now choose a theme (i.e. a movie or comic book title). The chances are this guy has it.

It's a truly amazing collection; you just need to keep scrolling down to see the full extent of his obsession. I think it was the Emperor's guard, complete with Vader, royal guards and a troop of stormtroopers, that made me realise just how awesome this collection was. Also, his wife's My Little Pony collection was pretty extreme as well.

Oh and here's page 2.

   

http://board.spawn.com/forums/showthread.php?t=225990&page=1&pp=35

(Thanks for the link to someone called trench, who I only know of because their blogging partner goes by the name of penge, which keeps firing off my Google alert for local news.)

Mojitos and Memories

Last Friday was my last day at Fotango and I must admit that in between the Mojitos and story telling it was quite sad to leave so many great people. I can honestly say that Fotango has been one of the best learning experiences of my professional life. I feel I almost need to tattoo what worked (and didn't) for managing my team onto the inside of my eyelids. It's all simple ideas, but once again I’m reminded how important it is to remember simple ideas and keep them in mind.

I’ll also always remember just how many great people worked at Fotango at one point or another over the years. It reads like a who’s who of the London Perl community. Also the freedom you had to try things was incredible.

Anyway, I’m now unemployed (and loving it), although I do have a verbal offer, so it’s time to move on, starting with 28 hacks later – more info to come.

Today's Resignation

Today, one of the great world leaders resigned, oh and that Tony Blair chap resigned as well.

Yip, I've taken the leap (even though I never invaded Iraq). After a couple of years of spending more time with Word than emacs, and after talking it over with Simon, I've decided to leave the mothership and see what else I could do.

My last day will be on the 8th of June and I have no idea what I'm doing next, although my wife and the future sole breadwinner has given me the OK to look at all options. Whether that will be doing a startup, going back to the carefree life of a contractor/consultant or finding some sort of senior programmer/architect role, I really don't know yet. But we've done the maths and with a few cut backs we can have enough financial freedom to buy me the time to find out what will make me happy from day to day.

Anyway I'll keep you updated with this new adventure.

p.s. http://www.mccarroll.org.uk/~gem/pages/cv/

 

No Value Here

This is probably a post that is of little value to the reader and is more useful to myself as a historical marker. But I just wanted to note that I've finally got around to applying a patch to WWW-LiveDepartureBoards that I've had for ages and done nothing with.

Thanks to Adam Trickett for the contribution.

I guess the only interesting thing to comment on is how out of touch I was with the code of a really simple module after not looking at it in ages. For instance, I had forgotten how it makes an intelligent guess of the day, in the absence of this information, to create a proper datetime value.
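To illustrate the kind of guess involved, here is a small sketch in Python. It is not the module's code and I haven't checked it against WWW-LiveDepartureBoards: it simply assumes that a bare HH:MM time refers to whichever of yesterday, today or tomorrow puts it closest to the current time. The function name `guess_datetime` is made up.

```python
from datetime import datetime, timedelta


def guess_datetime(clock_time, now):
    """Turn a bare "HH:MM" string into a full datetime.

    Assumption for this sketch: pick the day (yesterday, today or
    tomorrow) that places the time nearest to `now`.
    """
    hour, minute = (int(part) for part in clock_time.split(":"))
    today = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    candidates = [today + timedelta(days=offset) for offset in (-1, 0, 1)]
    return min(candidates, key=lambda candidate: abs(candidate - now))
```

So a train listed at 00:10 when it's currently 23:50 gets tomorrow's date instead of a time nearly a day in the past.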

21 days late

I've consistently recorded 25lbs of weight loss now, 25% of my target for the year. I'm 21 days late in hitting this target but I'm going to try and catch up.

However, I'd like to take this chance to thank everyone who has come up and spoken to me with kind words after reading my blog about the goals I've set myself this year. It has really helped.

I caught a hacker!

This morning, I was ssh'd into my mail server and for my usual paranoid reasons (and yes I know a rootkit could make these false) I did a couple of commands, one of which was a netstat.

To my horror someone from an IP/host I did not recognise was accessing my imapd server, in fact on closer inspection of the logs they were authenticating ok as me!

I immediately went into defense mode: changing passwords, configuring firewall rules, finding out who they were. And it was the last one that really irked me: they were in Bromley, give or take a few ISP generalisations.

Eventually, using a level of cunning not even seen in films such as "The Net", I tracked them down to a specific individual, who had used their neighbour's network briefly due to their powerbook (yes, I could even tell what type of machine it was) being in a general WiFi DHCP mode.

It was me.

Why mortgage brokers are a good thing.

I'm working at home today, partly to get a valuation done and partly to get a GP appointment. To ensure I will be able to attend the latter, I make a quick call to the surveyors involved, just to find out with greater accuracy what time they will be coming.

    "Oh, I'm afraid your valuation has been cancelled by a panel manager."

Huh? This is most annoying as a) I arranged to work at home in an extremely busy week and b) there have already been grumblings from the lender about how I just sneaked in before the Bank of England rate change. One quick call to the mortgage broker and the following things happen in under 15 minutes:

  • Call from a new surveyor's manager who can value the property today at 1pm.
  • Call from the old surveyor's (or maybe someone else - I got a lot of calls very quickly) explaining they had two valuations on the same street today and when they were told to cancel one they cancelled the wrong one.
  • Call from the lender apologising for the mixup.

So in general, my mood has gone from furious to delighted, albeit delighted with my broker/IFA, who managed to kick ass extremely quickly. (FWIW, he's Steve Dominguez and he works for Clover Mortgages Ltd).

Stoned

I'm pleased to report that I hit my 7 week target today for weight loss: 14lbs, although it reads better as 1st. Sure, the fever last night might have helped, but if there is one thing I've learned from watching my weight for the last few months, it's that it varies incredibly and you really can only consider major recurring patterns.

i.e. the cheeky curry you had yesterday is probably not responsible for your weight gain, but the fact it's always Wednesday when you hit your target is probably just your body taking time to get rid of the glycogen it stored up during the weekend when you were not as well disciplined.

Anyway, I've lost a stone and I'm the lightest I've been in 5 years (the diet I did in 2002 ended up with only 12lbs weight loss from the same starting point). I'll report again in week 14 of the year.

Nobody puts Baby in a corner.

I don't usually pass on memes; however, this was so good I just had to.

For their wedding dance, a couple did the choreography from the final dance in Dirty Dancing. It's great entertainment, and thanks to the happy couple for sharing it.

It's really the dance equivalent of karaoke. Maybe we need dance-karaoke bars! It could be the new fitness craze.

Spectrum of SaaS/Utility Adoption

I've been slowly moving towards outsourcing all the online stuff I do. I recently moved images.mccarroll.org.uk to Zimki and, as I mentioned in passing here, I've just outsourced my subversion repository to these guys.

(Although I'm not sure that outsourcing is the best word, as it has certain connotations).

Anyway, this got me thinking about the adoption of SaaS/Utility services by myself and maybe others. I made some notes and created an "Adoption Spectrum" and thought it might be interesting for some of you.

It should be self-explanatory, but what's probably interesting is the way the factors play off against each other. A good example is web mail: clearly privacy is an important factor against adopting a SaaS approach to this; however, factors such as accessibility (you can check a website on holiday) and the fact it's a commodity (hotmail is the standard for many) mean that it's something people adopt fairly early on.

I'd welcome comments on whether you agree with my spectrum, factors or even just where you see yourself on the adoption spectrum.

Veronica killed the radio star

I probably listen to more radio than I watch TV. I've got simple tastes: the Today programme in the morning, maybe some Radio 5 in the car, and Radio 5 again in the car on the way home. I share this car ride with my lovely wife, Veronica. Unfortunately she has decided she doesn't want to listen to any of the above and prefers some commercial station called XFM, which of all things has music on it!

This has recently caused disagreement. Her initial proposal was that I choose the morning station in the car and she chooses the evening one. However, this would be unfair as I would have to use my choice for the basic requirement of getting the morning news. So I countered her offer with a proposal of my choice of radio in the morning or educational podcast/recording and the iShuffle in the evening, or XFM if the iShuffle isn't charged (I'd now like to add a caveat to this that says we listen to Radio 5 if Spurs or an international team I follow are playing in the evening).

Being childish, she stated she wasn't prepared to listen to a counter proposal and said that if we couldn't agree, we'd simply turn it off - and that's what we did.

So rather than let her get away with this childishness without similar childishness I threatened to blog about it - so ha! See I can be just as childish!

For those of you who are not married, welcome to the fun of childish, irrelevant and pointless debates that help the years pass :-).

Wow

I've just had a play with Yahoo! Pipes and I'm hooked. I just can't believe no one has done this already.

The concept is a fairly old one; I seem to recall an old piece of visualisation software from AVS (OpenVIS maybe) that used the pipe model to allow the user to build up data flows and analysis that would usually be pumped out to visualisation widgets.

Anyway I strongly urge you to check it out.