Mashed-up Expenses

With all the controversy over MPs’ expenses, I wasn’t unduly surprised to see that Parliament have released the raw data in PDF form. Nice for presentational purposes, but bugger-all use for analysis. Funny, that.

I briefly toyed with attempting to convert from PDF to spreadsheet, but the Guardian got there first – and they’ve released the raw data in Google spreadsheet form so anyone can mash it up and see what happens. Cue some hacking…

MySociety’s TheyWorkForYou site has an API that allows you to do all kinds of intriguing things, including retrieving geographical data about constituencies. A few Ruby scripts later, I’ve got the raw data pulled down – the latitude and longitude of the constituency centre point, from which it’s a very quick job to calculate the distance from Westminster using a great circle calculation. I figured that as I was mainly interested in travel, using Westminster tube station as the centre point from which distances were calculated would be a good-enough approximation. Of course, it’s a bit of a rough calculation – the centre of the constituency isn’t necessarily the point from which the Member travels, and if the constituency has an odd shape it’ll distort the result slightly. But no more than they’ve been distorting the system…
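For the curious, the great circle step looks roughly like this. It's a minimal sketch using the haversine formula rather than the actual script. The Earth radius, the approximate coordinates for Westminster tube station and all the method names are my own assumptions for illustration.

```ruby
# Sketch only: great-circle distance via the haversine formula.
# The constants and method names below are illustrative assumptions.

EARTH_RADIUS_KM = 6371.0 # mean Earth radius (assumed value)

# Approximate latitude/longitude of Westminster tube station (assumed).
WESTMINSTER = [51.501, -0.125].freeze

def to_radians(degrees)
  degrees * Math::PI / 180.0
end

# Distance in kilometres between two points given in decimal degrees.
def great_circle_km(lat1, lon1, lat2, lon2)
  phi1 = to_radians(lat1)
  phi2 = to_radians(lat2)
  dphi = to_radians(lat2 - lat1)
  dlambda = to_radians(lon2 - lon1)

  a = Math.sin(dphi / 2)**2 +
      Math.cos(phi1) * Math.cos(phi2) * Math.sin(dlambda / 2)**2

  # Clamp to 1.0 so rounding error can never push asin out of its domain.
  2 * EARTH_RADIUS_KM * Math.asin([1.0, Math.sqrt(a)].min)
end

# Distance from the assumed Westminster reference point to a
# constituency centre point.
def distance_from_westminster_km(lat, lon)
  great_circle_km(WESTMINSTER[0], WESTMINSTER[1], lat, lon)
end
```

Calling `distance_from_westminster_km(lat, lon)` with a constituency's centre point gives the straight-line distance in kilometres. It treats the Earth as a sphere, which is well within the margin of error already introduced by using the constituency centre.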

The results are here -> There are one or two glitches – for some reason, the Northern Ireland constituencies don’t have lat/long data so those are broken, and there’s the possibility that things got slightly mixed up for MPs who share surnames. But a quick scan suggests it’s OK.

The next stage would be to start correlating between distances and expenditure on travel, and plotting this in interesting ways. Cue a rapid learning curve on Google Maps.

This is something of an overnight hack project, but it does raise some interesting questions about the way data can be interpreted in ways that the originators probably didn’t expect – I’d be amazed if Google Maps mashups ever crossed the minds of the Commons Fees Office when they (reluctantly) released the information.

Given the raw data, some controversy and geeks with time on their hands, it’s unsurprising that interesting things result.
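As a starting point for the correlation step, here's a rough sketch of a Pearson correlation coefficient in plain Ruby. I haven't run this against the expenses data. The `pearson` method name is made up, and it assumes two equal-length arrays (distances and travel claims) with the MPs lacking lat/long data already filtered out.

```ruby
# Sketch only: Pearson correlation between two equal-length numeric arrays,
# e.g. distance from Westminster and travel expenditure per MP.
# Returns a value between -1 and 1, or nil if either series is constant.
def pearson(xs, ys)
  unless xs.size == ys.size && xs.size > 1
    raise ArgumentError, "need two equal-length arrays of at least 2 values"
  end

  n = xs.size.to_f
  mean_x = xs.sum / n
  mean_y = ys.sum / n

  # Sum of products of deviations, and sums of squared deviations.
  cov   = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  var_x = xs.sum { |x| (x - mean_x)**2 }
  var_y = ys.sum { |y| (y - mean_y)**2 }

  # The coefficient is undefined when one series never varies.
  return nil if var_x.zero? || var_y.zero?

  cov / Math.sqrt(var_x * var_y)
end
```

A result near 1 would mean travel claims rise steadily with distance; anything much lower would be worth a closer look. It only measures a straight-line relationship, so it's a first pass rather than the last word.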