[![Twitter](http://www.twitter.com/shippingcast)One of the aspects of Twitter I like the most is the fact that it has an API. That makes it hackable in a way that appeals to my inner geek, so something I’ve been playing with for a while is creating Twitter bots. One of the first I made was @shippingcast, which takes the UK Shipping Forecast and squirts it out onto the web in a Twitter stream.
Unless you’re British, the Shipping Forecast probably doesn’t mean a lot. It’s one of those curiously English anachronisms – four times a day, the BBC broadcasts weather forecasts for offshore shipping areas [RealAudio link] on Radio 4 (the BBC’s main “highbrow” speech station). The waters around the British Isles are divided into areas, and the whole thing follows a strangely poetic format. There’s an entire generation of Brits who have never left dry land, but can recite “Dover, Wight, Portland, Plymouth” as if they know what it means.
Slightly eccentric, syntactically-regular? Must be ripe for a Twitter bot. Here’s how I did it.
First, a quick digression/rant about the source of the data. The Shipping Forecast is published twice a day by the Met Office and then reposted on the BBC’s website. This causes no end of hassle for the aspiring Twitterer, the reason for which usually comes as something of a shock for non-Brits. Despite the fact that it’s the US that is meant to be the land of capitalism and free enterprise, the Americans enjoy a strangely socialist attitude to data and information that the taxpayer has funded and provide it free for all-comers. We Brits, on the other hand, have been lumbered with what are known as “trading funds” – quasi-public organisations like the Met Office instructed by government to behave like private enterprises. They tend to guard their data and information as if it were a scarce, rivalrous monopoly good, and will seek every opportunity to charge for it. Never mind that there are far greater benefits to society and the economy as a whole if the data were available for reuse; and never mind that I as a taxpayer have already funded the production of it. No, these organisations have all the safety and security of being public sector, but like to play (and pay) as if they were shareholder-funded. The upshot of this is that – technically – I’m breaking the law by Twittering the Shipping Forecast.
So, back to the “how”. The Met Office have made it fairly difficult to grab the data from their site, although I’m in two minds as to whether that’s because of their aforementioned grasping nature, or the fact that they’re yet another public sector Microsoft monoculture who knows little and cares less about web standards and playing nicely online. This means I have to grab the data from the BBC site, and they format it for viewing rather than parsing. Every six hours, a cron job kicks off which runs a Ruby script to grab the HTML page. This then gets squirted through Hpricot, a Ruby-based XML parser that discards the extraneous stuff and extracts the data that I’m after. The BBC use tables to display the forecast, which makes it relatively easy to grab the “meat” and drop the “sugar”.
Once I’ve got the data itself, it needs formatting. This involves iterating across the table to grab the forecast data for each area (or group of areas), and then stripping out extraneous content to squeeze it into 140 characters. Then finally it gets Tweeted using the Twitter4R library to talk directly to the Twitter API.
So, how well does it work, and what’s worth learning from all of this? Firstly, as a proof of concept for publishing microcontent, I think it works quite well. The process is quite simple, and the information pops up into a Twitter feed on a nicely regular basis. There’s a useful side effect of the cron job, as well, which is to act as a “heartbeat” for the server that it’s running on.
As far as the information being useful, though, it’s less successful. The Shipping Forecast is optimised for broadcast – it’s written to be read aloud rather than read from the page. This means that although there’s a standardised syntax, the format is less standardised – if several sea areas are going to experience the same weather, then they’ll be lumped together. Which is incidentally why phrases like “Dover, Wight, Portland, Plymouth” have such resonance to a Brit of a certain age. While it’s undeniably poetic, it’s a bugger to squeeze into 140 characters.
By contrast, aviation forecasts are very much more standardised, particularly automated services like METAR. “EGLL 130750Z 21006KT 7000 FEW011 BKN014 07/06 Q1009” manages to convey “13th January at 0750 UTC – wind speed 6 knots at a direction of 210 degrees, 7km visibility, a few clouds at 11,000 feet and broken cloud at 14,000 feed, air pressure at ground level 1009mb, temperature 7 degrees Celsius, dewpoint 6 degrees Celcius” in 50 characters. And once you know what it means, it’s actually no less readable. And writing Twitter bots like @EGLL_METAR is comparatively simple.
Then there’s the sheer quantity of information to convey. Each area forecast covers wind, sea state, weather and visibility information – and uses descriptive terms rather than wholly-numeric values. While this makes it a lot more readable – “squally showers” is somewhat more descriptive than having to look up “#437” or something in a table – it doesn’t make the Twitter bot’s life any easier. So in order to compress it all in, I’ve had to convert the forecasts into something rather less verbose than txt-speak – which somewhat defeats the object.
Another limitation is that the bot squirts out the whole forecast for all areas, rather than individual ones. This does tend to flood your timeline, particularly if you’re using a mobile device that will only display a limited number of Tweets per page. It’s mainly my fault, because I set up the @shippingcast account rather than a number of smaller areas. But then @dogger or @germanbight don’t have quite the same ring to them as @shippingcast. What I should do, if I ever get around to it, is to set up the bot so that you can “subscribe” to specific areas and then be DM’d with the relevant area forecasts.
Of course, this was mainly done for my own entertainment and education, so I’m not too bothered by the limitations. What’s more interesting is the way that an online service with an API can get exploited for purposes that would never have been envisaged at the outset – I’m pretty sure that the creators of Twitter wouldn’t have thought of the Shipping Forecast as a use case. It’s also been the starting point for some more “successful” bots (for a given definition of success) – my favourite is @riverthames, which tweets high and low tides to keep @towerbridge company. And there are many more “things” waiting to be animated – @thepips and @bigbenclock for a few.