
Archive for 13 January 2009

Twittering the pips.

This is possibly the most pointless Twitter bot ever, but @thepips now Tweets on the hour.

It’s a British thing.

13 January 2009

Play

No comments yet

Twittering the Shipping Forecast

One of the aspects of Twitter I like the most is the fact that it has an API. That makes it hackable in a way that appeals to my inner geek, so something I’ve been playing with for a while is creating Twitter bots. One of the first I made was @shippingcast, which takes the UK Shipping Forecast and squirts it out onto the web in a Twitter stream.

Unless you’re British, the Shipping Forecast probably doesn’t mean a lot. It’s one of those curiously English anachronisms – four times a day, the BBC broadcasts weather forecasts for offshore shipping areas [RealAudio link] on Radio 4 (the BBC’s main “highbrow” speech station). The waters around the British Isles are divided into areas, and the whole thing follows a strangely poetic format. There’s an entire generation of Brits who have never left dry land, but can recite “Dover, Wight, Portland, Plymouth” as if they know what it means.

Slightly eccentric, syntactically-regular? Must be ripe for a Twitter bot. Here’s how I did it.

First, a quick digression/rant about the source of the data. The Shipping Forecast is published twice a day by the Met Office and then reposted on the BBC’s website. This causes no end of hassle for the aspiring Twitterer, the reason for which usually comes as something of a shock for non-Brits. Despite the fact that it’s the US that is meant to be the land of capitalism and free enterprise, the Americans enjoy a strangely socialist attitude to data and information that the taxpayer has funded and provide it free for all-comers. We Brits, on the other hand, have been lumbered with what are known as “trading funds” – quasi-public organisations like the Met Office instructed by government to behave like private enterprises. They tend to guard their data and information as if it were a scarce, rivalrous monopoly good, and will seek every opportunity to charge for it. Never mind that there are far greater benefits to society and the economy as a whole if the data were available for reuse; and never mind that I as a taxpayer have already funded the production of it. No, these organisations have all the safety and security of being public sector, but like to play (and pay) as if they were shareholder-funded. The upshot of this is that – technically – I’m breaking the law by Twittering the Shipping Forecast.

So, back to the “how”. The Met Office have made it fairly difficult to grab the data from their site, although I’m in two minds as to whether that’s because of their aforementioned grasping nature, or the fact that they’re yet another public sector Microsoft monoculture that knows little and cares less about web standards and playing nicely online. This means I have to grab the data from the BBC site, and they format it for viewing rather than parsing. Every six hours, a cron job kicks off which runs a Ruby script to grab the HTML page. This then gets squirted through Hpricot, a Ruby-based HTML parser, which discards the extraneous stuff and extracts the data that I’m after. The BBC use tables to display the forecast, which makes it relatively easy to grab the “meat” and drop the “sugar”.
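The table-scraping step can be sketched roughly as follows. This is not the bot's actual script: the sample markup is an invented stand-in for the BBC page, the method name `extract_rows` is made up for illustration, and plain regular expressions take the place of Hpricot so that the sketch runs with nothing but the Ruby standard library. A real script would fetch the page over HTTP first and would be better served by a proper HTML parser.

```ruby
# Sketch only: SAMPLE_HTML is an invented stand-in for the BBC forecast page,
# and regular expressions replace Hpricot to keep this dependency-free.
SAMPLE_HTML = <<~HTML
  <html><body>
  <h1>Shipping Forecast</h1>
  <table>
    <tr><th>Area</th><th>Wind</th><th>Weather</th><th>Visibility</th></tr>
    <tr><td>Dover, Wight</td><td>Southwest 5 to 7</td><td>Rain at times</td><td>Moderate or good</td></tr>
    <tr><td>Dogger</td><td>West 4 or 5</td><td>Showers</td><td>Good</td></tr>
  </table>
  </body></html>
HTML

# Return one array of cell strings per data row.
# Header rows contain only <th> cells, so they come back empty and are dropped.
def extract_rows(html)
  rows = html.scan(%r{<tr[^>]*>(.*?)</tr>}m).map do |(row)|
    row.scan(%r{<td[^>]*>(.*?)</td>}m).map do |(cell)|
      cell.gsub(/<[^>]+>/, "").strip # remove any tags nested inside the cell
    end
  end
  rows.reject(&:empty?)
end

extract_rows(SAMPLE_HTML).each { |cells| puts cells.join(" | ") }
```

Run from cron every six hours, something of this shape would yield one row per area (or group of areas), ready for formatting.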

Once I’ve got the data itself, it needs formatting. This involves iterating across the table to grab the forecast data for each area (or group of areas), and then stripping out extraneous content to squeeze it into 140 characters. Then finally it gets Tweeted using the Twitter4R library to talk directly to the Twitter API.
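As a rough illustration of the formatting step, the sketch below joins an area and its forecast fields into one line and trims it to fit. The method name `format_tweet` and the sample rows are assumptions made for this example, and the posting step through Twitter4R is deliberately left out.

```ruby
TWEET_LIMIT = 140

# Build one status line from an area name and its forecast fields.
# If the result is too long, cut it back to a whole word and add "...".
def format_tweet(area, fields, limit = TWEET_LIMIT)
  text = "#{area}: #{fields.join('. ')}".gsub(/\s+/, " ").strip
  return text if text.length <= limit

  cut = text[0, limit - 3]                        # leave room for the ellipsis
  cut = cut[0...cut.rindex(" ")] if cut.include?(" ")
  cut + "..."
end

# Invented sample rows: area (or group of areas) followed by forecast fields.
rows = [
  ["Dover, Wight", "Southwest 5 to 7", "Rain at times", "Moderate or good"],
  ["Dogger", "West 4 or 5", "Showers", "Good"]
]
rows.each { |area, *fields| puts format_tweet(area, fields) }
```

Truncating is the bluntest way to hit the limit; abbreviating the vocabulary loses less information.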

So, how well does it work, and what’s worth learning from all of this? Firstly, as a proof of concept for publishing microcontent, I think it works quite well. The process is quite simple, and the information pops up into a Twitter feed on a nicely regular basis. There’s a useful side effect of the cron job, as well, which is to act as a “heartbeat” for the server that it’s running on.

As far as the information being useful, though, it’s less successful. The Shipping Forecast is optimised for broadcast – it’s written to be read aloud rather than read from the page. This means that although there’s a standardised syntax, the format is less standardised – if several sea areas are going to experience the same weather, then they’ll be lumped together. Which is incidentally why phrases like “Dover, Wight, Portland, Plymouth” have such resonance for a Brit of a certain age. While it’s undeniably poetic, it’s a bugger to squeeze into 140 characters.

By contrast, aviation weather reports are very much more standardised, particularly automated services like METAR. “EGLL 130750Z 21006KT 7000 FEW011 BKN014 07/06 Q1009” manages to convey “13th January at 0750 UTC – wind speed 6 knots at a direction of 210 degrees, 7km visibility, a few clouds at 1,100 feet and broken cloud at 1,400 feet, temperature 7 degrees Celsius, dewpoint 6 degrees Celsius, QNH pressure 1009mb” in about 50 characters. And once you know what it means, it’s actually no less readable. And writing Twitter bots like @EGLL_METAR is comparatively simple.
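To show how regular the format is, here is a minimal decoder for the groups that appear in the example report. The method name `decode_metar` and the hash keys are invented for this sketch. It handles only these groups: real METARs also carry gusts, variable winds, CAVOK, weather codes and more, none of which are covered. Cloud bases are reported in hundreds of feet, so the three-digit figure is multiplied by 100.

```ruby
CLOUD_COVER = {
  "FEW" => "few", "SCT" => "scattered", "BKN" => "broken", "OVC" => "overcast"
}.freeze

# Decode the handful of METAR groups used in the example report.
# Groups that match none of the patterns are silently ignored.
def decode_metar(report)
  out = { clouds: [] }
  tokens = report.split
  out[:station] = tokens.shift
  tokens.each do |token|
    case token
    when /\A(\d{2})(\d{2})(\d{2})Z\z/            # day of month and UTC time
      out[:day], out[:time_utc] = $1.to_i, "#{$2}#{$3}"
    when /\A(\d{3})(\d{2,3})KT\z/                # wind direction and speed
      out[:wind_dir_deg], out[:wind_kt] = $1.to_i, $2.to_i
    when /\A(\d{4})\z/                           # visibility in metres
      out[:visibility_m] = $1.to_i
    when /\A(FEW|SCT|BKN|OVC)(\d{3})\z/          # cloud cover and base
      out[:clouds] << { cover: CLOUD_COVER[$1], base_ft: $2.to_i * 100 }
    when %r{\A(M?\d{2})/(M?\d{2})\z}             # temperature/dewpoint, M = minus
      out[:temp_c], out[:dewpoint_c] = [$1, $2].map { |v| v.sub("M", "-").to_i }
    when /\AQ(\d{4})\z/                          # QNH in hectopascals (millibars)
      out[:qnh_hpa] = $1.to_i
    end
  end
  out
end

p decode_metar("EGLL 130750Z 21006KT 7000 FEW011 BKN014 07/06 Q1009")
```

Each group is self-describing, so one pattern per group is enough. The Shipping Forecast's free-flowing phrasing does not allow that.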

Then there’s the sheer quantity of information to convey. Each area forecast covers wind, sea state, weather and visibility information – and uses descriptive terms rather than wholly-numeric values. While this makes it a lot more readable – “squally showers” is somewhat more descriptive than having to look up “#437” or something in a table – it doesn’t make the Twitter bot’s life any easier. So in order to squeeze it all in, I’ve had to convert the forecasts into something rather less verbose than txt-speak – which somewhat defeats the object.
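The compression can be pictured as a simple lookup table applied to the forecast text. The abbreviations below are invented for illustration and are not the bot's real key; `compress` is likewise a hypothetical name.

```ruby
# Hypothetical abbreviations: the bot's real decoding key is not reproduced here.
ABBREVIATIONS = {
  "southwest" => "SW", "southeast" => "SE", "northwest" => "NW", "northeast" => "NE",
  "north" => "N", "south" => "S", "east" => "E", "west" => "W",
  "occasionally" => "occ", "moderate" => "mod", "showers" => "shwrs",
  "squally" => "sqly", "visibility" => "vis", "increasing" => "incr", "later" => "ltr"
}.freeze

# Match whole words only, case-insensitively, so "south" never fires inside "southwest".
ABBREVIATION_PATTERN = /\b(#{ABBREVIATIONS.keys.join("|")})\b/i

# Replace every known word with its short form and leave everything else alone.
def compress(text)
  text.gsub(ABBREVIATION_PATTERN) { |word| ABBREVIATIONS[word.downcase] }
end

puts compress("Southwest 5 to 7, occasionally gale 8 later. Squally showers. Moderate")
```

The price is that readers need the key to hand, which is the trade-off described above.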

The key for decoding the content is here.

Another limitation is that the bot squirts out the whole forecast for all areas, rather than individual ones. This does tend to flood your timeline, particularly if you’re using a mobile device that will only display a limited number of Tweets per page. It’s mainly my fault, because I set up the @shippingcast account rather than a number of smaller areas. But then @dogger or @germanbight don’t have quite the same ring to them as @shippingcast. What I should do, if I ever get around to it, is to set up the bot so that you can “subscribe” to specific areas and then be DM’d with the relevant area forecasts.
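The “subscribe” idea could look something like the sketch below, which only works out who should receive what. Everything in it is hypothetical: the follower names, the `SUBSCRIPTIONS` table and the `direct_messages` method are invented, and actually sending the DMs through the Twitter API is left out.

```ruby
# Follower handle => sea areas they asked for (all names invented).
SUBSCRIPTIONS = {
  "alice" => ["Dogger", "German Bight"],
  "bob"   => ["Dover"]
}.freeze

# forecasts is an array of [areas, text] pairs, where grouped areas arrive
# as one comma-separated string, for example "Dover, Wight".
# Returns a hash of follower => messages for followers with a matching area.
def direct_messages(forecasts, subscriptions = SUBSCRIPTIONS)
  subscriptions.each_with_object({}) do |(user, wanted), out|
    matches = forecasts.select do |areas, _text|
      (areas.split(/,\s*/) & wanted).any?
    end
    out[user] = matches.map { |areas, text| "#{areas}: #{text}" } unless matches.empty?
  end
end

forecasts = [
  ["Dover, Wight", "SW 5 to 7"],
  ["Dogger", "W 4 or 5"],
  ["Fisher, German Bight", "NW 6"]
]
direct_messages(forecasts).each do |user, messages|
  messages.each { |message| puts "DM to #{user}: #{message}" }
end
```

Because the forecast lumps areas together, a subscriber to one area gets the whole group's entry rather than a line of their own.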

Of course, this was mainly done for my own entertainment and education, so I’m not too bothered by the limitations. What’s more interesting is the way that an online service with an API can get exploited for purposes that would never have been envisaged at the outset – I’m pretty sure that the creators of Twitter wouldn’t have thought of the Shipping Forecast as a use case. It’s also been the starting point for some more “successful” bots (for a given definition of success) – my favourite is @riverthames, which tweets high and low tides to keep @towerbridge company. And there are many more “things” waiting to be animated – @thepips and @bigbenclock, to name a few.

13 January 2009

Technical

1 comment

links for 2009-01-12

  • Amazon Web Services Simple Storage Service (S3) is the solution for anyone wishing to store or deliver massive amounts of data without eating up precious bandwidth on their own server. Amazon S3 works great for your computer and web hosting needs.
    Getting started with Amazon S3

    Getting started with Amazon S3 is simple and takes no more than a few minutes. Once registered and confirmed, access to your account can be achieved using your account identifiers and an S3 client like S3Fox, S3Hub, or Transmit.

  • "FeedTools is a simple Ruby library for handling rss, atom, and cdf parsing, generation, and translation as well as caching. It attempts to adhere to Postel’s law—i.e. a liberal parsing and conservative generation policy.

    It’s ideal for parsing RSS feeds in Ruby on Rails applications and equally useful in just simple scripts. FeedTools can also create new feeds for you in only a few lines of code."

13 January 2009

Links

No comments yet