Scraping the Flood (or another reason to code)


The River Thames has spilled into the flood plain (which is what flood plains are for) and the local footpaths now look like this:

Thames flooding between Benson and Wallingford

I’m generally inquisitive, and I’m also wondering when I’ll next be able to walk into Wallingford, so it would be useful to know whether the river is rising or falling. Is there an easier way than walking down to the river every couple of hours with a depth gauge?

Off the shelf

The UK Environment Agency publishes data on river levels, including this page, which gives the level for the River Thames downstream of Benson Lock.

[Screenshot: Environment Agency page for the River Thames downstream of Benson Lock]

As you can see in the screenshot, the page has the latest level recorded, a date and time for the reading, the typical range, and two high levels (the highest and a recent high). The page also includes a chart showing the last 48 hours of data.

[Screenshot: the page’s 48-hour river level chart]

Unfortunately, the scale on the chart makes small changes hard to see, and 48 hours is a short period relative to the slow rate at which the Thames responds to rainfall across its catchment. Frustratingly, the chart is published as an image, so you can’t go poking around in the underlying data to work things out.

Limits

This is where we hit the limit of someone else’s solution. What do I do if I want the trend over more than 48 hours, or want to see what’s hidden in that straight-looking line around 5 metres?

I could ask the Environment Agency…

…and wait. I’m sure they have far more important things to do.

…or I could roll my own solution based on the data they publish.

Which is exactly what I have here:

[Chart: River Thames level downstream of Benson Lock since December 24th]

In this plot you can see the river level at intervals since December 24th, and clearly see that it peaked on the 28th and has been falling since then.
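Once the readings are logged, the “rising or falling?” question can be answered without eyeballing a chart at all. Here is a minimal sketch, assuming the readings are held as a list of (timestamp, level in metres) pairs, oldest first. The function name `describe_trend` is hypothetical, and the sample values are made up for illustration; they are not real gauge data.

```python
from datetime import datetime

def describe_trend(readings):
    """Return (peak time, peak level, direction of the latest change).

    readings: list of (datetime, level_in_metres) tuples, oldest first.
    """
    if len(readings) < 2:
        raise ValueError("need at least two readings to judge a trend")
    # The peak is the reading with the highest level (the first one, if tied).
    peak_time, peak_level = max(readings, key=lambda r: r[1])
    # Compare the two most recent readings to see which way the river is heading.
    (_, previous), (_, latest) = readings[-2], readings[-1]
    if latest > previous:
        direction = "rising"
    elif latest < previous:
        direction = "falling"
    else:
        direction = "steady"
    return peak_time, peak_level, direction

# Made-up readings, for illustration only.
readings = [
    (datetime(2013, 12, 27, 12, 0), 4.80),
    (datetime(2013, 12, 28, 12, 0), 5.05),
    (datetime(2013, 12, 29, 12, 0), 4.98),
    (datetime(2013, 12, 30, 12, 0), 4.91),
]
print(describe_trend(readings))
```

Comparing only the last two readings is sensitive to small fluctuations; averaging over several readings would give a steadier answer for a river that responds as slowly as the Thames.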

Now, you could check the web page manually every few hours and write down the information or copy it into a spreadsheet, or you could spend less time writing a little app to do that for you.
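The “little app” only has to repeat that manual check on a timer. Here is a minimal sketch, assuming a hypothetical `fetch_reading()` function that scrapes one reading and a hypothetical `record()` function that stores it. Neither name comes from the original script. The sleep function is passed in as a parameter so the loop can be tried out without actually waiting hours.

```python
import time

def poll(fetch_reading, record, interval_seconds, count, sleep=time.sleep):
    """Call fetch_reading() `count` times, `interval_seconds` apart, and pass
    each result to record(). A failed check is reported and skipped so that
    one bad request doesn't stop the logging."""
    for i in range(count):
        try:
            record(fetch_reading())
        except Exception as exc:  # e.g. the site is briefly unavailable
            print(f"reading {i} failed: {exc}")
        if i < count - 1:
            sleep(interval_seconds)  # wait before the next check

# Usage: check every three hours for a day (8 readings).
# poll(fetch_reading, record, interval_seconds=3 * 60 * 60, count=8)
```

On a machine that is always switched on, a scheduler such as cron could run a single check every few hours instead of keeping a loop alive.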

That’s the power of code. When you can code you’re not bound by someone else’s solution, you’re not constrained by the software some anonymous person decided would suffice, you’re not dependent on there being a financial case for someone to create a tool for the job. If it’s important enough, you can roll your own.

The custom solution

How did I do it?

It’s a little Python script that uses Beautiful Soup to scrape the page published by the Environment Agency. It grabs the measured level and the time of the measurement and writes them into a text file. I can then use a standard application (in this case Excel) to plot a chart from the data.
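The “write them into a text file” step can be as simple as appending one comma-separated line per reading, which Excel can open directly. Here is a minimal sketch using Python’s standard `csv` module. The function name `append_reading`, the file name and the column names are illustrative assumptions and are not taken from the original script.

```python
import csv
import os
from datetime import datetime

def append_reading(path, when, level):
    """Append one reading (timestamp, level in metres) to a CSV file,
    writing a header row first if the file is new or empty."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["time", "level_m"])
        # Year-first timestamps also sort correctly as plain text.
        writer.writerow([when.strftime("%Y-%m-%d %H:%M"), level])

# Usage, once a reading has been scraped:
# append_reading("river_levels.csv", datetime.now(), level)
```

Appending rather than rewriting means each run only adds its own line, so a crash part-way through never loses earlier readings.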

Here’s some detail.

First open the URL for the page and create a Beautiful Soup object that represents the whole page:

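from urllib.request import urlopen  # Python 3 home of urllib2's urlopen
from bs4 import BeautifulSoup
page = urlopen(url)  # url: address of the Environment Agency gauge page
soup = BeautifulSoup(page, "html.parser")  # name the parser explicitly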

Now dig into the page to find the paragraphs contained within certain classes of divs:

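# every div with class "bl"...
for divTag in soup.find_all("div", class_="bl"):
    # ...then any element inside it with class "plain_text"...
    for plainText in divTag.find_all(True, class_="plain_text"):
        # ...then each paragraph inside that
        for paraText in plainText.find_all("p"):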
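The three nested loops above can also be written as a single CSS selector with Beautiful Soup’s `select()` method, which returns every `<p>` inside an element of class `plain_text` inside a `div` of class `bl`. Here is a sketch, assuming the page has the structure those loops search for. The sample HTML, including the gauge name, is made up so that the selector can be tried without fetching the live page.

```python
from bs4 import BeautifulSoup

# Made-up HTML shaped like the structure the nested loops search for.
sample = """
<div class="bl">
  <div class="plain_text">
    <p>The river level at Example Gauge is 4.95 metres.</p>
  </div>
</div>
<div class="other"><p>Not wanted.</p></div>
"""

soup = BeautifulSoup(sample, "html.parser")
# One selector does the work of the three nested loops.
for para in soup.select("div.bl .plain_text p"):
    print(para.get_text())
```

Parsing a saved copy of the page like this is also a handy way to test the scraper without repeatedly hitting the Environment Agency’s site.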

And finally do a regular expression match to pull out the data we want, in this case the river level:

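import re
# non-greedy .*? stops at the first " is " rather than the last one
levelStart = re.compile(r'The river level at .*? is ')
levelEnd = re.compile(r'metres')

# inside the innermost loop: get the paragraph as a plain string
text = paraText.get_text()
# find river level in metres
matchStart = levelStart.search(text)
if matchStart is not None:
    # only look for "metres" after the start of the reading
    matchEnd = levelEnd.search(text, matchStart.end())
    if matchEnd is not None:
        level = text[matchStart.end():matchEnd.start()].strip()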

Easy.
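The script also records the time of each measurement. One way to capture the level and the time together is a single regular expression with named groups. This is a sketch only. The sentence format below, including the wording around the time and the dd/mm/yyyy date, is an assumption for illustration, so the pattern would need adjusting to match the text the page actually contains. The function name `parse_reading` is hypothetical.

```python
import re
from datetime import datetime

# Hypothetical sentence format; check the live page and adjust the pattern.
READING = re.compile(
    r"The river level at .*? is (?P<level>\d+(?:\.\d+)?) metres"
    r".*?recorded at (?P<time>\d{1,2}:\d{2}) on (?P<date>\d{2}/\d{2}/\d{4})"
)

def parse_reading(text):
    """Return (datetime, level in metres) from a paragraph, or None if it doesn't match."""
    m = READING.search(text)
    if m is None:
        return None
    when = datetime.strptime(f"{m['date']} {m['time']}", "%d/%m/%Y %H:%M")
    return when, float(m["level"])

sample = ("The river level at Example Gauge is 4.95 metres. "
          "This measurement was recorded at 10:15 on 30/12/2013.")
print(parse_reading(sample))
```

Returning a real `datetime` and `float` rather than raw strings means readings can be sorted and compared directly before they are written to the text file.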

If your eyes glazed over but you’re intrigued, you might want to check out code with Python in 3 steps.

Notes

The graph in this post contains Environment Agency information © Environment Agency and database right. For further information see http://www.environment-agency.gov.uk/help/35768.aspx
