Programming The World (Part 2)

Data From Other Sources

Nick Efford

Introduction

Part 1 of this series looked at how devices in the 'Internet of Things' can sense their surroundings and make sensor measurements available over the web in formats such as CSV, XML and JSON. The same formats are used to publish data from a variety of other sources. This article gives a few examples of these other sources.

Data From The BBC

The British Broadcasting Corporation is a public body established by Royal Charter. As such it has a number of obligations, among them the expectation that it will deliver to the public the benefit of emerging communication technologies and services. Part and parcel of this is a commitment to linked data - which, in practice, means that the BBC is attempting to provide machine-readable data on its radio and TV programmes via the web.

Visit http://www.bbc.co.uk/programmes/developers and you will see details of the BBC's approach. This page describes the addressing scheme that the BBC has devised for publishing programme data and provides links to a couple of examples: an XML data feed giving the schedule for Radio 1 in England and a JSON data feed giving upcoming Sci-Fi programmes on TV. Try both of these out now. The screenshot below shows a portion of XML data from the first of them, as displayed by the Chrome browser.

Screenshot of XML representing the BBC Radio 1 schedule

Another interesting BBC data feed is this one, which breaks down airplay by artist across the BBC's radio stations:

http://www.bbc.co.uk/programmes/music/artists/charts.json

(The same data can be retrieved as XML simply by replacing json with xml in the URL above.)
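The format switch described above is easy to automate. The following is a minimal sketch rather than anything the BBC publishes: the helper names with_format and fetch_json are my own, and the code assumes that the feed is still served at the address given above, which it does not check.

```python
import json
import posixpath
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen

CHARTS_URL = "http://www.bbc.co.uk/programmes/music/artists/charts.json"


def with_format(url, fmt):
    """Return url with its file extension replaced by fmt ('json' or 'xml')."""
    parts = urlsplit(url)
    root, _old_ext = posixpath.splitext(parts.path)
    return urlunsplit(parts._replace(path=f"{root}.{fmt}"))


def fetch_json(url, timeout=10):
    """Download a feed and decode it as JSON (this makes a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Only the extension changes; the rest of the address is left alone.
xml_url = with_format(CHARTS_URL, "xml")
```

Calling fetch_json(CHARTS_URL) would return the decoded feed as ordinary Python dictionaries and lists, provided that the service responds.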

Earth & Environmental Data

The Met Office

The Met Office is the UK's national weather forecasting service. It is currently beta-testing a service called DataPoint, which it describes thus:

DataPoint is a way of accessing freely available Met Office data feeds in a format that is suitable for application developers. It is aimed at professionals, the scientific community and student or amateur developers, in fact anyone looking to re-use Met Office data within their own innovative applications.

DataPoint offers a wide range of useful meteorological data. For example, it provides a five-day forecast of temperature, wind speed & direction, precipitation and other variables for specific locations in the UK, either as a visualisation like that shown below or as raw data in XML or JSON formats.

Met Office weather forecast

Map overlays from Met Office DataPoint

DataPoint also provides map layers in the PNG image format for both weather forecasts and actual observations. Forecast layers show cloud cover, rainfall, temperature and pressure as isobars. Observation layers show rainfall, lightning storms, and satellite images in the visible and IR regions of the spectrum. Layer retrieval is a two-stage process in which you must first request details of all the available layers, in either XML or JSON formats. This information can then be used to construct the specific URL of the desired layer. The example shown here is a composite of a visible-spectrum satellite image with layers showing forecast rainfall and pressure.

One important thing to note about DataPoint is that users must register with the service in order to obtain an API key. This is a unique string of characters that identifies you as a legitimate user of the service. It is used for authentication purposes and to track your usage of the service. All requests made to DataPoint must include your API key.
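To make the two-stage layer lookup and the role of the API key more concrete, here is a rough sketch of the pattern. Everything specific in it is hypothetical: the base address, the 'key' and 'time' parameter names and the shape of the capabilities data are invented for illustration and are not DataPoint's real scheme, which you should take from the service's own documentation.

```python
from urllib.parse import urlencode

# Hypothetical base address, used purely for illustration.
BASE = "http://datapoint.example/layers"


def capabilities_url(fmt, api_key):
    """Stage 1: address of the document listing all available layers."""
    return f"{BASE}/capabilities.{fmt}?{urlencode({'key': api_key})}"


def layer_url(capabilities, layer_name, api_key):
    """Stage 2: build the image URL for one layer from the stage 1 data.

    'capabilities' is assumed to look like
    {"layers": [{"name": ..., "times": [...]}, ...]} (a made-up structure).
    """
    for layer in capabilities["layers"]:
        if layer["name"] == layer_name:
            # Use the most recent time step on offer for this layer.
            query = urlencode({"time": layer["times"][-1], "key": api_key})
            return f"{BASE}/{layer['name']}.png?{query}"
    raise KeyError(f"no such layer: {layer_name}")


# Made-up capabilities data standing in for a real stage 1 response.
example_caps = {"layers": [{"name": "Rainfall", "times": ["0", "3", "6"]}]}
example_url = layer_url(example_caps, "Rainfall", "MY-KEY")
```

Note that the key travels with every request, including the first one that merely lists the layers.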

API keys are actually a fairly common requirement for use of web services. ThingSpeak, discussed in Part 1, requires one. Many services will provide an API key for free but will limit the number of times that you can invoke the service free of charge; for example, forecast.io will allow you to make up to 1,000 API calls per day for free but will charge you $1 per 10,000 calls thereafter.
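It is worth estimating what such a pricing scheme would cost before building an application around it. The sketch below uses the figures quoted above as defaults and assumes that calls beyond the free allowance are billed pro rata, which is my assumption rather than a statement of how any particular provider invoices.

```python
def daily_cost(calls, free_calls=1000, price_per_block=1.0, block_size=10000):
    """Estimate the charge in dollars for a given number of calls in one day.

    Assumes pro rata billing for calls beyond the free allowance.
    """
    billable = max(0, calls - free_calls)
    return billable * price_per_block / block_size


# Polling once a minute for a whole day stays just outside the free allowance.
calls_per_day = 24 * 60                  # 1440 calls
estimate = daily_cost(calls_per_day)     # 440 billable calls
```

Under these assumptions, polling once per minute costs a few cents a day, whereas polling once per second (86,400 calls) would cost several dollars.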

USGS Earthquake Hazards Program

The United States Geological Survey's Earthquake Hazards Program is one of my favourite data source examples. Their website provides comprehensive real-time feeds of seismological data in a variety of different formats, as the screenshot below illustrates.

USGS earthquake data feeds

The Spreadsheet Applications link on this page takes you to another page containing various CSV data feeds. The Atom Syndication and QuakeML links are for two different XML-based formats, the former being for consumption by feed readers and the latter for professional geoscientists. The Programmatic Access link is for a JSON-based format called GeoJSON.

Links for earthquake data feeds

For each format, feeds are grouped by time, covering the past hour, past day, past 7 days and past 30 days. In each of these groups there is an 'all earthquakes' feed plus separate feeds for different levels of severity, covering 'significant' earthquakes and those with magnitudes of 4.5 or more, 2.5 or more, and 1.0 or more.

The quantity of data that you obtain from these feeds will very much depend on which one you choose; for example, the feed for significant earthquakes occurring in the past hour will be empty most of the time, whereas the feed for all earthquakes from the last 30 days will typically give you many thousands of events each time that you access it.
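Because the feeds are organised so regularly, by severity and by period, their addresses can be generated rather than copied by hand. The sketch below assumes the naming pattern that the USGS summary feeds used at the time of writing (for example all_hour.geojson or 4.5_week.csv under a common base address); neither the base address nor the pattern is given in this article, so check both against the USGS feed pages before relying on them.

```python
# Assumed base address of the USGS summary feeds.
BASE = "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary"

LEVELS = ("significant", "4.5", "2.5", "1.0", "all")
PERIODS = ("hour", "day", "week", "month")   # week = 7 days, month = 30 days
FORMATS = ("geojson", "csv")


def feed_url(level, period, fmt="geojson"):
    """Build the address of one summary feed, validating each component."""
    if level not in LEVELS:
        raise ValueError(f"unknown severity level: {level!r}")
    if period not in PERIODS:
        raise ValueError(f"unknown period: {period!r}")
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{BASE}/{level}_{period}.{fmt}"


# The smallest and largest feeds mentioned above.
quietest = feed_url("significant", "hour")
busiest = feed_url("all", "month")
```

Validating the components up front means that a mistyped level or period fails immediately instead of producing a request for a feed that does not exist.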

A GeoJSON feed contains a list of seismic events, each of which is represented as a GeoJSON 'feature' like the one shown below. A glossary explains what the various data fields mean. (A few of them have been omitted in the interests of clarity.) This particular example is for the magnitude 7.4 quake that occurred near the Solomon Islands on 13 April 2014.

{
  "type": "Feature",
  "properties": {
    "mag": 7.4,
    "place": "111km S of Kirakira, Solomon Islands",
    "time": 1397392578710,
    "updated": 1397421536312,
    "tz": 660,
    "felt": null,
    "cdi": null,
    "mmi": 7.51,
    "alert": "green",
    "status": "reviewed",
    "tsunami": 1,
    "sig": 842,
    "net": "us",
    "nst": null,
    "dmin": 2.89,
    "rms": 1.06,
    "gap": 17,
    "magType": "mww",
    "type": "earthquake",
    "title": "M 7.4 - 111km S of Kirakira, Solomon Islands"
  },
  "geometry": {
    "type": "Point",
    "coordinates": [162.0692, -11.451, 35]
  },
  "id": "usc000piqj"
}
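A record like this is straightforward to process once it has been decoded. The sketch below works on a trimmed copy of the example above and rests on two assumptions that you should confirm against the glossary: that the time field counts milliseconds since 1 January 1970 UTC, and that the coordinates are given in the order longitude, latitude, depth in kilometres. The function name summarise is my own.

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def summarise(feature):
    """Pull the most useful fields out of one GeoJSON earthquake feature."""
    props = feature["properties"]
    # Assumed coordinate order: longitude, latitude, depth (km).
    lon, lat, depth = feature["geometry"]["coordinates"]
    return {
        "id": feature["id"],
        "magnitude": props["mag"],
        "place": props["place"],
        # Assumed to be milliseconds since the Unix epoch, in UTC.
        "time_utc": EPOCH + timedelta(milliseconds=props["time"]),
        "longitude": lon,
        "latitude": lat,
        "depth_km": depth,
    }


# Trimmed copy of the Solomon Islands example.
feature = json.loads("""
{
  "type": "Feature",
  "properties": {
    "mag": 7.4,
    "place": "111km S of Kirakira, Solomon Islands",
    "time": 1397392578710
  },
  "geometry": {"type": "Point", "coordinates": [162.0692, -11.451, 35]},
  "id": "usc000piqj"
}
""")

info = summarise(feature)
print(info["time_utc"].isoformat(), info["magnitude"], info["place"])
```

Under the epoch assumption, the timestamp decodes to 12:36:18 UTC on 13 April 2014, which agrees with the date given for this event.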

Open Data Initiatives

Data.gov.uk is at the heart of the UK government's Transparency agenda and currently (April 2014) makes almost 14,000 datasets available to the public. You can search for a dataset by keyword or conduct geographic searches based on postcode, latitude & longitude or a rectangular region dragged out on a map. You can also drill down via menus that classify datasets by licence, theme or format. CSV and XML are well represented, but there are comparatively few JSON datasets currently available. Note that 'open' does not necessarily imply 'easily machine readable'; some of the datasets are provided only as Excel spreadsheets, PDF files or Microsoft Word documents, for example - formats that can be much harder to process using software.

The screenshot below shows the most popular health-related CSV datasets available from the site. You can view this page yourself by visiting

http://data.gov.uk/data/search?theme-primary=Health&res_format=CSV

CSV health data from data.gov.uk
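The search address above is just a base URL followed by two query parameters, so variations on it can be generated in code. This sketch reuses the parameter names that appear in that address (theme-primary and res_format); whether the site accepts other values for them, and which ones, is something I am assuming rather than documenting.

```python
from urllib.parse import urlencode

SEARCH_BASE = "http://data.gov.uk/data/search"


def search_url(theme, res_format):
    """Build a dataset search address for a given theme and file format."""
    query = urlencode({"theme-primary": theme, "res_format": res_format})
    return f"{SEARCH_BASE}?{query}"


# Reproduces the address given above for health-related CSV datasets.
health_csv = search_url("Health", "CSV")
```

Using urlencode rather than string concatenation ensures that any spaces or punctuation in a theme name are escaped correctly.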

Open data is having an impact at the local level, too. A good example is the recently established Leeds Data Mill, promoted as "a place for organisations to share their open data to change the way we live, work and play in the city".

Leeds Data Mill's small but growing collection of datasets includes locations and number of available spaces in council car parks, details of completed and live roadworks in the city and footfall data for eight locations in the city centre.

Leeds roadworks data from Leeds Data Mill

Continued in Part 3...