<?xml version="1.0" encoding="UTF-8"?>
<rss version='2.0' xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Nick Efford</title>
    <description></description>
    <link>https://nickefford.silvrback.com/feed</link>
    <atom:link href="https://nickefford.silvrback.com/feed" rel="self" type="application/rss+xml"/>
    <category domain="nickefford.silvrback.com">Content Management/Blog</category>
    <language>en-us</language>
      <pubDate>Thu, 24 Apr 2014 02:40:57 -1200</pubDate>
    <managingEditor>nick.efford@gmail.com (Nick Efford)</managingEditor>
      <item>
        <guid>https://nickefford.silvrback.com/programming-the-world-part-3#2660</guid>
          <pubDate>Thu, 24 Apr 2014 02:40:57 -1200</pubDate>
        <link>https://nickefford.silvrback.com/programming-the-world-part-3</link>
        <title>Programming The World (Part 3)</title>
        <description>Data Acquisition &amp; Processing</description>
        <content:encoded><![CDATA[<h1 id="introduction">Introduction</h1>

<p><a href="https://nickefford.silvrback.com/programming-the-world-part-1">Part 1</a> and <a href="https://nickefford.silvrback.com/programming-the-world-part-2">Part 2</a> of this series showed that the web is a rich source of open data in formats such as CSV, JSON and XML. But how do we get data from these sources into a program? How do we deal with the formatting and extract what we need from the data? This final part of the series shows how these goals can be achieved using the <a href="http://www.python.org">Python</a> programming language.</p>

<p>Note that Python 3 is used here, rather than the older (and <a href="https://plus.google.com/+CoreyGoldberg/posts/ZM3Tcswhaii">increasingly obsolete</a>) Python 2. Note also that other languages besides Python can do these things, with comparable ease in some cases. Python is used here because it is a good choice for such tasks and because it also happens to be my favourite programming language!</p>

<h1 id="reading-data-from-the-web">Reading Data From The Web</h1>

<p>This can be done using the <code>urlopen</code> function from the <code>urllib.request</code> module in Python&#39;s standard library (see the <a href="https://docs.python.org/3/library/urllib.request.html">official module documentation</a> for full details). To make a basic HTTP GET request, which is typically what you&#39;ll need to access data, this function requires just one argument: the URL of the resource being accessed. It returns an <code>HTTPResponse</code> object from which we can read the data.</p>

<p>Calling the <code>read</code> method of the <code>HTTPResponse</code> object returns the data as a Python <code>bytes</code> object - essentially a string of bytes. No assumptions are made about the nature of the data. Thus, if you are expecting a text-based format such as CSV, XML or JSON, you must translate the bytes to text yourself by calling the <code>decode</code> method on the string of bytes. The <code>decode</code> method accepts an encoding scheme as an optional argument, defaulting to <a href="http://en.wikipedia.org/wiki/UTF-8">UTF-8</a> if one isn&#39;t specified. This default will be suitable in most cases.</p>

<p>All of this leads to code like the following example, which acquires CSV data for earthquakes from the <a href="http://earthquake.usgs.gov/">USGS Earthquake Hazards Program website</a> discussed in <a href="https://nickefford.silvrback.com/programming-the-world-part-2">Part 2</a>:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">urllib.request</span> <span class="k">import</span> <span class="n">urlopen</span>

<span class="c1"># Construct feed URL for M4.5+ quakes in the past 7 days</span>

<span class="n">base</span> <span class="o">=</span> <span class="s2">&quot;http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/&quot;</span>
<span class="n">feed_url</span> <span class="o">=</span> <span class="n">base</span> <span class="o">+</span> <span class="s2">&quot;4.5_week.csv&quot;</span>

<span class="c1"># Open URL and read text from it</span>

<span class="n">source</span> <span class="o">=</span> <span class="n">urlopen</span><span class="p">(</span><span class="n">feed_url</span><span class="p">)</span>
<span class="n">text</span> <span class="o">=</span> <span class="n">source</span><span class="o">.</span><span class="n">read</span><span class="p">()</span><span class="o">.</span><span class="n">decode</span><span class="p">()</span>
</pre></div>
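<p>If the server declares a character set other than UTF-8, decoding with the default could fail or garble the text. A slightly more defensive sketch (the helper names here are my own invention) reads the charset declared in the response&#39;s Content-Type header via <code>get_content_charset</code>, falling back to UTF-8 only when none is given:</p>

```python
from urllib.request import urlopen

def decode_payload(raw, charset=None):
    # Translate raw bytes into text, defaulting to UTF-8 when the
    # server declared no charset
    return raw.decode(charset or "utf-8")

def read_text(url):
    # The response headers may name the charset the server actually used
    with urlopen(url) as source:
        return decode_payload(source.read(),
                              source.headers.get_content_charset())
```

<p>With this in place, <code>text = read_text(feed_url)</code> replaces the last two lines of the example above.</p>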
<h1 id="handling-csv">Handling CSV</h1>

<p>You now have a program that reads one of the earthquake CSV data feeds into a single string of text. Imagine that you wish to process this dataset in order to find the mean and standard deviation of earthquake depths. An examination of the data feed will tell you that the fourth column holds the depth values you need.</p>

<p><img alt="Earthquake CSV data imported into LibreOffice Calc" src="https://silvrback.s3.amazonaws.com/uploads/e341be0d-d166-406c-afb7-a6c93b2c7df2/quakecsv_large.png" /></p>

<p>Extraction of the depth values can be done using the <code>csv</code> module from Python&#39;s standard library (see the <a href="https://docs.python.org/3/library/csv.html">official module documentation</a> for full details). If you are running Python 3.4 or newer, the standard library also has a module called <code>statistics</code> that makes computation of the mean and standard deviation trivial.</p>

<p>One approach is to create a reader object that can scan the lines in the text string containing all of the data. Calling the <code>splitlines</code> method on the string will return these lines in a list, which is then used to create the reader object. Iterating over the reader object will give us first the column headings (if present), then each record from the dataset. The record will be a list of string values, since the reader object doesn&#39;t know how the data should be interpreted. Depths will therefore need to be converted into floating-point values before being collected into a list. This list of values can then be passed to functions from the <code>statistics</code> module that compute mean and standard deviation.</p>

<p>Suitable code to do all this is shown below. (The code that reads the data has been omitted; imagine it at the location of the <code>...</code>.)</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">csv</span>
<span class="kn">import</span> <span class="nn">statistics</span>

<span class="o">...</span>

<span class="c1"># Create reader for the dataset</span>

<span class="n">reader</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">reader</span><span class="p">(</span><span class="n">text</span><span class="o">.</span><span class="n">splitlines</span><span class="p">())</span>

<span class="c1"># Read column headings (which are not used here)</span>

<span class="n">headings</span> <span class="o">=</span> <span class="nb">next</span><span class="p">(</span><span class="n">reader</span><span class="p">)</span>

<span class="c1"># Fetch each record and collect depths into a list</span>

<span class="n">depths</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">record</span> <span class="ow">in</span> <span class="n">reader</span><span class="p">:</span>
    <span class="c1"># Depth is fourth value, at index 3</span>
    <span class="c1"># Value is a string and must be converted to a float</span>
    <span class="n">depth</span> <span class="o">=</span> <span class="nb">float</span><span class="p">(</span><span class="n">record</span><span class="p">[</span><span class="mi">3</span><span class="p">])</span>
    <span class="n">depths</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">depth</span><span class="p">)</span>

<span class="c1"># Compute mean &amp; standard deviation of depths</span>

<span class="n">mean</span> <span class="o">=</span> <span class="n">statistics</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">depths</span><span class="p">)</span>
<span class="n">stdev</span> <span class="o">=</span> <span class="n">statistics</span><span class="o">.</span><span class="n">stdev</span><span class="p">(</span><span class="n">depths</span><span class="p">,</span> <span class="n">mean</span><span class="p">)</span>

<span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Mean    =&quot;</span><span class="p">,</span> <span class="n">mean</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Std dev =&quot;</span><span class="p">,</span> <span class="n">stdev</span><span class="p">)</span>
</pre></div>
<p>An alternative and slightly more user-friendly approach is to use a <code>DictReader</code> object. Unlike the normal reader object provided by the <code>csv</code> module, which gives you each record as a list of strings, a <code>DictReader</code> object will give you each record as a dictionary, in which the keys are the column headings. Here&#39;s an example generated from the earthquake data feed:</p>
<div class="highlight"><pre><span></span><span class="p">{</span><span class="s1">&#39;depth&#39;</span><span class="p">:</span> <span class="s1">&#39;11.4&#39;</span><span class="p">,</span> <span class="s1">&#39;dmin&#39;</span><span class="p">:</span> <span class="s1">&#39;2.379&#39;</span><span class="p">,</span> <span class="s1">&#39;time&#39;</span><span class="p">:</span> <span class="s1">&#39;2014-04-24T03:10:12.880Z&#39;</span><span class="p">,</span> <span class="s1">&#39;updated&#39;</span><span class="p">:</span> <span class="s1">&#39;2014-04-24T09:31:17.990Z&#39;</span><span class="p">,</span> <span class="s1">&#39;net&#39;</span><span class="p">:</span> <span class="s1">&#39;us&#39;</span><span class="p">,</span> <span class="s1">&#39;id&#39;</span><span class="p">:</span> <span class="s1">&#39;usb000px6r&#39;</span><span class="p">,</span> <span class="s1">&#39;nst&#39;</span><span class="p">:</span> <span class="s1">&#39;&#39;</span><span class="p">,</span> <span class="s1">&#39;rms&#39;</span><span class="p">:</span> <span class="s1">&#39;1.18&#39;</span><span class="p">,</span> <span class="s1">&#39;mag&#39;</span><span class="p">:</span> <span class="s1">&#39;6.6&#39;</span><span class="p">,</span> <span class="s1">&#39;magType&#39;</span><span class="p">:</span> <span class="s1">&#39;mww&#39;</span><span class="p">,</span> <span class="s1">&#39;place&#39;</span><span class="p">:</span> <span class="s1">&#39;94km S of Port Hardy, Canada&#39;</span><span class="p">,</span> <span class="s1">&#39;latitude&#39;</span><span class="p">:</span> <span class="s1">&#39;49.8459&#39;</span><span class="p">,</span> <span class="s1">&#39;type&#39;</span><span class="p">:</span> <span class="s1">&#39;earthquake&#39;</span><span class="p">,</span> <span class="s1">&#39;longitude&#39;</span><span class="p">:</span> <span class="s1">&#39;-127.444&#39;</span><span class="p">,</span> <span class="s1">&#39;gap&#39;</span><span class="p">:</span> <span 
class="s1">&#39;41&#39;</span><span class="p">}</span>
</pre></div>
<p>If <code>DictReader</code> is used then the code needed to build a list of earthquake depths will change to something like this:</p>
<div class="highlight"><pre><span></span><span class="n">reader</span> <span class="o">=</span> <span class="n">csv</span><span class="o">.</span><span class="n">DictReader</span><span class="p">(</span><span class="n">text</span><span class="o">.</span><span class="n">splitlines</span><span class="p">())</span>

<span class="n">depths</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">record</span> <span class="ow">in</span> <span class="n">reader</span><span class="p">:</span>
    <span class="n">depth</span> <span class="o">=</span> <span class="nb">float</span><span class="p">(</span><span class="n">record</span><span class="p">[</span><span class="s2">&quot;depth&quot;</span><span class="p">])</span>
    <span class="n">depths</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">depth</span><span class="p">)</span>
</pre></div>
<p>Note how &quot;depth&quot; is used to look up the depth value instead of an integer index. This makes the code a little easier to understand.</p>
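<p>One wrinkle worth guarding against: as the sample record above shows (see the empty <code>nst</code> value), fields in the feed can be empty strings, and <code>float(&quot;&quot;)</code> raises a <code>ValueError</code>. The sketch below uses a made-up three-row stand-in for the feed text, with a deliberately blank depth in the second row, and simply skips records whose depth field is empty:</p>

```python
import csv
import statistics

# A made-up three-row stand-in for the real feed text; the depth field
# of the second record is deliberately empty
text = """time,latitude,longitude,depth,mag
2014-04-24T03:10:12.880Z,49.8459,-127.444,11.4,6.6
2014-04-23T01:00:00.000Z,-11.451,162.0692,,7.4
2014-04-22T12:34:56.000Z,35.0,-118.0,8.2,4.7"""

reader = csv.DictReader(text.splitlines())

# Keep only records whose depth field is non-empty
depths = [float(record["depth"]) for record in reader if record["depth"]]

print(statistics.mean(depths))   # mean of 11.4 and 8.2
```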

<h1 id="handling-json">Handling JSON</h1>

<p>This can be done using the <code>json</code> module from Python&#39;s standard library (see the <a href="https://docs.python.org/3/library/json.html">official module documentation</a> for full details). Let us consider how this module can be used to find artists who have been played more than ten times in the past week on BBC radio stations, using the JSON data feed that was mentioned in <a href="https://nickefford.silvrback.com/programming-the-world-part-2">Part 2</a>.</p>

<p>The first step is to read data from the feed into a single string of text, as discussed above. This string can then be passed to the <code>loads</code> function from the <code>json</code> module, which <a href="http://en.wikipedia.org/wiki/Serialization">deserializes</a> the JSON dataset, returning it as a dictionary. Here is the code that you need:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">json</span>
<span class="kn">from</span> <span class="nn">urllib.request</span> <span class="k">import</span> <span class="n">urlopen</span>

<span class="n">feed_url</span> <span class="o">=</span> <span class="s2">&quot;http://www.bbc.co.uk/programmes/music/artists/charts.json&quot;</span>

<span class="c1"># Open URL and read text from it</span>

<span class="n">source</span> <span class="o">=</span> <span class="n">urlopen</span><span class="p">(</span><span class="n">feed_url</span><span class="p">)</span>
<span class="n">text</span> <span class="o">=</span> <span class="n">source</span><span class="o">.</span><span class="n">read</span><span class="p">()</span><span class="o">.</span><span class="n">decode</span><span class="p">()</span>

<span class="c1"># Deserialize the JSON data contained in the text</span>

<span class="n">data</span> <span class="o">=</span> <span class="n">json</span><span class="o">.</span><span class="n">loads</span><span class="p">(</span><span class="n">text</span><span class="p">)</span>
</pre></div>
<p>The dictionary will have the following format. (Note: this is real data, but records for only the first three artists are shown here.)</p>
<div class="highlight"><pre><span></span><span class="p">{</span>
  <span class="s2">&quot;artists_chart&quot;</span> <span class="p">:</span> <span class="p">{</span>
    <span class="s2">&quot;artists&quot;</span> <span class="p">:</span> <span class="p">[</span>
      <span class="p">{</span>
        <span class="s2">&quot;plays&quot;</span> <span class="p">:</span> <span class="mi">17</span><span class="p">,</span>
        <span class="s2">&quot;name&quot;</span> <span class="p">:</span> <span class="s2">&quot;Drake&quot;</span><span class="p">,</span>
        <span class="s2">&quot;previous_plays&quot;</span> <span class="p">:</span> <span class="mi">15</span><span class="p">,</span>
        <span class="s2">&quot;gid&quot;</span> <span class="p">:</span> <span class="s2">&quot;9fff2f8a-21e6-47de-a2b8-7f449929d43f&quot;</span>
      <span class="p">},</span>
      <span class="p">{</span>
        <span class="s2">&quot;plays&quot;</span> <span class="p">:</span> <span class="mi">16</span><span class="p">,</span>
        <span class="s2">&quot;name&quot;</span> <span class="p">:</span> <span class="s2">&quot;Nas&quot;</span><span class="p">,</span>
        <span class="s2">&quot;previous_plays&quot;</span> <span class="p">:</span> <span class="mi">4</span><span class="p">,</span>
        <span class="s2">&quot;gid&quot;</span> <span class="p">:</span> <span class="s2">&quot;cfbc0924-0035-4d6c-8197-f024653af823&quot;</span>
      <span class="p">},</span>
      <span class="p">{</span>
        <span class="s2">&quot;plays&quot;</span> <span class="p">:</span> <span class="mi">15</span><span class="p">,</span>
        <span class="s2">&quot;name&quot;</span> <span class="p">:</span> <span class="s2">&quot;David Bowie&quot;</span><span class="p">,</span>
        <span class="s2">&quot;previous_plays&quot;</span> <span class="p">:</span> <span class="mi">12</span><span class="p">,</span>
        <span class="s2">&quot;gid&quot;</span> <span class="p">:</span> <span class="s2">&quot;5441c29d-3602-4898-b1a1-b77fa23b8e50&quot;</span>
      <span class="p">},</span>

    <span class="p">],</span>
    <span class="s2">&quot;period&quot;</span> <span class="p">:</span> <span class="s2">&quot;Past 7 days&quot;</span><span class="p">,</span>
    <span class="s2">&quot;end&quot;</span> <span class="p">:</span> <span class="s2">&quot;2014-04-24&quot;</span><span class="p">,</span>
    <span class="s2">&quot;start&quot;</span> <span class="p">:</span> <span class="s2">&quot;2014-04-17&quot;</span>
  <span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>You can see from this example that keys of &quot;artists_chart&quot; and &quot;artists&quot; are required in order to access the list of artist details. Each element of this list is itself a dictionary in which artist name and play count can be accessed using keys called &quot;name&quot; and &quot;plays&quot;, respectively.  This leads us to the following code:</p>
<div class="highlight"><pre><span></span><span class="o">...</span>

<span class="n">artists</span> <span class="o">=</span> <span class="n">data</span><span class="p">[</span><span class="s2">&quot;artists_chart&quot;</span><span class="p">][</span><span class="s2">&quot;artists&quot;</span><span class="p">]</span>

<span class="k">for</span> <span class="n">artist</span> <span class="ow">in</span> <span class="n">artists</span><span class="p">:</span>
    <span class="k">if</span> <span class="n">artist</span><span class="p">[</span><span class="s2">&quot;plays&quot;</span><span class="p">]</span> <span class="o">&gt;</span> <span class="mi">10</span><span class="p">:</span>
        <span class="nb">print</span><span class="p">(</span><span class="n">artist</span><span class="p">[</span><span class="s2">&quot;name&quot;</span><span class="p">],</span> <span class="n">artist</span><span class="p">[</span><span class="s2">&quot;plays&quot;</span><span class="p">])</span>
</pre></div>
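<p>With the data in this form, other queries are just as easy. For example, the artists can be ranked by play count instead of filtered against a threshold. (The <code>data</code> dictionary below is a cut-down stand-in for the deserialized feed, keeping only the three artists shown earlier.)</p>

```python
# A cut-down stand-in for the deserialized feed
data = {
    "artists_chart": {
        "artists": [
            {"plays": 17, "name": "Drake", "previous_plays": 15},
            {"plays": 16, "name": "Nas", "previous_plays": 4},
            {"plays": 15, "name": "David Bowie", "previous_plays": 12},
        ]
    }
}

artists = data["artists_chart"]["artists"]

# Sort by play count, most-played first
for artist in sorted(artists, key=lambda a: a["plays"], reverse=True):
    print(artist["name"], artist["plays"])
```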
<h1 id="simplifying-things-with-requests">Simplifying Things With Requests</h1>

<p>If you are willing and able to install third-party Python packages on your system, Kenneth Reitz&#39;s excellent <a href="http://docs.python-requests.org/en/latest/">Requests</a> library can be used to simplify things considerably. Requests has a much cleaner API for issuing HTTP GET requests like those used in the preceding examples. It also greatly simplifies POST requests, file uploading and authentication. It even has built-in JSON deserialization capabilities.</p>

<p>Using Requests, the first six lines of code in the JSON example can be replaced with four simpler lines:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">requests</span>

<span class="n">feed_url</span> <span class="o">=</span> <span class="s2">&quot;http://www.bbc.co.uk/programmes/music/artists/charts.json&quot;</span>

<span class="n">response</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">feed_url</span><span class="p">)</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">response</span><span class="o">.</span><span class="n">json</span><span class="p">()</span>

<span class="o">...</span>
</pre></div>
<h1 id="thats-all-folks">That&#39;s All Folks!</h1>

<p>The source code for this article&#39;s examples is available in a <a href="https://bitbucket.org/pythoneer/progworld3">Bitbucket repository</a>.</p>

<p>I hope you&#39;ve found this series of articles useful; feel free to get in touch if you have questions or comments!</p>
]]></content:encoded>
      </item>
      <item>
        <guid>https://nickefford.silvrback.com/programming-the-world-part-2#2659</guid>
          <pubDate>Mon, 14 Apr 2014 00:14:57 -1200</pubDate>
        <link>https://nickefford.silvrback.com/programming-the-world-part-2</link>
        <title>Programming The World (Part 2)</title>
        <description>Data From Other Sources</description>
        <content:encoded><![CDATA[<h1 id="introduction">Introduction</h1>

<p><a href="https://nickefford.silvrback.com/programming-the-world-part-1">Part 1</a> of this series looked at how devices in the &#39;Internet of Things&#39; can sense their surroundings and make sensor measurements available over the web in formats such as CSV, XML and JSON. The same formats are used to publish data from a variety of other sources. This article gives a few examples of these other sources.</p>

<h1 id="data-from-the-bbc">Data From The BBC</h1>

<p>The <a href="http://www.bbc.co.uk">British Broadcasting Corporation</a> is a public body established by Royal Charter. As such it has a number of obligations, among them the expectation that it will <a href="http://www.bbc.co.uk/aboutthebbc/insidethebbc/whoweare/publicpurposes/communication.html">deliver to the public the benefit of emerging communication technologies and services</a>. Part and parcel of this is a commitment to <a href="http://en.wikipedia.org/wiki/Linked_data">linked data</a> - which, in practice, means that the BBC is attempting to provide machine-readable data on its radio and TV programmes via the web.</p>

<p>Visit <a href="http://www.bbc.co.uk/programmes/developers">http://www.bbc.co.uk/programmes/developers</a> and you will see details of the BBC&#39;s approach. This page describes the addressing scheme that the BBC has devised for publishing programme data and provides links to a couple of examples: an <a href="http://www.bbc.co.uk/radio1/programmes/schedules/england.xml">XML data feed giving the schedule for Radio 1 in England</a> and a <a href="http://www.bbc.co.uk/tv/programmes/genres/drama/scifiandfantasy/schedules/upcoming.json">JSON data feed giving upcoming Sci-Fi programmes on TV</a>. Try both of these out now. The screenshot below shows a portion of XML data from the first of them, as displayed by the Chrome browser.</p>

<p><img alt="Screenshot of XML representing the BBC Radio 1 schedule" src="https://silvrback.s3.amazonaws.com/uploads/a89523cf-14de-44d4-a66b-7384350263c3/bbc2_large.png" /></p>

<p>Another interesting BBC data feed is this one, which breaks down airplay by artist across the BBC&#39;s radio stations:</p>

<p><a href="http://www.bbc.co.uk/programmes/music/artists/charts.json">http://www.bbc.co.uk/programmes/music/artists/charts.json</a></p>

<p>(The same data can be retrieved as XML simply by replacing <code>json</code> with <code>xml</code> in the URL above.)</p>

<h1 id="earth-environmental-data">Earth &amp; Environmental Data</h1>

<h3 id="the-met-office">The Met Office</h3>

<p><a href="http://www.metoffice.gov.uk">The Met Office</a> is the UK&#39;s national weather forecasting service.  It is currently beta-testing a service called <a href="http://www.metoffice.gov.uk/datapoint/">DataPoint</a>, which it describes thus:</p>

<blockquote>
<p>DataPoint is a way of accessing freely available Met Office data feeds in a format that is suitable for application developers. It is aimed at professionals, the scientific community and student or amateur developers, in fact anyone looking to re-use Met Office data within their own innovative applications.</p>
</blockquote>

<p>DataPoint offers a wide range of useful meteorological data. For example, it provides a five-day forecast of temperature, wind speed &amp; direction, precipitation and other variables for specific locations in the UK, either as a visualisation like that shown below or as raw data in XML or JSON formats.</p>

<p><img alt="Met Office weather forecast" src="https://silvrback.s3.amazonaws.com/uploads/0d4e1c90-573b-448c-98b2-0592d5e56db8/metoffice_large.png" /></p>

<p><img alt="Map overlays from Met Office DataPoint" class="sb_float" src="https://silvrback.s3.amazonaws.com/uploads/1e059073-e6a8-4fb2-8b79-194567e43430/combined2_large.png" /></p>

<p>DataPoint also provides map layers in the <a href="http://en.wikipedia.org/wiki/Portable_Network_Graphics">PNG image format</a> for both weather forecasts and actual observations. Forecast layers show cloud cover, rainfall, temperature and pressure as isobars. Observation layers show rainfall, lightning storms, and satellite images in the visible and IR regions of the spectrum. Layer retrieval is a two-stage process in which you must first request details of all the available layers, in either XML or JSON formats. This information can then be used to construct the specific URL of the desired layer. The example shown here is a composite of a visible-spectrum satellite image with layers showing forecasted rainfall and pressure.</p>

<p>One important thing to note about DataPoint is that users must register with the service in order to obtain an <strong>API key</strong>. This is a unique string of characters that identifies you as a legitimate user of the service.  It is used for authentication purposes and to track your usage of the service. All requests made to DataPoint must include your API key.</p>

<p>API keys are actually a fairly common requirement for use of web services. ThingSpeak, discussed in <a href="https://nickefford.silvrback.com/programming-the-world-part-1">Part 1</a>, requires one. Many services will provide an API key for free but will limit the number of times that you can invoke the service free of charge; for example, <a href="https://developer.forecast.io/">forecast.io</a> will allow you to make up to 1,000 API calls per day for free but will charge you $1 per 10,000 calls thereafter.</p>
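<p>In practice, supplying an API key usually amounts to adding a query parameter to the request URL. The sketch below builds such a URL with <code>urlencode</code> from Python&#39;s standard library; the location ID and the <code>res</code> and <code>key</code> parameters are illustrative placeholders modelled on DataPoint&#39;s documented URL pattern, so check the DataPoint documentation for the exact scheme:</p>

```python
from urllib.parse import urlencode

# Illustrative values only: the location ID (3840) and the "res" and
# "key" parameters are placeholders modelled on DataPoint's URL pattern
base = "http://datapoint.metoffice.gov.uk/public/data/val/wxfcs/all/json/3840"
params = {"res": "3hourly", "key": "YOUR-API-KEY"}

url = base + "?" + urlencode(params)
print(url)
```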

<h3 id="usgs-earthquake-hazards-program">USGS Earthquake Hazards Program</h3>

<p>The <a href="http://earthquake.usgs.gov">United States Geological Survey&#39;s Earthquake Hazards Program</a> is one of my favourite data source examples. Their website provides comprehensive <a href="http://earthquake.usgs.gov/earthquakes/feed/v1.0/">real-time feeds</a> of seismological data in a variety of different formats, as the screenshot below illustrates.</p>

<p><img alt="USGS earthquake data feeds" src="https://silvrback.s3.amazonaws.com/uploads/06418fdd-8a2c-4089-b7a1-e91828f96efd/usgs_large.png" /></p>

<p>The <a href="http://earthquake.usgs.gov/earthquakes/feed/v1.0/csv.php"><em>Spreadsheet Applications</em></a> link on this page takes you to another page containing various CSV data feeds. The <a href="http://earthquake.usgs.gov/earthquakes/feed/v1.0/atom.php"><em>Atom Syndication</em></a> and <a href="http://earthquake.usgs.gov/earthquakes/feed/v1.0/quakeml.php"><em>QuakeML</em></a> links are for two different XML-based formats, the former being for consumption by <a href="http://en.wikipedia.org/wiki/RSS_reader">RSS readers</a> and the latter for professional geoscientists. The <a href="http://earthquake.usgs.gov/earthquakes/feed/v1.0/geojson.php"><em>Programmatic Access</em></a> link is for a JSON-based format called GeoJSON.</p>

<p><img alt="Links for earthquake data feeds" class="sb_float" src="https://silvrback.s3.amazonaws.com/uploads/fe58d819-8c47-4b65-9779-1d6f0e9f8e33/feeds_large.png" /></p>

<p>For each format, feeds are grouped by time, covering the past hour, past day, past 7 days and past 30 days. In each of these groups there is an &#39;all earthquakes&#39; feed plus separate feeds for different levels of severity, covering &#39;significant&#39; earthquakes and those with magnitudes of 4.5 or more, 2.5 or more, and 1.0 or more.</p>

<p>The quantity of data that you obtain from these feeds will very much depend on which one you choose; for example, the feed for significant earthquakes occurring in the past hour will be empty most of the time, whereas the feed for all earthquakes from the last 30 days will typically give you many thousands of events each time that you access it.</p>

<p>A GeoJSON feed is a list of seismic events, each of which is represented as shown below. A <a href="http://earthquake.usgs.gov/earthquakes/feed/v1.0/glossary.php">glossary</a> explains what the various data fields mean. (A few of them have been omitted in the interests of clarity.) This particular example is for the magnitude 7.4 quake that occurred near the Solomon Islands on 13 April 2014.</p>
<div class="highlight"><pre><span></span><span class="p">{</span>
  <span class="nt">&quot;type&quot;</span><span class="p">:</span> <span class="s2">&quot;Feature&quot;</span><span class="p">,</span>
  <span class="nt">&quot;properties&quot;</span><span class="p">:</span> <span class="p">{</span>
    <span class="nt">&quot;mag&quot;</span><span class="p">:</span> <span class="mf">7.4</span><span class="p">,</span>
    <span class="nt">&quot;place&quot;</span><span class="p">:</span> <span class="s2">&quot;111km S of Kirakira, Solomon Islands&quot;</span><span class="p">,</span>
    <span class="nt">&quot;time&quot;</span><span class="p">:</span> <span class="mi">1397392578710</span><span class="p">,</span>
    <span class="nt">&quot;updated&quot;</span><span class="p">:</span> <span class="mi">1397421536312</span><span class="p">,</span>
    <span class="nt">&quot;tz&quot;</span><span class="p">:</span> <span class="mi">660</span><span class="p">,</span>
    <span class="nt">&quot;felt&quot;</span><span class="p">:</span> <span class="kc">null</span><span class="p">,</span>
    <span class="nt">&quot;cdi&quot;</span><span class="p">:</span> <span class="kc">null</span><span class="p">,</span>
    <span class="nt">&quot;mmi&quot;</span><span class="p">:</span> <span class="mf">7.51</span><span class="p">,</span>
    <span class="nt">&quot;alert&quot;</span><span class="p">:</span> <span class="s2">&quot;green&quot;</span><span class="p">,</span>
    <span class="nt">&quot;status&quot;</span><span class="p">:</span> <span class="s2">&quot;reviewed&quot;</span><span class="p">,</span>
    <span class="nt">&quot;tsunami&quot;</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
    <span class="nt">&quot;sig&quot;</span><span class="p">:</span> <span class="mi">842</span><span class="p">,</span>
    <span class="nt">&quot;net&quot;</span><span class="p">:</span> <span class="s2">&quot;us&quot;</span><span class="p">,</span>
    <span class="nt">&quot;nst&quot;</span><span class="p">:</span> <span class="kc">null</span><span class="p">,</span>
    <span class="nt">&quot;dmin&quot;</span><span class="p">:</span> <span class="mf">2.89</span><span class="p">,</span>
    <span class="nt">&quot;rms&quot;</span><span class="p">:</span> <span class="mf">1.06</span><span class="p">,</span>
    <span class="nt">&quot;gap&quot;</span><span class="p">:</span> <span class="mi">17</span><span class="p">,</span>
    <span class="nt">&quot;magType&quot;</span><span class="p">:</span> <span class="s2">&quot;mww&quot;</span><span class="p">,</span>
    <span class="nt">&quot;type&quot;</span><span class="p">:</span> <span class="s2">&quot;earthquake&quot;</span><span class="p">,</span>
    <span class="nt">&quot;title&quot;</span><span class="p">:</span> <span class="s2">&quot;M 7.4 - 111km S of Kirakira, Solomon Islands&quot;</span>
  <span class="p">},</span>
  <span class="nt">&quot;geometry&quot;</span><span class="p">:</span> <span class="p">{</span>
    <span class="nt">&quot;type&quot;</span><span class="p">:</span> <span class="s2">&quot;Point&quot;</span><span class="p">,</span>
    <span class="nt">&quot;coordinates&quot;</span><span class="p">:</span> <span class="p">[</span><span class="mf">162.0692</span><span class="p">,</span> <span class="mf">-11.451</span><span class="p">,</span> <span class="mi">35</span><span class="p">]</span>
  <span class="p">},</span>
  <span class="nt">&quot;id&quot;</span><span class="p">:</span> <span class="s2">&quot;usc000piqj&quot;</span>
<span class="p">}</span>
</pre></div>
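<p>A record like this is straightforward to pick apart with Python&#39;s standard <code>json</code> module. A minimal sketch, using a hand-trimmed fragment of the record above (the full record carries many more properties than are shown here):</p>

```python
import json

# A hand-trimmed fragment of the USGS GeoJSON record shown above
record = """{
  "properties": {
    "alert": "green",
    "tsunami": 1,
    "type": "earthquake",
    "title": "M 7.4 - 111km S of Kirakira, Solomon Islands"
  },
  "geometry": {
    "type": "Point",
    "coordinates": [162.0692, -11.451, 35]
  },
  "id": "usc000piqj"
}"""

quake = json.loads(record)

# GeoJSON point coordinates are given as [longitude, latitude, depth]
lon, lat, depth = quake["geometry"]["coordinates"]
print(quake["properties"]["title"])
print(lon, lat, depth)
```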
<h1 id="open-data-initiatives">Open Data Initiatives</h1>

<p><a href="http://data.gov.uk">Data.gov.uk</a> is at the heart of the UK government&#39;s Transparency agenda and currently (April 2014) makes almost 14,000 datasets available to the public. You can search for a dataset by keyword or conduct geographic searches based on postcode, latitude &amp; longitude or a rectangular region dragged out on a map. You can also drill down via menus that classify datasets by licence, theme or format. CSV and XML are well represented, but there are comparatively few JSON datasets currently available. Note that &#39;open&#39; does not necessarily imply &#39;easily machine-readable&#39;; some of the datasets are provided only as Excel spreadsheets, PDF files or Microsoft Word documents, for example - formats that can be much harder to process using software.</p>

<p>The screenshot below shows the most popular health-related CSV datasets available from the site.  You can view this page yourself by visiting</p>

<p><a href="http://data.gov.uk/data/search?theme-primary=Health&res_format=CSV">http://data.gov.uk/data/search?theme-primary=Health&amp;res_format=CSV</a></p>

<p><img alt="CSV health data from data.gov.uk" src="https://silvrback.s3.amazonaws.com/uploads/98d5c8ba-5d23-42cf-876a-6588bc4b7ce0/gov_large.png" /></p>

<p>Open data is having an impact at the local level, too. A good example is the recently established <a href="http://leedsdatamill.org">Leeds Data Mill</a>, promoted as &quot;a place for organisations to share their open data to change the way we live, work and play in the city&quot;.</p>

<p>Leeds Data Mill&#39;s small but growing collection of datasets includes <a href="http://leedsdatamill.org/dataset/council-car-parks">locations and number of available spaces in council car parks</a>, <a href="http://www.leedsdatamill.org/dataset/roadworks">details of completed and live roadworks in the city</a> and <a href="http://leedsdatamill.org/dataset/leeds-city-centre-footfall-data">footfall data for eight locations in the city centre</a>.</p>

<p><img alt="Leeds roadworks data from Leeds Data Mill" src="https://silvrback.s3.amazonaws.com/uploads/abaad5cd-a661-4bcc-937a-8a1b4ef2209f/leeds_large.png" /></p>

<p><em>Continued in <a href="https://nickefford.silvrback.com/programming-the-world-part-3">Part 3</a>...</em></p>
]]></content:encoded>
      </item>
      <item>
        <guid>https://nickefford.silvrback.com/programming-the-world-part-1#2644</guid>
          <pubDate>Mon, 07 Apr 2014 03:36:53 -1200</pubDate>
        <link>https://nickefford.silvrback.com/programming-the-world-part-1</link>
        <title>Programming The World (Part 1)</title>
        <description>Data From Devices</description>
        <content:encoded><![CDATA[<h1 id="introduction">Introduction</h1>

<p>Our world is becoming increasingly programmable, due to a number of emerging trends. One trend is that increasing quantities of useful public data are being made available over the web in machine-readable forms. Another is that many of the devices around us are becoming &#39;smart&#39; and connected, capable of feeding real-time information on their surroundings into the web and (to a more limited extent) of reacting in response to commands issued via the web. Then there&#39;s the fact that many of us these days carry smartphones: powerful computers with a near-permanent (depending on service provider) Internet connection. We therefore don&#39;t have to be sat in front of a PC to interact with this brave new world of data and devices.</p>

<p>This article, the first in a three-part series, looks at how data from devices becomes web-accessible and considers the different data formats that are commonly used. <a href="https://nickefford.silvrback.com/programming-the-world-part-2">Part 2</a> surveys some open data sources. <a href="https://nickefford.silvrback.com/programming-the-world-part-3">Part 3</a> explores how we can write programs in <a href="http://www.python.org">Python</a> to acquire and process data from these sources.</p>

<p><em>Note: these three articles are aimed at people who have some experience of Python programming but who don&#39;t have much familiarity with data sources or data formats. The articles are based on material originally delivered as a workshop for IT teachers, with the aim of showing them some interesting projects that their students might do once they have learned a bit of Python.</em></p>

<h1 id="the-internet-of-things">The &#39;Internet of Things&#39;</h1>

<p>Advances in networking technology and falling hardware costs have resulted in a proliferation of small devices that sense their environment and make these measurements available over the web. One such device is the Kickstarter-funded <a href="http://supermechanical.com/twine">Twine</a>.</p>

<p><img alt="Twine" class="sb_float" src="https://silvrback.s3.amazonaws.com/uploads/4e1be365-b482-4738-ba3a-7195c88bb87c/twine_medium.jpg" /></p>

<p>A Twine contains internal sensors for temperature, orientation and acceleration. You can also connect external moisture sensors and magnetic reed switches produced by Twine&#39;s manufacturer, or sensors of your own design via a special &#39;breakout board&#39;.</p>

<p>A Twine is programmed with rules based on data from its sensors and uses a Wi-Fi connection to issue notifications via email, SMS or HTTP when these rules trigger. This Wi-Fi connection is also used to update the device with new or modified rules, which are programmed in a visual manner via a straightforward web-based interface.</p>

<p><img alt="Twine rule development" src="https://silvrback.s3.amazonaws.com/uploads/7a6bf34c-3ff1-4903-935a-4285f66a9931/rule_large.png" /></p>

<p>Notice in the screenshot above how this particular Twine has been programmed to send data to a web API hosted at thingspeak.com. <a href="http://thingspeak.com">ThingSpeak</a> promotes itself as an &quot;open application platform designed to enable meaningful connections between things and people&quot;. Once you&#39;ve registered with ThingSpeak, you can set up public or private <strong>channels</strong> for your devices, through which data are made available for visualisation or downloading.</p>

<p>Why not try this out now? Head over to <a href="https://thingspeak.com/channels/public">https://thingspeak.com/channels/public</a> to see a listing of some of ThingSpeak&#39;s public channels. Click on the link to one of these channels to see the data feed visualised, then click on the <em>Developer Info</em> tab at the top-right to see the formats in which you can download data from this channel.</p>

<p><img alt="Data for a ThingSpeak channel" src="https://silvrback.s3.amazonaws.com/uploads/304c356c-1985-4a7c-a29b-6c0fe1301ebe/thingspeak_large.png" /></p>

<p>Notice the links to three different formats: JSON, XML, CSV. Try clicking on these links to view the data. (Depending on your browser, you might see the data displayed in the browser window or it might be treated as a downloaded file; if the latter, just open the file in a text editor to view the data.)</p>

<h1 id="data-formats">Data Formats</h1>

<h3 id="csv">CSV</h3>

<p>&#39;Comma-Separated Values&#39; format is the simplest of the three formats offered by ThingSpeak, best suited to data that are tabular in nature. One big reason for its popularity is that spreadsheet applications such as Excel or <a href="https://www.libreoffice.org/discover/calc/">LibreOffice Calc</a> can open CSV files.</p>

<p>The first few lines of the CSV data for the ThingSpeak feed in the screenshot above look like this:</p>
<div class="highlight"><pre><span></span>created_at,entry_id,field1
2014-04-03 11:47:02 UTC,9005,14.375
2014-04-03 12:02:07 UTC,9006,13.75
2014-04-03 12:17:11 UTC,9007,13
2014-04-03 12:32:18 UTC,9008,13.125
</pre></div>
<p>This is a dataset with three columns, representing a timestamp, a unique identifier for the measurement and the measurement itself (a temperature in this case). A comma is used to separate the values in each column. (If the value itself contains a comma, this must be protected in some way - e.g., by enclosing the entire value in quotes.) The first line contains the column headings.</p>

<p>For very uniform data where all the records have the same structure, CSV is a good choice, not least because it has a very good <strong>data-to-markup ratio</strong>. In this example, the markup consists of the first line and then only two commas on each subsequent line. Most of the text is useful data.</p>
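<p>Data in this format can be read with Python&#39;s standard <code>csv</code> module, which takes care of splitting each line and of any quoted values. A minimal sketch, parsing the feed above from a string (in practice the text would come from a downloaded file):</p>

```python
import csv
import io

# The first few lines of the ThingSpeak CSV feed shown above
text = """created_at,entry_id,field1
2014-04-03 11:47:02 UTC,9005,14.375
2014-04-03 12:02:07 UTC,9006,13.75
2014-04-03 12:17:11 UTC,9007,13
2014-04-03 12:32:18 UTC,9008,13.125
"""

# DictReader uses the first line as the column headings
rows = list(csv.DictReader(io.StringIO(text)))
temperatures = [float(row["field1"]) for row in rows]
print(temperatures)  # [14.375, 13.75, 13.0, 13.125]

# A value containing a comma must be quoted; the csv module copes with that too
row = next(csv.reader(io.StringIO('"Leeds, UK",123')))
print(row)  # ['Leeds, UK', '123']
```

<p>Because <code>DictReader</code> maps each row onto the column headings, the code stays readable even if the order of the columns changes.</p>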

<h3 id="xml">XML</h3>

<p>Extensible Markup Language (XML) is very flexible because it allows you to define your own <strong>elements</strong> that describe the data. Most (though not all) elements enclose data within a <strong>start tag</strong> and <strong>end tag</strong> - for example, <code>&lt;name&gt;</code> and <code>&lt;/name&gt;</code>. Attributes can also be associated with an element if required, using a &#39;key=value&#39; format - for example, <code>&lt;latitude type=&quot;decimal&quot;&gt;...&lt;/latitude&gt;</code>.</p>

<p>The XML data for the ThingSpeak feed in the screenshot above looks like this:</p>
<div class="highlight"><pre><span></span><span class="cp">&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;</span>
<span class="nt">&lt;channel&gt;</span>
  <span class="nt">&lt;id</span> <span class="na">type=</span><span class="s">&quot;integer&quot;</span><span class="nt">&gt;</span>135<span class="nt">&lt;/id&gt;</span>
  <span class="nt">&lt;name&gt;</span>Thermometer<span class="nt">&lt;/name&gt;</span>
  <span class="nt">&lt;description&gt;</span>
    Wireless outdoor thermometer
    (Electric Imp, TI TMP102 sensor, 4 x AA Energizer L91).
  <span class="nt">&lt;/description&gt;</span>
  <span class="nt">&lt;latitude</span> <span class="na">type=</span><span class="s">&quot;decimal&quot;</span><span class="nt">&gt;</span>55.652072<span class="nt">&lt;/latitude&gt;</span>
  <span class="nt">&lt;longitude</span> <span class="na">type=</span><span class="s">&quot;decimal&quot;</span><span class="nt">&gt;</span>12.546301<span class="nt">&lt;/longitude&gt;</span>
  <span class="nt">&lt;field1&gt;</span>Temperature<span class="nt">&lt;/field1&gt;</span>
  <span class="nt">&lt;created-at</span> <span class="na">type=</span><span class="s">&quot;dateTime&quot;</span><span class="nt">&gt;</span>2011-02-23T22:43:37Z<span class="nt">&lt;/created-at&gt;</span>
  <span class="nt">&lt;updated-at</span> <span class="na">type=</span><span class="s">&quot;dateTime&quot;</span><span class="nt">&gt;</span>2014-04-04T11:22:55Z<span class="nt">&lt;/updated-at&gt;</span>
  <span class="nt">&lt;elevation&gt;</span>20m<span class="nt">&lt;/elevation&gt;</span>
  <span class="nt">&lt;last-entry-id</span> <span class="na">type=</span><span class="s">&quot;integer&quot;</span><span class="nt">&gt;</span>9092<span class="nt">&lt;/last-entry-id&gt;</span>
  <span class="nt">&lt;feeds</span> <span class="na">type=</span><span class="s">&quot;array&quot;</span><span class="nt">&gt;</span>
    <span class="nt">&lt;feed&gt;</span>
      <span class="nt">&lt;created-at</span> <span class="na">type=</span><span class="s">&quot;dateTime&quot;</span><span class="nt">&gt;</span>2014-04-03T11:47:02Z<span class="nt">&lt;/created-at&gt;</span>
      <span class="nt">&lt;entry-id</span> <span class="na">type=</span><span class="s">&quot;integer&quot;</span><span class="nt">&gt;</span>9005<span class="nt">&lt;/entry-id&gt;</span>
      <span class="nt">&lt;field1&gt;</span>14.375<span class="nt">&lt;/field1&gt;</span>
      <span class="nt">&lt;id</span> <span class="na">type=</span><span class="s">&quot;integer&quot;</span> <span class="na">nil=</span><span class="s">&quot;true&quot;</span><span class="nt">/&gt;</span>
    <span class="nt">&lt;/feed&gt;</span>
    <span class="nt">&lt;feed&gt;</span>
      <span class="nt">&lt;created-at</span> <span class="na">type=</span><span class="s">&quot;dateTime&quot;</span><span class="nt">&gt;</span>2014-04-03T12:02:07Z<span class="nt">&lt;/created-at&gt;</span>
      <span class="nt">&lt;entry-id</span> <span class="na">type=</span><span class="s">&quot;integer&quot;</span><span class="nt">&gt;</span>9006<span class="nt">&lt;/entry-id&gt;</span>
      <span class="nt">&lt;field1&gt;</span>13.75<span class="nt">&lt;/field1&gt;</span>
      <span class="nt">&lt;id</span> <span class="na">type=</span><span class="s">&quot;integer&quot;</span> <span class="na">nil=</span><span class="s">&quot;true&quot;</span><span class="nt">/&gt;</span>
    <span class="nt">&lt;/feed&gt;</span>

  <span class="nt">&lt;/feeds&gt;</span>
<span class="nt">&lt;/channel&gt;</span>
</pre></div>
<p>This lengthy example includes just two of the measurements from the data feed! To be fair, this is partly because XML&#39;s flexibility means that other information can be included besides the measurements themselves - for example, a description of the sensor and its latitude, longitude and elevation. However, even if you ignore all this extra information, the part dealing with the measurements themselves still occupies <em>over five times</em> as much space as the equivalent CSV text!</p>

<p>This inherent verbosity is one of the main drawbacks of using XML. Another is that processing XML with a program is more difficult than processing CSV. Fortunately, there are libraries of code for all common programming languages that will parse XML for you. In many cases, these libraries are a standard part of the language.</p>
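<p>In Python, for example, the standard library&#39;s <code>xml.etree.ElementTree</code> module will do the parsing. A minimal sketch, using a shortened version of the feed above (XML declaration omitted, as <code>fromstring</code> expects plain text):</p>

```python
import xml.etree.ElementTree as ET

# A shortened version of the ThingSpeak XML feed shown above
xml_text = """<channel>
  <name>Thermometer</name>
  <feeds type="array">
    <feed>
      <created-at type="dateTime">2014-04-03T11:47:02Z</created-at>
      <entry-id type="integer">9005</entry-id>
      <field1>14.375</field1>
    </feed>
    <feed>
      <created-at type="dateTime">2014-04-03T12:02:07Z</created-at>
      <entry-id type="integer">9006</entry-id>
      <field1>13.75</field1>
    </feed>
  </feeds>
</channel>"""

root = ET.fromstring(xml_text)
print(root.find("name").text)          # Thermometer
print(root.find("feeds").get("type"))  # array

# Walk every <feed> element, pulling out the enclosed text
for feed in root.iter("feed"):
    when = feed.find("created-at").text
    temp = float(feed.find("field1").text)
    print(when, temp)
```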

<h3 id="json">JSON</h3>

<p>JavaScript Object Notation (JSON) is a less formal alternative to XML, providing similar flexibility and descriptive capabilities but with reduced verbosity and a much improved data-to-markup ratio.</p>

<p>The JSON data for the ThingSpeak feed looks like this:</p>
<div class="highlight"><pre><span></span><span class="p">{</span>
  <span class="nt">&quot;channel&quot;</span><span class="p">:</span> <span class="p">{</span>
    <span class="nt">&quot;id&quot;</span><span class="p">:</span> <span class="mi">135</span><span class="p">,</span>
    <span class="nt">&quot;name&quot;</span><span class="p">:</span> <span class="s2">&quot;Thermometer&quot;</span><span class="p">,</span>
    <span class="nt">&quot;description&quot;</span><span class="p">:</span> <span class="s2">&quot;Wireless outdoor thermometer (Electric Imp, TI TMP102 sensor, 4 x AA Energizer L91).&quot;</span><span class="p">,</span>
    <span class="nt">&quot;latitude&quot;</span><span class="p">:</span> <span class="s2">&quot;55.652072&quot;</span><span class="p">,</span>
    <span class="nt">&quot;longitude&quot;</span><span class="p">:</span> <span class="s2">&quot;12.546301&quot;</span><span class="p">,</span>
    <span class="nt">&quot;field1&quot;</span><span class="p">:</span> <span class="s2">&quot;Temperature&quot;</span><span class="p">,</span>
    <span class="nt">&quot;created_at&quot;</span><span class="p">:</span> <span class="s2">&quot;2011-02-23T22:43:37Z&quot;</span><span class="p">,</span>
    <span class="nt">&quot;updated_at&quot;</span><span class="p">:</span> <span class="s2">&quot;2014-04-04T11:22:55Z&quot;</span><span class="p">,</span>
    <span class="nt">&quot;elevation&quot;</span><span class="p">:</span> <span class="s2">&quot;20m&quot;</span><span class="p">,</span>
    <span class="nt">&quot;last_entry_id&quot;</span><span class="p">:</span> <span class="mi">9092</span>
  <span class="p">},</span>
  <span class="nt">&quot;feeds&quot;</span><span class="p">:</span> <span class="p">[</span>
    <span class="p">{</span>
      <span class="nt">&quot;created_at&quot;</span><span class="p">:</span> <span class="s2">&quot;2014-04-03T11:47:02Z&quot;</span><span class="p">,</span>
      <span class="nt">&quot;entry_id&quot;</span><span class="p">:</span> <span class="mi">9005</span><span class="p">,</span>
      <span class="nt">&quot;field1&quot;</span><span class="p">:</span> <span class="s2">&quot;14.375&quot;</span>
    <span class="p">},</span>
    <span class="p">{</span>
      <span class="nt">&quot;created_at&quot;</span><span class="p">:</span> <span class="s2">&quot;2014-04-03T12:02:07Z&quot;</span><span class="p">,</span>
      <span class="nt">&quot;entry_id&quot;</span><span class="p">:</span> <span class="mi">9006</span><span class="p">,</span>
      <span class="nt">&quot;field1&quot;</span><span class="p">:</span> <span class="s2">&quot;13.75&quot;</span>
    <span class="p">}</span>

  <span class="p">]</span>
<span class="p">}</span>
</pre></div>
<p>The use of name-value pairs rather than start and end tags helps to reduce the storage requirements considerably. The temperature measurements in this data feed occupy half the space of those in the XML data feed.</p>
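<p>Processing the feed in Python is equally direct: the standard <code>json</code> module turns the whole thing into nested dictionaries and lists. A minimal sketch, using a shortened version of the data above:</p>

```python
import json

# A shortened version of the ThingSpeak JSON feed shown above
text = """{
  "channel": {
    "id": 135,
    "name": "Thermometer",
    "field1": "Temperature"
  },
  "feeds": [
    {"created_at": "2014-04-03T11:47:02Z", "entry_id": 9005, "field1": "14.375"},
    {"created_at": "2014-04-03T12:02:07Z", "entry_id": 9006, "field1": "13.75"}
  ]
}"""

data = json.loads(text)
print(data["channel"]["name"])  # Thermometer

# Note that the field1 values arrive as strings and need converting to numbers
temperatures = [float(feed["field1"]) for feed in data["feeds"]]
print(temperatures)             # [14.375, 13.75]
```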

<p><em>Continued in <a href="https://nickefford.silvrback.com/programming-the-world-part-2">Part 2</a>...</em></p>
]]></content:encoded>
      </item>
  </channel>
</rss>