It's long been a problem that Local Authorities publish broadband maps of their publicly funded roll-out programmes but make looking up the underlying data quite hard.

While a householder can usually find their own area quite quickly, an operator considering investing in a wider area often finds it much harder, sometimes resorting to entering postcodes manually one by one. That may work for a small area, but not for larger areas spanning multiple local authority boundaries.

I take a different approach which gives me reliable data for each and every postcode in a given local body area. Here’s how I reverse engineer LA broadband maps into tables of data.

The process can feel a little cumbersome to set up the first time but quickly becomes fairly routine. The key steps are:

i.    Capture the best quality broadband map published by the local authority;

ii.   Use QGIS’s georeference tool to align it with a suitable free OS Opendata map;

iii.  Overlay the OS Opendata Codepoint postcode points;

iv.   Use QGIS’s point sampling tool to capture the colour under each postcode in numerical form (RGB values);

v.    Pass the RGB values through k-means, where the number of clusters equals the number of legend items in the LA map.

This last step is a quick and effective way to recover the map colours from the legend – if the legend has three colours, representing areas that already have superfast broadband, areas that will get it under the local programme, and areas that remain problematic, then ask k-means to find three clusters hidden among the red, green and blue values.
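To make the idea concrete, here's a minimal sketch of k-means over RGB triples in pure Python (the real work in my pipeline is done by R's kmeans(), shown further down; the pixel colours below are made-up illustrative values, and the farthest-point seeding is just a deterministic stand-in for R's random initialisation):

```python
def dist2(a, b):
    """Squared Euclidean distance between two RGB triples."""
    return sum((a[i] - b[i]) ** 2 for i in range(3))

def kmeans_rgb(points, k, iters=20):
    """Tiny k-means over (r, g, b) tuples; returns one cluster index per point.
    Centres are seeded deterministically by farthest-point initialisation."""
    centres = [points[0]]
    while len(centres) < k:
        # next centre: the point farthest from all centres chosen so far
        centres.append(max(points, key=lambda p: min(dist2(p, c) for c in centres)))
    assign = [0] * len(points)
    for _ in range(iters):
        # assign each point to its nearest centre
        assign = [min(range(k), key=lambda c: dist2(p, centres[c])) for p in points]
        # move each centre to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centres[c] = tuple(sum(m[d] for m in members) / len(members) for d in range(3))
    return assign

# Made-up pixel colours sampled under six postcodes: red-ish, green-ish and blue-ish pairs
pixels = [(200, 30, 40), (210, 25, 35), (30, 190, 60), (25, 200, 55), (40, 50, 210), (35, 45, 200)]
labels = kmeans_rgb(pixels, 3)  # three clusters for a three-item legend
```

Each of the three legend colours comes out as its own cluster, so the two red-ish points share a label, as do the green-ish and blue-ish pairs.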

The first four stages can be done in QGIS – it’s quick, painless and user-friendly, and results in a shapefile with a map point for each postcode together with the red, green and blue values from the map beneath.

At the moment I load the shapefile into a spatial database from within QGIS, and complete the final stage with a kmeans function in Postgres – at some point I must convert this to a QGIS Processing script which would mean the whole process could be done in a single session with QGIS.

It's only this final stage that's a little tricky to set up initially. The first step is to get the R statistical language installed, then add the PL/R extension to give Postgres R capabilities. The best guide for doing this is on the brilliant Boston GIS site.

With this working, the inelegant SQL function below will scan a table with red, green and blue values and quickly calculate the clusters for you. When I say quickly, it takes around half a second to calculate 20,000 postcodes.

CREATE TYPE kmean_rgb_cluster_id AS (gid integer, red double precision, green double precision, blue double precision, clust integer);

CREATE OR REPLACE FUNCTION r_rgb_cluster_id(
     num_of_clusters integer,
     gid integer[],
     red double precision[],
     green double precision[],
     blue double precision[]
)
RETURNS SETOF kmean_rgb_cluster_id AS

$BODY$
     # bind the colour channels into a matrix, one row per postcode
     data <- cbind(red, green, blue)

     # cluster the colours; num_of_clusters should match the map legend
     cl <- kmeans(data, num_of_clusters)
     clust <- cl$cluster

     # return one row per input point, with its cluster number appended
     result <- data.frame(gid = gid, red = red, green = green, blue = blue, clust = clust)
     return(result)
$BODY$
LANGUAGE plr;

The function takes as input the number of clusters you want found (= legend items on the map) and arrays of gid, red, green and blue values. The gid is the standard row identifier in a spatial table and ensures it's possible to map the results back to the original table.

Running the function returns an R data frame of the results, which PL/R hands back to SQL as a set of composite rows, with the cluster number (e.g. 1, 2, 3) added to each line. In the example below, cluster 1 equated to ‘Build’ on the original map, so a quick CASE statement was used to make reading the data a little easier.

SELECT (clust.r_rgb_cluster_id).gid AS gid,
       (clust.r_rgb_cluster_id).clust,
       CASE
         WHEN (clust.r_rgb_cluster_id).clust = 1 THEN 'Build' 
         WHEN (clust.r_rgb_cluster_id).clust = 2 THEN 'No Build'
         WHEN (clust.r_rgb_cluster_id).clust = 3 THEN 'Built'
       END AS cluster_id
FROM   (SELECT r_rgb_cluster_id(
            3, -- The number of items in the map legend
            array_agg(gid),
            array_agg("1_red"),
            array_agg("2_green"),
            array_agg("3_blue")
           )
        FROM postcodes_from_scanned_map_table
       ) AS clust;

It’s not the tidiest, and someone with more experience of PL/R will probably wince, but it works, and the result is a table of all the postcodes in a given local authority area with a good indication of whether it’s safe to invest or not. (If anyone has improvements on my attempt, please let me know!)

Once you have the steps in place, the whole process is a lot quicker than manually entering postcodes into a checker one by one. The longest bit is usually georeferencing the map in step ii above, as it’s important to get this right – a small shift in the map and you may be sampling the wrong pixels under each postcode.
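To see why a small shift matters, here's a rough sketch of how a postcode's easting/northing maps to a pixel in a north-up raster using world-file-style coefficients (the coordinates and the geo_to_pixel helper are made up for illustration; rotation terms are assumed to be zero):

```python
def geo_to_pixel(x, y, px_w, px_h, origin_x, origin_y):
    """Map an easting/northing to a (col, row) pixel index in a north-up raster.
    px_w / px_h are the world-file pixel sizes (px_h is negative because row
    numbers increase downwards); origin_x / origin_y are the coordinates of
    the raster's top-left. Assumes the two rotation terms are zero."""
    col = int((x - origin_x) / px_w)
    row = int((y - origin_y) / px_h)
    return col, row

# Hypothetical 1 m/pixel raster with its top-left at easting 400000, northing 300000
col, row = geo_to_pixel(400010.5, 299994.5, 1.0, -1.0, 400000.0, 300000.0)

# A 5 m georeferencing error shifts the sampled pixel by 5 whole columns
col_shifted, _ = geo_to_pixel(400010.5 + 5.0, 299994.5, 1.0, -1.0, 400000.0, 300000.0)
```

On a 1 m/pixel raster a 5 m misalignment moves every sample five pixels sideways – easily enough to land a postcode point on the wrong legend colour near a boundary.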

It would clearly be more helpful if Local Authorities released tables of postcodes alongside their maps, but where they don’t, it’s possible to reverse engineer them with a good degree of accuracy.
