Friday, June 17, 2016

Crime and Census Data for England and Wales on a Map


Glossary of Terms used

Two terms I will use throughout this blog, which form the basis for mapping data for England and Wales.

Local authority districts (LAD) - a generic term to describe the 'district' level of local government

Lower Super Output Areas (LSOA) - taken from the National Statistics site: “Super output areas (SOA) are a geographic hierarchy designed to improve the reporting of small area statistics. The 34,753 lower layer SOA in England (32,844) and Wales (1,909) were built from groups of output areas that were created in 2003 and maintained in 2011” referenceLink

Background and Details of my Crime Analysis

As explained in my previous blogs, I like the way choropleth maps can represent (or misrepresent) data on a map to tell a story about the data. While looking into the use of choropleths elsewhere I have found statements that they should not represent raw data (it can depend on the exact nature of the data), as this can lead to a misunderstanding of the map’s intention. The recommendation is to normalise raw data so that areas can be compared on a common basis; I like this article on the subject here. So I intend to show how raw and normalised data can affect a choropleth map and its interpretation. This time around I have added workday population (link to site) from the 2011 census to my data set via an SAP HANA view. This now allows me to compare and contrast the following...
  1. Raw data of total crime in each area.
  2. Total crimes per LSOA area (km²)
  3. Total crimes per LSOA normal resident population (only people who live in the area)
  4. Total crimes per LSOA workday population (an estimate of people who work or live in the area during a workday)
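The four views above amount to simple per-LSOA rate calculations. As a minimal sketch, here they are as one function; the record shape and field names (`crimes`, `areaKm2`, `residentPop`, `workdayPop`) are my own illustration, not the actual columns of the HANA view:

```javascript
// Compute the four measures for one LSOA row.
// Field names are hypothetical, not the real HANA view columns.
function crimeRates(lsoa) {
  return {
    raw: lsoa.crimes,                                        // 1. raw total
    perKm2: lsoa.crimes / lsoa.areaKm2,                      // 2. per km² of area
    per1000Resident: 1000 * lsoa.crimes / lsoa.residentPop,  // 3. per 1000 residents
    per1000Workday: 1000 * lsoa.crimes / lsoa.workdayPop     // 4. per 1000 workday population
  };
}
```

A city-centre LSOA with few residents but many workers scores high on measure 3 but much lower on measure 4, which is exactly the kind of distortion the workday population is meant to correct.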

I had read that only using the resident population to normalise data was an issue for “special” areas such as Westminster and the City of London local authority districts. I even removed these areas from one of my previous maps to highlight the issue as shown in this link.

I have found that when looking at the individual districts (LADs) at the LSOA level, it appeared all crime was in the city centres of any particular district. So as these areas have a higher population during the working day I added this dataset to view on the maps. So as an example the largest LAD by population is Birmingham, my home town. And the reason to state Birmingham is my hometown is that my own bias came into account when I tried to visualise the crime data I had loaded into the SAP HCP previously. I did create a SCN blog about loading the data to the HCP to be used with Lumira. The last step of that blog was to investigate the anti-social crime type with the intention of naming the most anti-social place in England and Wales. From the raw data the place I identified was in fact my hometown but there must be some other reasons behind that finding! So I then researched how to present crime data and found ways to normalise data to reflect the raw data by area and population. E.g the population of Birmingham is the largest by LADs classification in England and Wales - so based on the statement “more people more crime” then that would mean Birmingham came out on top of most categories. So it comes down to the fact of what information I want to show can impact how I visualise and present the data. Also to state my hometown Birmingham is a great place and well worth a visit.

The below map shows Birmingham’s raw crime data from Jan 2011 to Dec 2013

There is a similar result for the normalised crime data per LSOA area (km) (on the left) and per 1000 resident population (on the right of the map below).

However for the crime data per 1000 workday population shows the following.

The above shows a more widespread crime wave across all of Birmingham! However as stated before this visualisation of crime data does reflect across all LADs I have looked at so far. In that the raw and normalised data by resident population does indicate a high crime rate for central/city locations.

The below map is the pure census data from the 2011 data for resident population. For this map I would expect a more even spread of population per LSOA (as the stated mean average is 1500). I have not normalised this data and it is the raw data. The data can be normalised by area in my map but for this map I wanted to show the actual data.

The above does show one area in particular that is big by area and a high population. If I zoom into this area then it shows that the University of Birmingham and I assume has a high student population. As per my previous blog there is one type of crime that I guess is quite high in this area and more of that later.

The next map shows the workday population per LSOA, this is an estimate of the workday population from the 2011 census.

As you can see the center of Birmingham does have a high workday population in comparison to the other LSOA areas.

Now back to the University of Birmingham and the crime type I guessed would be high in this area is “bicycle theft” as it has been stated that students are more likely to be victims of this crime -link tweet

The above map does indicate bike theft around the University of Birmingham is high in comparison to other parts of Birmingham.
Although it is flawed in that it is the raw data on the map but I am using it to make my point :).

While a choropleth is useful for identifying areas of high crime, it does not provide the ability to drill down into the individual crime locations. For this I used a great Leaflet.js plugin called PruneCluster that allows me to show a large dataset as a cluster of crime locations. I can show over 360,000 crime locations for Birmingham (built as a cluster by looping through my aggregated data) in one call to HANA. The map on the left is the aggregated crime data shown as a choropleth, and the map on the right is the cluster of crime locations. The choropleth map controls the cluster map and is kept in sync with it via another Leaflet plugin called leaflet.sync.
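The loop over the aggregated data can be sketched as below. The row shape (`{lat, lng, count}`) and the map variable names are my own illustration of the approach, not the actual HANA output or the site's code; the PruneCluster and leaflet.sync calls shown are the plugins' documented entry points:

```javascript
// Expand aggregated rows into individual marker positions.
// Row shape is hypothetical, not the actual HANA column names.
function expandAggregated(rows) {
  const points = [];
  for (const row of rows) {
    for (let i = 0; i < row.count; i++) {
      points.push([row.lat, row.lng]);
    }
  }
  return points;
}

// Feed the points to PruneCluster (assumes the plugin is loaded in the browser).
function buildClusterLayer(rows) {
  const cluster = new PruneClusterForLeaflet();
  for (const [lat, lng] of expandAggregated(rows)) {
    cluster.RegisterMarker(new PruneCluster.Marker(lat, lng));
  }
  return cluster;
}

// In the browser (hypothetical map variables):
//   clusterMap.addLayer(buildClusterLayer(rows));
//   choroplethMap.sync(clusterMap);  // leaflet.sync keeps the two
//   clusterMap.sync(choroplethMap);  // maps aligned both ways
```

Expanding aggregated rows client-side is what makes a single HANA call sufficient for hundreds of thousands of marker positions.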

I have also chosen to use the overpass turbo API on my site to highlight certain aspects of the locations related to a crime type. In my previous use of overpass turbo I highlighted the University of Oxford buildings/sites and linked them to the locations of bike theft. This time I used a Leaflet.js plugin/extension, leaflet-layerJSON, to show various points of interest on a map where I have related them to the crime. E.g. for bike theft my map will show bicycle parking locations, for shoplifting the locations of shops, and for vehicle crime car parks. The overpass turbo site can provide far more data than I can show on my map, so my map only shows a point of interest as a single location rather than, for example, an entire building. A policeman may make an appearance every now and again for all other crime types, as this indicates a police station on the map.
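The crime-type-to-feature matching described above can be sketched as a small query builder for the Overpass API. The crime-type keys and OSM tag filters below are my assumptions of typical OpenStreetMap tagging, not necessarily the exact ones my site uses:

```javascript
// Build an Overpass API query for the points of interest matched to a
// crime type. Tag filters are assumed typical OSM tagging, not
// necessarily the site's exact choices.
function overpassQuery(crimeType, bbox) {
  const filters = {
    'bicycle-theft': '["amenity"="bicycle_parking"]',
    'shoplifting': '["shop"]',
    'vehicle-crime': '["amenity"="parking"]'
  };
  // All other crime types fall back to police stations (the occasional
  // "policeman" marker on the map).
  const filter = filters[crimeType] || '["amenity"="police"]';
  return `[out:json];node${filter}(${bbox.join(',')});out;`;
}

// Usage sketch: fetch the query from the public Overpass endpoint:
// const url = 'https://overpass-api.de/api/interpreter?data=' +
//   encodeURIComponent(overpassQuery('bicycle-theft', [52.44, -1.95, 52.46, -1.92]));
```

Querying only `node` elements (rather than ways or relations) matches the single-point-per-feature behaviour described above.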

So back to the University bike theft map with a focus on the crime locations and bicycle parking areas.

Now to compare bike theft with Cambridge, which I know from my previous blog has a high rate of bike theft in the data.

First I noticed how many bicycle parking areas there are; from googling the subject, Cambridge appears to be very bike friendly, with a high level of people commuting by bike. So another explanation for a high rate of bike crime comes to mind, “more bikes, more bike crime”, and how to show that. That will be for another day….

So I have learned that I can control and pick the way to tell the story I want with my investigation into the crime data; it comes down to the way I choose to normalise the data for the choropleth maps. The cluster map, from my point of view, is just the pure data with no such control (that I know of). However, raw data does not provide the ability to normalise and compare areas against a common baseline.

Thanks for reading; I will finish with a data quality statement and credits for the maps.

Data Quality

I have checked the data in my map in various ways to ensure I get the correct results against the base data from the Police site. From the various checks I have made I am sure the calculations and selections are correct. However, as HANA modelling is something I am teaching myself, there may be issues I have yet to discover; the way I create my HANA views has changed over the years to use Analytic and Calculation views.
There is one known issue with special characters in the LAD name: e.g. “Bristol, City of” is available in the drop down but will not work until I fix an issue with the selection (the 3 other LADs with either a comma or an apostrophe in the name fail in the same way).
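Names containing commas and apostrophes breaking a selection is a classic string-escaping symptom. As a general sketch only (I do not know the site's actual selection mechanism, so both helpers here are illustrative, not the real fix):

```javascript
// Two common causes when names like "Bristol, City of" break a selection.
// Both helpers are illustrative; the site's real mechanism is unknown.

// 1. If the name travels in a URL query string, commas and apostrophes
//    must be percent-encoded:
function ladNameToUrlParam(name) {
  return encodeURIComponent(name);
}

// 2. If the name is spliced into a SQL string literal, single quotes must
//    be doubled (or, better, passed as a bind parameter):
function escapeSqlLiteral(name) {
  return name.replace(/'/g, "''");
}
```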


Crime Data

As covered in my previous blog (link above), I use the crime data from the Police data site. The data is made available under the UK Open Government Licence.

Census Data

I use the official labour market statistics site for the census data which can be found at the following web address


All the tiles for my map are from OpenStreetMap

Choropleth Map Data

For the choropleth maps on my site I used the Office for National Statistics geographical reference data, downloaded from the ONS Geo Portal

Contains Ordnance Survey data © Crown copyright and database right 2015
Contains National Statistics data © Crown copyright and database right 2015

Leaflet - Javascript library for maps

My entire site relies on Leaflet.js to bring all the content together in one place on a map. Thank you Vladimir. Also thanks to the authors of the Leaflet extensions I use: leaflet.sync, PruneCluster and leaflet-layerJSON

Overpass turbo - I use the API from this site to display OpenStreetMap data on my map. License link for the API. There are so many options for extracting the data; the way I use it is to highlight certain features on a map that correspond to a chosen crime type.


The OpenStreetMap tiles are provided by Mapquest

Using Ordnance Survey Open Data with GeoServer running in the SAP HANA Cloud Platform (HCP)

By Robert Russell

I use the SAPHCP as a runtime-only version of GeoServer, with a local platform-independent master copy of GeoServer on my Mac. I make my changes and updates on the local machine, then use scripts to build a WAR file to deploy to the HCP. If you are interested in following this as a way to deploy to the HCP, I suggest reading the HCP documentation first, link here. What follows, however, can be used to set up Ordnance Survey data on any GeoServer installation. As the GeoServer data directory is not a permanent filesystem on the SAPHCP, I use the scripts below to keep my local and cloud installations of GeoServer in sync (and I do not make any updates on the HCP version that I intend to keep, as they will be lost when GeoServer restarts in the HCP).

Ordnance Survey Data Downloads

Download the data from Ordnance Survey. I was interested in Solihull in the West Midlands, so downloaded the SP grid reference square.

A download link arrived via email and I extracted the files to a dedicated directory for the shapefiles.
I downloaded the stylesheets from GitHub

I downloaded the ZIP file and extracted it to a dedicated directory for these style sheets.

I noted down the directories for the shapefiles and the stylesheets

GeoServer Setup

Download the platform-independent version of GeoServer - I chose the same version as the WAR file deployed to the HCP

Extract the file and find the “” script in the extracted directories, then change the JAVA_HOME setting. As I had already set up Eclipse and deployed GeoServer to the HCP, I already had SAPJVM 7 in place. GeoServer makes no mention of SAPJVM as a supported JVM, but it works ;) .

Make sure the script is executable and run it to start GeoServer
Login with the standard user and password (usually admin/geoserver)

Change the contact details (and the email address not shown in the screenshot above)
Change the default password
Select “server status” and note down the “Data directory” location, as I use that in my scripts
So there are now three noted directories: the two Ordnance Survey ones and the GeoServer data directory.

Script to setup Ordnance survey shapefiles with GeoServer

I created an empty directory

Based on the stylesheet Quick start guide pdf

I created a simple csv file matching the shapefiles to the stylesheets.

vi matchShapeStyle.csv

The following script uses curl and the REST services of GeoServer to publish the Ordnance Survey shapefiles to GeoServer.

The user and password need to be changed: CHANGETHISuser:password
The download directories need to be updated as well: /Users/robert/Downloads/{shapefiles/stylesheets}

#SLD copy master
###BEFORE running delete SP styles, shapefile store and layers
#####Leave SP_HeritageSite Site Max scale in
#SLD files are in

rm *sld
rm *sld.n*
cp "/Users/robert/Downloads/OS-VectorMap-District-stylesheets-master/ESRI Shapefile stylesheets/GeoServer stylesheets (SLD)/Full Colour style/"*sld .
cp "/Users/robert/Downloads/OS-VectorMap-District-stylesheets-master/ESRI Shapefile stylesheets/GeoServer stylesheets (SLD)/Backdrop style/"*sld .

SP="file:/Users/robert/Downloads/OS VectorMap District (ESRI Shape File) SP/data"

curl -v -u CHANGETHISuser:password -XPUT -H "Content-type: text/plain" -d "$SP"  "http://localhost:8080/geoserver/rest/workspaces/cite/datastores/shapefiles/external.shp?configure=all"


while read i
do


NAME=`echo $i| awk -F, '{print $1}'`
NAME_S=`echo $NAME|sed "s/.shp//g"`
UPSLD=`echo $i| awk -F, '{print $2}'|tr -d '\r'`

sed -e '1,4d' < ${UPSLD}|sed -e "s/^M//g"|sed "s#vmdsymbols/FullColour/#file:#" >${UPSLD}.no
cat masterHEAD ${UPSLD}.no > ${UPSLD}.new

# "HeritageSite_FullColour.sld" "RailwayStation_FullColour.sld"

case ${UPSLD} in
echo "${UPSLD} Match" >>/tmp/checker
cat ${UPSLD}.new| grep -v "<MinScaleDenominator>7000" >${UPSLD}
echo "${UPSLD} Match" >>/tmp/checker
cat ${UPSLD}.new| grep -v "<MinScaleDenominator>7000" >${UPSLD}
echo "${UPSLD} Match" >>/tmp/checker
cat ${UPSLD}.new| grep -v "<MinScaleDenominator>7000" >${UPSLD}
echo "${UPSLD} Match" >>/tmp/checker
cat ${UPSLD}.new| grep -v "<MinScaleDenominator>7000" >${UPSLD}
echo "${UPSLD} Match" >>/tmp/checker
cat ${UPSLD}.new| grep -v "<MinScaleDenominator>7000" >${UPSLD}
echo "${UPSLD} Match" >>/tmp/checker
cat ${UPSLD}.new| grep -v "<MinScaleDenominator>7000" >${UPSLD}
echo "${UPSLD} Match" >>/tmp/checker
cat ${UPSLD}.new| grep -v "<MinScaleDenominator>7000" >${UPSLD}
echo "${UPSLD} Match" >>/tmp/checker
cat ${UPSLD}.new| grep -v "<MinScaleDenominator>7000" >${UPSLD}
cat ${UPSLD}.new| grep -v "<MinScaleDenominator>7000"|grep -v "<MaxScaleDenominator>25000" >${UPSLD}


ls -l $UPSLD



# NOTE: ${SLD} (the style-creation XML payload) is not defined in the excerpt shown here
echo $SLD

curl -u CHANGETHISuser:password -XPOST -H 'Content-type: text/xml'  -d "${SLD}" http://localhost:8080/geoserver/rest/workspaces/cite/styles

curl -v -u CHANGETHISuser:password -XPUT -H "Content-type: application/vnd.ogc.sld+xml"  -d @${UPSLD}  http://localhost:8080/geoserver/rest/workspaces/cite/styles/${NAME_S}

# NOTE: ${MAPSLD} (presumably the XML mapping the layer to its style) is not defined in the excerpt shown here
curl -u CHANGETHISuser:password -XPUT -H 'Content-type: text/xml' -d "${MAPSLD}"  http://localhost:8080/geoserver/rest/layers/cite:${NAME_S}

#curl -u CHANGETHISuser:password -XPUT -H 'Content-type: text/xml'  -d ${SETSRS} http://localhost:8080/geoserver/rest/layers/cite:${NAME_S}

#curl -u CHANGETHISuser:password -XPUT -H 'Content-type: text/xml'  -d "<featureType><srs>EPSG:900913</srs></featureType>" http://localhost:8080/geoserver/rest/workspaces/cite/datastores/shapefiles/featuretypes/${NAME_S}

done < matchShapeStyle.csv


#echo $LG
# NOTE: ${LG} (the layer-group definition XML) is not defined in the excerpt shown here
curl -v -u CHANGETHISuser:password -XPOST -H 'Content-type: text/xml' -d "${LG}"  http://localhost:8080/geoserver/rest/layergroups

echo "UPDATE BBOX Feature to limit to Solihull"
echo "1) IMPORtant to add bbox restriction to layers"
echo "    BBOX(the_geom, 408813.38462, 272194.17965, 427732.57921, 290850.25285)"
echo "2) also manually add roadabout and road stylings to layer group SP"
echo "3) place disk quota to 100mb for geowebcache"
echo "4) add SP layer group to cite workspace"
echo "5) REALLY important to have relative path name to SP store"

The last 5 items displayed by the script are required for the following reasons:

  1. As the SAPHCP trial account has usage limits, I chose to limit the shapefiles via a layer bounding-box CQL selection like the one shown.
  2. As SP_Road and SP_Roundabout have 2 styles in the Layer Group, I manually import the style sheets and add the layers to the Layer Group.
  3. Limit the GeoWebCache to 100mb (again purely due to the SAPHCP limits); this setting HAS to be made again once deployed to the SAPHCP.
  4. Add the SP Layer group to the CITE workspace.
  5. IMPORTANT to have a relative path for the “shapefiles” Data Source Name: file:data/SP

Deploy to SAPHCP

**Important: Stop and Delete the neogeo28 application from the Cloud cockpit

Created a script
Created an empty directory
Downloaded/Copied the STANDARD GeoServer WAR file to a dedicated directory

Information on the deploy script below:
The deploy script is part of the Tomcat SDK for the SAPHCP.
**Important that the local data directory is used: /Users/robert/Downloads/geoserver-2.8.1/data_dir
The caffeinate command prevents my Mac from sleeping during the deploy to the SAPHCP, as it can take some time.
Change CHANGETHISuser@email to the account email address for the SAPHCP

mkdir -p /Users/robert/Downloads/geoserver-2.8.1/data_dir/data/SP
cp "/Users/robert/Downloads/OS VectorMap District (ESRI Shape File) SP/data/"* /Users/robert/Downloads/geoserver-2.8.1/data_dir/data/SP

sed -i "" "s#/Users/robert/Downloads/OS%20VectorMap%20District%20(ESRI%20Shape%20File)%20SP/data/#data/SP#g" /Users/robert/Downloads/geoserver-2.8.1/data_dir/workspaces/cite/shapefiles/datastore.xml

rm -rf /Users/robert/Desktop/root/geoserver/deploy/neo
mkdir /Users/robert/Desktop/root/geoserver/deploy/neo

cd /Users/robert/Desktop/root/geoserver/deploy/neo

unzip /var/tmp/geoserver.war -d .
rm -rf data
mkdir data

cp -pr /Users/robert/Downloads/geoserver-2.8.1/data_dir/. /Users/robert/Desktop/root/geoserver/deploy/neo/data/.

zip -r neogeo28.war .

caffeinate -i /Users/robert/Downloads/neotom/tools/ deploy --host --account p1248461150trial --application neogeo28 --source /Users/robert/Desktop/root/geoserver/deploy/neo/neogeo28.war --user CHANGETHISuser@email

/Users/robert/Downloads/neotom/tools/ start --host --account p1248461150trial --application neogeo28 --user CHANGETHISuser@email

**Once deployed, ENSURE the GeoWebCache disk quota limit is set again
