Friday, June 17, 2016

Crime and Census Data for England and Wales on a Map

Glossary of Terms used

I use two terms throughout this blog. Both are geographic units that form the base for mapping data for England and Wales.

Local authority districts (LAD) - A generic term for the 'district' level of local government.

Lower Super Output Areas (LSOA) - Taken from the National Statistics site: “Super output areas (SOA) are a geographic hierarchy designed to improve the reporting of small area statistics. The 34,753 lower layer SOA in England (32,844) and Wales (1,909) were built from groups of output areas that were created in 2003 and maintained in 2011” (reference link).

Background and Details of my Crime Analysis

As explained in my previous blogs, I like the way choropleth maps can represent (or misrepresent) data to tell a story. While looking into how choropleths are used elsewhere, I found advice that they should not show raw data (although it can depend on the exact nature of the data), as this can lead to a misunderstanding of a map’s intention. The recommendation is to normalise raw data so that areas can be compared on a map; I like this article on the subject here. So I intend to show how raw and normalised data affect a choropleth map and its interpretation. This time around I have added the workday population (link to site) from the 2011 census to my data set via an SAP HANA view. This now allows me to compare and contrast the following:
  1. Raw data of total crime in each area.
  2. Total crimes per LSOA area (km²)
  3. Total crimes per LSOA normal resident population (only people who live in the area)
  4. Total crimes per LSOA workday population (an estimate of people who work or live in the area during a workday)
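The four measures above can be sketched in a few lines of Python. This is only an illustration and not the HANA view I actually use: the Lsoa class, its field names and the figures in the usage note are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Lsoa:
    """One LSOA with the inputs needed for each measure (hypothetical fields)."""
    code: str
    crimes: int              # raw total crime count
    area_km2: float          # area in square kilometres
    residents: int           # usual resident population (2011 census)
    workday_population: int  # estimated workday population (2011 census)


def normalise(lsoa: Lsoa) -> dict:
    """Return the raw count and the three normalised rates for one LSOA."""
    return {
        "raw": lsoa.crimes,
        "per_km2": lsoa.crimes / lsoa.area_km2,
        "per_1000_residents": 1000 * lsoa.crimes / lsoa.residents,
        "per_1000_workday": 1000 * lsoa.crimes / lsoa.workday_population,
    }
```

With invented figures, a city centre LSOA with 3000 crimes, 1500 residents and a workday population of 30000 has a far higher rate per 1000 residents than a suburb with 150 crimes, 1500 residents and a workday population of 1000, but a lower rate per 1000 workday population. The choice of denominator alone can change which area looks worst on the map.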

I had read that using only the resident population to normalise data was an issue for “special” areas such as the Westminster and City of London local authority districts. I even removed these areas from one of my previous maps to highlight the issue, as shown in this link.

When I looked at individual districts (LADs) at the LSOA level, it appeared that all the crime was in the city centre of each district. As these areas have a higher population during the working day, I added the workday population dataset to view on the maps. As an example, the largest LAD by population is Birmingham, my hometown. I mention that because my own bias came into play when I tried to visualise the crime data I had previously loaded into the SAP HCP. I wrote an SCN blog about loading the data into the HCP for use with Lumira. The last step of that blog was to investigate the anti-social crime type with the intention of naming the most anti-social place in England and Wales. From the raw data the place I identified was in fact my hometown, but there had to be other reasons behind that finding! So I researched how to present crime data and found ways to normalise the raw data by area and population. For example, Birmingham has the largest population of any LAD in England and Wales, so on the basis of “more people, more crime” it would come out on top of most categories. It comes down to this: the information I want to show affects how I visualise and present the data. I should also state that my hometown Birmingham is a great place and well worth a visit.

The map below shows Birmingham’s raw crime data from Jan 2011 to Dec 2013.

There is a similar result for the normalised crime data per LSOA area (km²) (on the left) and per 1000 resident population (on the right) in the map below.

However, the crime data per 1000 workday population shows the following.

The above shows a more widespread crime wave across all of Birmingham! However, as stated before, the earlier pattern holds across all the LADs I have looked at so far: the raw data and the data normalised by resident population both indicate a high crime rate for central/city locations.

The map below shows the pure 2011 census data for resident population. For this map I would expect a fairly even spread of population per LSOA (as the stated mean is 1500). This is the raw data and I have not normalised it. The data can be normalised by area in my map, but here I wanted to show the actual figures.

The above does show one area in particular that is large by area and has a high population. If I zoom into this area, it turns out to contain the University of Birmingham, which I assume has a high student population. As per my previous blog, there is one type of crime that I guess is quite high in this area; more on that later.

The next map shows the workday population per LSOA, an estimate taken from the 2011 census.

As you can see, the centre of Birmingham has a high workday population in comparison with the other LSOAs.

Now back to the University of Birmingham. The crime type I guessed would be high in this area is “bicycle theft”, as it has been stated that students are more likely to be victims of this crime (link to tweet).

The above map does indicate that bike theft around the University of Birmingham is high in comparison with other parts of Birmingham. It is flawed in that it shows the raw data, but I am using it to make my point :).

While the choropleth is useful for identifying areas of high crime, it does not provide the ability to drill down into the individual crime locations. For this I used a great Leaflet.js plugin called PruneCluster, which lets me show a large dataset as clusters of crime locations. I can show over 360,000 crime locations for Birmingham (clustered by looping through my aggregated data) in one call to HANA. The map on the left is the aggregated crime data shown as a choropleth and the map on the right is the cluster of crime locations. The choropleth map controls the cluster map, and the two are kept in sync with another Leaflet plugin called leaflet.sync.
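To give a feel for what clustering aggregated locations means, here is a minimal Python sketch. It is not PruneCluster's algorithm and not the code on my site: it is a simple fixed grid, and the function name, the (lat, lon, count) input shape and the cell size are assumptions made for the example.

```python
import math
from collections import defaultdict


def grid_cluster(points, cell_deg):
    """Group (lat, lon, count) rows into square grid cells of cell_deg degrees.

    Returns one (lat, lon, total) tuple per occupied cell, where lat/lon is
    the count-weighted centre of the rows in that cell.
    """
    # cell key -> [sum of lat * count, sum of lon * count, total count]
    cells = defaultdict(lambda: [0.0, 0.0, 0])
    for lat, lon, count in points:
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cell = cells[key]
        cell[0] += lat * count
        cell[1] += lon * count
        cell[2] += count
    return [(s_lat / n, s_lon / n, n) for s_lat, s_lon, n in cells.values()]
```

Because each row carries a count, a location with many crimes weighs more than a location with one, without having to repeat the same point for every crime.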

I have also chosen to use overpass turbo on my site to highlight certain aspects of a location related to a crime type. In my previous use of overpass turbo I highlighted the University of Oxford buildings/sites and linked them to the locations of bike theft. This time I used a Leaflet.js plugin/extension, leaflet-layerJSON, to show points of interest related to the selected crime type. For example, for bike theft my map shows bicycle parking locations, for shoplifting the locations of shops, and for vehicle crime car parks. Overpass turbo can provide a lot of data, more than I can show on my map. Therefore the map shows each point of interest as a single location and not, for example, as an entire building. For all other crime types a policeman may make an appearance every now and again, as this icon indicates a police station on the map.
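As a rough illustration of linking a crime type to points of interest, the Python sketch below builds an Overpass QL query for a bounding box. The mapping, the function name and the choice of OpenStreetMap tags are my assumptions for this example and may differ from the queries my site really sends; it asks for nodes only, which matches showing single locations rather than whole buildings.

```python
# Hypothetical mapping from crime type to an OpenStreetMap tag filter.
POI_FILTERS = {
    "bicycle theft": '["amenity"="bicycle_parking"]',
    "shoplifting": '["shop"]',
    "vehicle crime": '["amenity"="parking"]',
}

# Assumed fallback for every other crime type: police stations.
DEFAULT_FILTER = '["amenity"="police"]'


def build_overpass_query(crime_type, south, west, north, east):
    """Return an Overpass QL query for point features inside a bounding box.

    Overpass expects the bounding box in (south, west, north, east) order.
    """
    tag_filter = POI_FILTERS.get(crime_type.lower(), DEFAULT_FILTER)
    bbox = f"({south},{west},{north},{east})"
    return f"[out:json];node{tag_filter}{bbox};out;"
```

For example, build_overpass_query("Bicycle theft", 52.44, -1.95, 52.46, -1.92) produces a query for bicycle parking nodes in that box, and any crime type not in the mapping falls back to police stations.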

So back to the University bike theft map with a focus on the crime locations and bicycle parking areas.

Next, a comparison with Cambridge, which I know from my previous blog has a high rate of bike theft in the data.

First I noticed how many bicycle parking areas there are and how bike friendly Cambridge appears to be; from googling the subject, it also has a high level of people commuting by bike. So another reason for a high rate of bike crime comes to mind, “more bikes, more bike crime”, along with the question of how to show that. That will be for another day….

So I have learned that I can control and pick the story I want to tell with my investigation into the crime data. It comes down to the way I choose to normalise the data for the choropleth maps. The cluster map, from my point of view, is just the pure data, with no control over it (that I know of); however, raw data does not provide the ability to compare areas against a common baseline.

Thanks for reading. I will finish with a data quality statement and credits for the maps.

Data Quality

I have checked the data in my map in various ways to ensure I get the correct results as per the base data from the Police site. From the various checks I have made I am sure I have got the calculations and selections correct. However, HANA modelling is something I am teaching myself, so there may be some issues that I have yet to discover. The way I have created my HANA views has also changed over the years to use Analytic and Calculation views.
There is one known issue with special characters in the LAD name. For example, “Bristol, city of” is available in the drop down but will not work until I fix an issue with the selection (3 other LADs with either a , or a ‘ in the name fail in the same way).


Crime Data

As covered in my previous blog (link above), I use the crime data from the Police site. The data is made available under the UK Open Government Licence.

Census Data

I use the official labour market statistics site for the census data which can be found at the following web address


All the tiles for my map are from OpenStreetMap

Choropleth Map Data

For the choropleth maps on my site I downloaded the geographical reference data from the Office for National Statistics at this link: ONS Geo Portal

Contains Ordnance Survey data © Crown copyright and database right 2015
Contains National Statistics data © Crown copyright and database right 2015

Leaflet - JavaScript library for maps

My entire site relies on Leaflet.js to bring all the contents together in one place on a map. Thank you Vladimir. Thanks also to the Leaflet extensions I use: leaflet.sync, PruneCluster and leaflet-layerJSON.

Overpass Turbo - I use the API from this site to display OpenStreetMap data on my map (license link for the API). There are many options for extracting the data; I use it to highlight certain features on a map that correspond to a chosen crime type.


The OpenStreetMap tiles are provided by Mapquest
