Regionality & complexity

This post follows on in part from a post I wrote a couple of years ago on regionality. It will also begin with an apology: the maps presented here will be very difficult for colour blind readers to understand, for which I am sorry. Unfortunately, the technique involved is somewhat limited in terms of control of colour (as it requires three colour channels), so it is not possible (or at least very difficult) to improve the maps to make them more legible for colour blind readers. As such, I would not propose publishing these particular visualisations in any formal setting, but hopefully I can get away with it in a blog post!

Before we get to the maps themselves, I shall briefly describe the mapping technique involved, which is partly inspired by the work of a former colleague of mine at the University of Leicester, Martin Sterry (departmental webpage; academia.edu). Essentially, this method can be used to describe the relationship between three different spatial variables that can be mapped as density surfaces. First, we create density surfaces (KDE here) for each variable and then we combine them into an RGB image using the Composite Bands tool in ArcGIS, with the first layer forming the red channel, the second layer forming the green channel, and the third layer forming the blue channel. However, RGB images (so-called “additive colours”, which work from black by adding light in the red, green, and blue channels) can be rather dark / muddy, so I then converted the images (using “Invert” in Photoshop) to CMY images instead (so-called “subtractive colours”, where one works from white by subtracting light in the cyan, magenta, and yellow channels: this is how colour printers work). To do so cleanly, one must set up one’s map document so that anything one wishes to be white in the final image is black in the map document and vice versa. The same applies to greys, which must be set to their inverse (e.g. a 30R 30G 30B grey as seen below for Wales / Scotland / Man should be set to 225R 225G 225B, being 255-30 in each case). This may sound somewhat complicated but the end result is as follows:

  • Cyan (turquoise) tones represent high values in Channel 1, e.g. “complex farmsteads” in the first example below.
  • Magenta tones represent high values in Channel 2, e.g. “enclosed farmsteads” in the first example below.
  • Yellow tones represent high values in Channel 3, e.g. “unenclosed farmsteads” in the first example below.
  • Blue tones represent high values in Channels 1 and 2.
  • Red tones represent high values in Channels 2 and 3.
  • Green tones represent high values in Channels 1 and 3.
  • Dark grey / black tones represent high values in all three Channels.
  • White or pale tones represent low values in all three Channels.
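For readers who prefer code to colour theory, the colour key above can be reproduced with a minimal sketch in plain Python. This is not the ArcGIS / Photoshop workflow itself: the function names are hypothetical, and stretching each channel independently to the 0-255 range is my assumption about how the density surfaces are rescaled.

```python
def rescale_to_byte(surface):
    """Linearly rescale a 2D grid of density values to integers 0-255."""
    flat = [v for row in surface for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid dividing by zero on a flat surface
    return [[round(255 * (v - lo) / span) for v in row] for row in surface]


def composite_cmy(ch1, ch2, ch3):
    """Stack three surfaces as R, G, B, then invert each pixel (255 - value)."""
    r, g, b = (rescale_to_byte(s) for s in (ch1, ch2, ch3))
    return [
        [(255 - r[i][j], 255 - g[i][j], 255 - b[i][j]) for j in range(len(r[0]))]
        for i in range(len(r))
    ]


# Toy 1x4 surfaces: high in Channel 1 only, in Channels 1 and 2,
# in all three Channels, and in none.
ch1 = [[1.0, 1.0, 1.0, 0.0]]
ch2 = [[0.0, 1.0, 1.0, 0.0]]
ch3 = [[0.0, 0.0, 1.0, 0.0]]
image = composite_cmy(ch1, ch2, ch3)
```

The four pixels come out as cyan, blue, black, and white respectively, matching the first, fourth, seventh, and eighth entries in the list.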

Here is a close up of the colour category zones for the first two examples below:

2 CMYK_RRSP

I began by examining the three main categories of Roman farmstead defined by the Roman Rural Settlement Project (RRSP) at Reading, using their excellent data that is available online (Allen et al. 2015). As they defined only three specific categories, this is an ideal dataset to map in this way. For a first attempt, I made three KDE layers using a 10km kernel (or search window) to structure the size of the clusters in the resulting output, then combined them as described above. When plotted against the regions defined based upon variation in their data by the RRSP team (Smith et al. 2016: Chapter 1), we can see that there is a degree of agreement between the regions and the clustering of particular colours:

1 RRSP_psychedelia_v3_inc_regions

However, there is also clearly considerably more complexity to the data than a simple regional classification might suggest (as the RRSP team would certainly acknowledge, so this is not intended as a criticism in any way). If we construct a new model using a wider kernel (in this case 50km), we can get a really nice sense of regional variation in the data without the need to draw lines on a map:

3 RRSP_psychedelia_v2

There is some interesting structure in this model. For example, one can see a focus on enclosed farmsteads in the north and west, so-called complex farmsteads in parts of the southern and eastern midlands (largely alongside enclosed farmsteads), with quite a different focus on enclosed and unenclosed farmsteads in the south east. The strong peak in enclosed farmsteads in south Yorkshire / the north midlands is also quite striking. Although it relies too much on good colour vision in a reader, I think this model and technique work quite well here, so I decided to apply it to another dataset: our own.
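The effect of kernel width (10km versus 50km above) can be illustrated with a rough sketch. It assumes a simple unnormalised quartic kernel, which may differ in detail from the one ArcGIS applies, and the coordinates are invented for illustration only.

```python
import math


def kde_surface(points, cell_centres, radius):
    """Sum a quartic kernel of the given radius over all points, per cell."""
    surface = []
    for cx, cy in cell_centres:
        total = 0.0
        for px, py in points:
            d = math.hypot(px - cx, py - cy)
            if d < radius:
                total += (1 - (d / radius) ** 2) ** 2
        surface.append(total)
    return surface


# Hypothetical farmstead coordinates in km, and a row of three cell centres.
farmsteads = [(0.0, 0.0), (2.0, 0.0), (40.0, 0.0)]
cells = [(0.0, 0.0), (20.0, 0.0), (40.0, 0.0)]

narrow = kde_surface(farmsteads, cells, radius=10.0)  # tight, local clusters
wide = kde_surface(farmsteads, cells, radius=50.0)    # smooth, regional trends
```

With the 10km radius the middle cell receives no density at all, so the two clusters stay distinct; with the 50km radius every farmstead contributes to it and the clusters merge into one broad surface.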

Before we get to the next stage, here is a close-up of the colour category zones for the next two maps (with RO = Roman; PR = Prehistoric; EM = early medieval):

5 CMYK_Englaid

Based on another technique which we published recently (Green et al. 2017), the following two maps are created from a measure of the “complexity” of our datasets: specifically the number of different types of site / monument (based upon our thesaurus of types; see Portal to the Past) per 1x1km square. This measure was calculated for each square for each time period in our database and then density surfaces created for each time period (using a 5km kernel in this instance). A shortcoming of the mapping technique comes into play here: it can only map three categories at once. As such, we had to combine the Bronze Age and Iron Age models into a composite model for later prehistory. The three time period based complexity models were then combined into a single image as previously:

4 complexity_psychedelia_global

There are various nice patterns in this dataset, including the clear strength of prehistory and the early medieval in the south western peninsula, the intense focus on major river valleys (partly due to the large gravel quarry excavations in those areas), and the appearance of Roman roads highlighted in magenta. The Roman period also looks quite dominant generally, with lots of pinks, blues, and reds visible on the map. There is also a very clear difference in intensity between eastern / southern England and northern / western England.
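The complexity measure behind this map (the number of different site / monument types per 1x1km square) is simple to compute. Below is a minimal sketch for a single time period; the record layout and type names are hypothetical and are not drawn from our database.

```python
from collections import defaultdict


def complexity_per_square(records, cell_size=1000):
    """Count distinct monument types per grid square.

    records: iterable of (easting, northing, type_name), coordinates in metres.
    Returns {(col, row): number of distinct types}.
    """
    types_by_square = defaultdict(set)
    for easting, northing, type_name in records:
        square = (int(easting // cell_size), int(northing // cell_size))
        types_by_square[square].add(type_name)
    return {square: len(types) for square, types in types_by_square.items()}


# Hypothetical records for one period.
records = [
    (450100, 210200, "villa"),
    (450900, 210800, "field system"),
    (450500, 210500, "villa"),  # same type in the same square: counted once
    (451200, 210300, "road"),
]
complexity = complexity_per_square(records)
```

Using a set per square means that many records of the same type do not inflate the score: the measure reflects variety rather than quantity.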

It is possible to lessen the effects of regional and period-based variation by constructing a series of larger kernel density surfaces and using these to “correct” for regional variation in the period-based models. This produces a new model which reflects complexity on a more local scale. Essentially, the first model can be thought of as a model of “globally” scaled (by which I mean the whole of the dataset, not the whole of the planet) complexity and the new model can be thought of as a model of locally scaled complexity:

6 complexity_psychedelia_local

This model also shows some interesting patterns. It is much less dominated by single periods in particular regions, with Roman dominance mostly along the Roman roads and Hadrian’s Wall. There are also some nice dark areas, which show high levels of local complexity across all three time periods. These cluster mostly along rivers again or around the large Roman towns, along with a similar cluster in southern Yorkshire / the north Midlands to that seen in the RRSP data.
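One way to picture the “correction” step is as a cell-by-cell ratio between a fine-kernel surface and a broad-kernel surface. To be clear, dividing one by the other is an assumption made for illustration; it is only one of several possible ways of scaling a surface locally, and the values below are invented.

```python
def locally_scaled(fine, broad, floor=1e-9):
    """Divide a fine-kernel surface by a broad-kernel surface, cell by cell."""
    return [
        [f / b if b > floor else 0.0 for f, b in zip(fine_row, broad_row)]
        for fine_row, broad_row in zip(fine, broad)
    ]


# Toy surfaces: a data-rich region (left column) and a data-poor one (right).
fine = [[8.0, 0.8], [4.0, 0.4]]
broad = [[6.0, 0.6], [6.0, 0.6]]
local = locally_scaled(fine, broad)
```

In this toy case the top row scores about 1.33 in both regions, even though the raw values differ tenfold: a cell is judged against its regional background rather than against the whole dataset.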

As with all models of English archaeology, the images presented here represent a very complex data history, being influenced by both where more (and more archaeologically visible) activity took place in the past and where more modern archaeological activity takes place in the present (largely driven by development). They also, as previously noted, come with considerable caveats with regard to legibility, due to the relatively large minority of people with restricted colour vision (c.8-10% of men, and maybe 1% of women). The technique is also restricted by its inability to map more than three variables, but more than three variables would probably overcomplicate matters even if it were possible. However, I hope that this post gives a sense of the variation and complexity in the English archaeological record, locally, regionally, and nationally.

EngLaId is now winding down, having officially ended at Christmas, so this will probably be the last substantive post on technique or data for a while. We will however announce here when any new publications come out, including our main books.

Chris Green

References:

Allen, M., T. Brindle, A. Smith, J.D. Richards, T. Evans, N. Holbrook, M. Fulford, N. Blick. 2015. The Rural Settlement of Roman Britain: an online resource. York: Archaeology Data Service. https://doi.org/10.5284/1030449

Green, C., C. Gosden, A. Cooper, T. Franconi, L. Ten Harkel, Z. Kamash & A. Lowerre. 2017. Understanding the spatial patterning of English archaeology: modelling mass data from England, 1500 BC to AD 1086. Archaeological Journal 174(1): p.244–280. http://www.tandfonline.com/doi/full/10.1080/00665983.2016.1230436

Smith, A., M. Allen, T. Brindle & M. Fulford. 2016. New Visions of the Countryside of Roman Britain. Volume 1: the Rural Settlement of Britain. Britannia Monograph Series No. 29. London: Society for the Promotion of Roman Studies.

Playlist of EngLaId talks, etc.

We have compiled a playlist of a few EngLaId talks that are available to watch on YouTube. There are a couple of videos of Chris Green giving talks on EngLaId topics and a couple of videos of Chris Gosden giving longer lectures on the project. There is also a video of Miranda Creswell drawing at Danebury and a short video of reactions to a talk given by Chris Gosden.

PAS paper

This is just a short post to announce the publication of our new paper on performing analysis of PAS data. It is open access and has been published by Internet Archaeology:

https://doi.org/10.11141/ia.45.1

Abstract:

This study tackles fundamental archaeological questions using large, complex digital datasets, building on recent discussions about how to deal with archaeology’s emerging ‘data deluge’ (Bevan 2015). At a broad level, it draws on the unprecedented volume of legacy data gathered from many different sources – almost one million records in total – for the English Landscape and Identities project (Oxford, UK). More specifically, the paper focuses in detail on artefact evidence – material derived primarily from surface surveys, stray finds and metal detecting. Novel computational models are developed that extend and connect ideas from usually distinct research realms (different arenas of artefact research, digital archaeology, etc.). Major interpretative issues are addressed including how to approach background factors that shape the archaeological record, and how to understand spatial and temporal patterning at various scales. Overall, we suggest, interpreting large complex datasets sparks different ways of working, and raises new theoretical concerns.

Effective communication and cartography

I have been thinking a lot recently about using maps as effective tools for visual communication of data. Chen et al. (2014) wrote that visualization of data should be about getting your message across in a time-efficient manner, which Kent (2005) stated depends upon producing aesthetically pleasing results. All maps (being one form of data visualization) are imperfect models of the world (as all models are imperfect) and we must take care to make sure that our maps communicate the messages we wish to express effectively.

Without wishing to get unduly political, I want to work through these ideas using the example of this summer’s “Brexit” vote. Data on the referendum results can be found here and data on UK boundary lines here. There are many (infinite?) different potential ways of visualising this data spatially, but I am going to explain the messages I see in a few examples here.

First up, we have a simple rendering of the results using the district divisions by which the data was originally counted and parcelled up, in which the saturation of the yellows (remain) and blues (leave) shows the percentage lead each vote had in districts which each side “won”:

1_brexit_percent

Yellow and blue have been used as that seems to be the convention settled on by most of our media. This map shows which areas felt particularly strongly one way or the other about the question asked and works well in that regard. However, it also gives a somewhat misleading message, as some of the high value districts are of relatively low population density. As an alternative then, we can keep the same division into “leaver” and “remainer” districts, but instead use the shading to show population density:

2_brexit_density

This map loses the nuance of showing how strong the vote was in either direction, but gains something by showing which districts have more people living in them. Most notable is the stark difference between the districts in eastern England around The Wash, which are of low population density (for the UK!), but which felt very strongly that the UK should leave the EU.

We can also look at the result in much starker terms. The recent High Court decision has increased the likelihood of there being a Parliamentary vote on invoking Article 50, so I wanted to see which way the various constituencies fell in terms of “leave” or “remain”. This is not simple, however, as the results were reported using districts, which often do not match constituencies. As such, I reapportioned the vote from districts between constituencies on the basis of spatial area (e.g. if a constituency covered half a district, it would receive half of that district’s votes). This is imperfect, as population density is not uniform across any district, but was the best I could do with the data to hand. The results show that, if Parliament does get to vote on Article 50 and MPs vote as their constituents voted, then “Leave” will comfortably win (Northern Ireland has not been included, but does not have enough MPs to make a difference either way):

3_brexit_constituency
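The area-based reapportionment described above can be sketched as follows. To keep the example self-contained it uses axis-aligned boxes as stand-ins for real boundaries (actual district and constituency polygons would need a proper geometry library), and the names and vote counts are invented.

```python
def overlap_area(a, b):
    """Area shared by two axis-aligned boxes given as (xmin, ymin, xmax, ymax)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0) * max(height, 0)


def box_area(box):
    """Area of a single box given as (xmin, ymin, xmax, ymax)."""
    return (box[2] - box[0]) * (box[3] - box[1])


def reapportion(districts, targets):
    """Share each district's votes between targets in proportion to area overlap."""
    totals = {name: {"leave": 0.0, "remain": 0.0} for name in targets}
    for d_box, votes in districts:
        for name, t_box in targets.items():
            share = overlap_area(d_box, t_box) / box_area(d_box)
            for side in ("leave", "remain"):
                totals[name][side] += votes[side] * share
    return totals


# Two hypothetical districts and two hypothetical constituencies.
districts = [
    ((0, 0, 10, 10), {"leave": 6000, "remain": 4000}),
    ((10, 0, 20, 10), {"leave": 3000, "remain": 5000}),
]
constituencies = {
    "West": (0, 0, 5, 10),   # covers half of the first district
    "East": (5, 0, 20, 10),  # other half, plus all of the second district
}
result = reapportion(districts, constituencies)
```

Here “West” receives half of the first district’s votes (3,000 leave, 2,000 remain), while “East” receives the other half plus all of the second district. The sketch shares the uniform-density weakness noted above.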

All of these maps work reasonably well at expressing one element of the data, but I wanted to come up with a visualization that produced a more complex picture of the results yet without abandoning geographic space (i.e. I did not want to use a cartogram):

4_brexit_hexes

This final map reworks the results into hexagonal spatial bins, using the same method as when I reworked the results into constituencies (i.e. assignment by spatial area overlap). Here, the blue / yellow shading has returned to showing the strength of the result, but we can now also see data on population at the same time through the thickness / blackness of the lines around the hexagons. I feel that this map does a pretty good job of showing the distribution of the vote (spatially, strength-wise, and population-wise) whilst still allowing people to locate themselves reasonably well geographically (which would not be the case with a cartogram). Hexagons have been preferred over squares largely due to their visual appeal and to the fact that humans have a tendency to see false straight lines in data binned into square-based grids.
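The double encoding used in the hexagon map (fill for strength of result, outline for population) could be expressed along these lines. The function name, the width range, and the linear scaling are all assumptions chosen for illustration; they are not the settings used for the map above.

```python
def hex_style(leave, remain, population, max_population,
              min_width=0.2, max_width=3.0):
    """Return fill and outline settings for one hexagon.

    The fill hue shows which side led, the saturation shows the size of the
    lead (0-1), and the outline width scales linearly with population.
    """
    lead = abs(leave - remain) / (leave + remain)
    hue = "blue" if leave > remain else "yellow"
    width = min_width + (max_width - min_width) * population / max_population
    return {"hue": hue, "saturation": lead, "line_width": width}


# A hypothetical hexagon: 60/40 for leave, half the maximum population.
style = hex_style(leave=6000, remain=4000,
                  population=50000, max_population=100000)
```

This hexagon would be drawn in blue at 20% saturation with a mid-weight outline, so a reader can take in both the margin and the number of people behind it.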

Whatever you think of the referendum result, I hope that my worked example has helped to explain how making a map is not always a simple task. Careful thought about audience, message, and data structure needs to go into any visualisation if effective communication is to be achieved. I hope that my final map succeeds in that task!

Chris Green

References

Chen, M., L. Floridi, and R. Borgo. 2014. What is visualization really for? The Philosophy of Information Quality. Springer Synthese Library Volume 358, 75-93

Kent, A.J. 2005. Aesthetics: a lost cause in cartographic theory? Cartographic Journal 42(2), 182-188

All maps contain Ordnance Survey data (C) Crown copyright and database right 2016

New paper: modelling mass data

Our new paper has just gone up for online first access. It’s available here if you have access to the Archaeological Journal via a library / university:

http://www.tandfonline.com/doi/full/10.1080/00665983.2016.1230436

It’s about our experiments in modelling EngLaId datasets on a broad scale, with a mild focus on population density. It also covers some issues discussed here (Affordance; Pottery I, II).