Regionality & complexity

This post follows on in part from a post I wrote a couple of years ago on regionality. It will also begin with an apology: the maps presented here will be very difficult for colour blind readers to understand, for which I am sorry. Unfortunately, the technique involved is somewhat limited in terms of control of colour (as it requires three colour channels), so it is not possible (or at least very difficult) to improve the maps to make them more legible for colour blind readers. As such, I would not propose publishing these particular visualisations in any formal setting, but hopefully I can get away with it in a blog post!

Before we get to the maps themselves, I shall describe briefly the mapping technique involved, which is partly inspired by the work of a former colleague of mine at the University of Leicester, Martin Sterry (departmental webpage; academia.edu). Essentially, this method can be used to describe the relationship between three different spatial variables that can be mapped as density surfaces. First, we create density surfaces (KDE here) for each variable and then we combine them into an RGB image using the Composite Bands tool in ArcGIS, with the first layer forming the red channel, the second layer forming the green channel, and the third layer forming the blue channel. However, RGB images (so-called “additive colours”, which work from black by adding light in the red, green, and blue channels) can be rather dark / muddy, so I then converted the images (using “Invert” in Photoshop) to CMY images instead (so-called “subtractive colours”, where one works from white by subtracting light in the cyan, magenta, and yellow channels: this is how colour printers work). To do so cleanly, one must set up one’s map document so that anything one wishes to be white in the final image is black in the map document and vice versa. The same applies to greys, which must be set to their inverse (e.g. a 30R 30G 30B grey as seen below for Wales / Scotland / Man should be set to 225R 225G 225B, being 255 - 30 in each case). This may sound somewhat complicated, but the end result is as follows:

  • Cyan (turquoise) tones represent high values in Channel 1, e.g. “complex farmsteads” in the first example below.
  • Magenta tones represent high values in Channel 2, e.g. “enclosed farmsteads” in the first example below.
  • Yellow tones represent high values in Channel 3, e.g. “unenclosed farmsteads” in the first example below.
  • Blue tones represent high values in Channels 1 and 2.
  • Red tones represent high values in Channels 2 and 3.
  • Green tones represent high values in Channels 1 and 3.
  • Dark grey / black tones represent high values in all three Channels.
  • White or pale tones represent low values in all three Channels.
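The channel arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not the ArcGIS / Photoshop workflow itself: it assumes each density surface is a plain 2D list on a shared grid, and it linearly stretches each surface to 0-255 before stacking (a stretch chosen for the example; GIS software offers several). Inverting each channel as 255 minus the value turns the additive RGB composite into its subtractive CMY counterpart.

```python
def stretch(surface):
    """Linearly rescale a 2D grid of values to integers in 0..255 (an assumed stretch)."""
    values = [v for row in surface for v in row]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # a flat surface maps to all zeros
    return [[round(255 * (v - lo) / span) for v in row] for row in surface]


def composite_rgb(ch1, ch2, ch3):
    """Stack three surfaces as (R, G, B) tuples: ch1 -> red, ch2 -> green, ch3 -> blue."""
    r, g, b = stretch(ch1), stretch(ch2), stretch(ch3)
    return [
        [(r[i][j], g[i][j], b[i][j]) for j in range(len(r[i]))]
        for i in range(len(r))
    ]


def invert(rgb_grid):
    """Invert every pixel (255 - value per channel), turning RGB into CMY."""
    return [[tuple(255 - c for c in px) for px in row] for row in rgb_grid]


# Toy surfaces: four cells, each high in a different combination of channels.
ch1 = [[1, 1, 1, 0]]
ch2 = [[0, 1, 1, 0]]
ch3 = [[0, 0, 1, 0]]
cmy = invert(composite_rgb(ch1, ch2, ch3))
```

With these toy surfaces, a cell high only in channel 1 comes out as (0, 255, 255), i.e. cyan; a cell high in channels 1 and 2 as (0, 0, 255), blue; a cell high in all three as black; and a cell low in all three as white, matching the list above. The same inversion takes the 30R 30G 30B grey to 225R 225G 225B.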

Here is a close-up of the colour category zones for the first two examples below:

[Image: colour category key for the RRSP maps]

I began by examining the three main categories of Roman farmstead defined by the Roman Rural Settlement Project (RRSP) at Reading, using their excellent data that is available online (Allen et al. 2015). As they defined only three specific categories, this is an ideal dataset to map in this way. For a first attempt, I made three KDE layers using a 10km kernel (or search window) to structure the size of the clusters in the resulting output, then combined them as described above. When plotted against the regions defined by the RRSP team on the basis of variation in their data (Smith et al. 2016: Chapter 1), we can see that there is a degree of agreement between the regions and the clustering of particular colours:

[Map: RRSP farmstead categories, 10km kernel, with RRSP regions overlaid]
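For readers unfamiliar with KDE, the sketch below shows the basic calculation for a single variable. It assumes a quartic (biweight) kernel, which is the kernel ArcGIS documents for its Kernel Density tool, and treats the “10km kernel” as a 10,000 m search radius; the function name and arguments are my own, and this is not the ArcGIS implementation. Running it once per farmstead category on the same grid would give the three surfaces that feed the colour channels.

```python
import math


def quartic_kde(points, xs, ys, bandwidth):
    """Kernel density on a grid using the quartic (biweight) kernel.

    points: (x, y) locations in metres; xs, ys: grid cell centre coordinates.
    bandwidth: search radius in metres (e.g. 10_000 for a 10 km kernel).
    Returns a list of rows (one per y) of densities per square metre.
    """
    h2 = bandwidth ** 2
    norm = 3.0 / (math.pi * h2)  # scales each point's kernel to integrate to 1
    grid = []
    for y in ys:
        row = []
        for x in xs:
            total = 0.0
            for px, py in points:
                d2 = (x - px) ** 2 + (y - py) ** 2
                if d2 < h2:  # points beyond the search radius contribute nothing
                    total += norm * (1.0 - d2 / h2) ** 2
            row.append(total)
        grid.append(row)
    return grid
```

Changing `bandwidth` from 10,000 to 50,000 gives the smoother, more regional surfaces used for the second RRSP map.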

However, there is also clearly considerably more complexity to the data than a simple regional classification might suggest (as the RRSP team would certainly acknowledge, so this is not intended as a criticism in any way). If we construct a new model using a wider kernel (in this case 50km), we can get a really nice sense of regional variation in the data without the need to draw lines on a map:

[Map: RRSP farmstead categories, 50km kernel]

There is some interesting structure in this model. For example, one can see a focus on enclosed farmsteads in the north and west, so-called complex farmsteads in parts of the southern and eastern midlands (largely alongside enclosed farmsteads), with quite a different focus on enclosed and unenclosed farmsteads in the south east. The strong peak in enclosed farmsteads in south Yorkshire / the north midlands is also quite striking. Although it relies too much on good colour vision in a reader, I think this model and technique works quite well here, so I decided to apply it to another dataset: our own.

Before we get to the next stage, here is a close-up of the colour category zones for the next two maps (with RO = Roman; PR = Prehistoric; EM = early medieval):

[Image: colour category key for the EngLaId complexity maps]

Based on another technique which we published recently (Green et al. 2017), the following two maps are created from a measure of the “complexity” of our datasets: specifically the number of different types of site / monument (based upon our thesaurus of types; see Portal to the Past) per 1x1km square. This measure was calculated for each square for each time period in our database and then density surfaces created for each time period (using a 5km kernel in this instance). A shortcoming of the mapping technique comes into play here: it can only map three categories at once. As such, we had to combine the Bronze Age and Iron Age models into a composite model for later prehistory. The three time period based complexity models were then combined into a single image as previously:

[Map: globally scaled complexity for later prehistoric, Roman, and early medieval periods]
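The complexity measure itself is simple to compute. The sketch below assumes records arrive as (easting, northing, period, type) tuples in metres (e.g. British National Grid), with types already standardised through a thesaurus; the function name and record layout are hypothetical. It counts distinct types, not records, so several entries for the same kind of site in one square count only once.

```python
from collections import defaultdict


def complexity_by_square(records, cell_size=1000):
    """Count distinct monument types per grid square, separately for each period.

    records: iterable of (easting, northing, period, monument_type) tuples.
    Returns {period: {(col, row): number_of_distinct_types}}.
    """
    types = defaultdict(set)
    for easting, northing, period, monument_type in records:
        cell = (int(easting // cell_size), int(northing // cell_size))
        types[(period, cell)].add(monument_type)
    result = defaultdict(dict)
    for (period, cell), type_set in types.items():
        result[period][cell] = len(type_set)
    return dict(result)
```

The per-square counts for each period would then be turned into density surfaces (5km kernel here) before compositing.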

There are various nice patterns in this dataset, including the clear strength of prehistory and the early medieval in the south western peninsula, the intense focus on major river valleys (partly due to the large gravel quarry excavations in those areas), and the appearance of Roman roads highlighted in magenta. The Roman period also looks quite dominant generally, with lots of pinks, blues, and reds visible on the map. There is also a very clear difference in intensity between eastern / southern England and northern / western England.

It is possible to lessen the effects of regional and period based variation by constructing a series of larger kernel density surfaces and using these to “correct” for regional variation in the period based models. This produces a new model which reflects complexity on a more local scale. Essentially, the first model can be thought of as a model of “globally” scaled (by which I mean the whole of the dataset, not the whole of the planet) complexity and the new model can be thought of as a model of locally scaled complexity:

[Map: locally scaled complexity for later prehistoric, Roman, and early medieval periods]
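The exact form of the correction is not spelled out above, so the following is only one plausible way of doing it, stated as an assumption: smooth each period’s surface over a much larger neighbourhood and divide the local surface by that regional one, so that values above 1 flag cells that are more complex than their surroundings would predict. A simple moving-window mean stands in here for the larger kernel density surfaces; both helper names are hypothetical.

```python
def box_smooth(grid, radius):
    """Mean of each cell's (2*radius+1)^2 neighbourhood, clipped at the grid edges.

    A stand-in for a large-kernel density surface (assumption).
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            window = [
                grid[a][b]
                for a in range(max(0, i - radius), min(rows, i + radius + 1))
                for b in range(max(0, j - radius), min(cols, j + radius + 1))
            ]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


def local_ratio(local, regional):
    """Divide the local surface by the regional surface, cell by cell.

    Cells where the regional value is zero are set to 0.0.
    """
    return [
        [l / r if r > 0 else 0.0 for l, r in zip(lrow, rrow)]
        for lrow, rrow in zip(local, regional)
    ]
```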

This model also shows some interesting patterns. It is much less dominated by single periods in particular regions, with Roman dominance mostly along the Roman roads and Hadrian’s Wall. There are also some nice dark areas, which show high levels of local complexity across all three time periods. These cluster mostly along rivers again or around the large Roman towns, along with a similar cluster in southern Yorkshire / the north Midlands to that seen in the RRSP data.

As with all models of English archaeology, the images presented here represent a very complex data history, being influenced both by where more (and more archaeologically visible) activity took place in the past and by where more modern archaeological activity takes place in the present (largely driven by development). They also, as previously noted, come with considerable caveats with regard to legibility, due to the relatively large minority of people with restricted colour vision (c.8-10% of men, and maybe 1% of women). The technique is also restricted by its inability to map more than three variables, but more than three variables would probably overcomplicate matters even if it were possible. However, I hope that this post gives a sense of the variation and complexity in the English archaeological record, locally, regionally, and nationally.

EngLaId is now winding down, having officially ended at Christmas, so this will probably be the last substantive post on technique or data for a while. We will however announce here when any new publications come out, including our main books.

Chris Green

References:

Allen, M., T. Brindle, A. Smith, J.D. Richards, T. Evans, N. Holbrook, M. Fulford, N. Blick. 2015. The Rural Settlement of Roman Britain: an online resource. York: Archaeology Data Service. https://doi.org/10.5284/1030449

Green, C., C. Gosden, A. Cooper, T. Franconi, L. Ten Harkel, Z. Kamash & A. Lowerre. 2017. Understanding the spatial patterning of English archaeology: modelling mass data from England, 1500 BC to AD 1086. Archaeological Journal 174(1): 244–280. http://www.tandfonline.com/doi/full/10.1080/00665983.2016.1230436

Smith, A., M. Allen, T. Brindle & M. Fulford. 2016. New Visions of the Countryside of Roman Britain. Volume 1: The Rural Settlement of Roman Britain. Britannia Monograph Series No. 29. London: Society for the Promotion of Roman Studies.

Effective communication and cartography

I have been thinking a lot recently about using maps as effective tools for visual communication of data. Chen et al. (2014) wrote that visualization of data should be about getting your message across in a time-efficient manner, which Kent (2005) stated depends upon producing aesthetically pleasing results. All maps (being one form of data visualization) are imperfect models of the world (as all models are imperfect) and we must take care to make sure that our maps communicate the messages we wish to express effectively.

Without wishing to get unduly political, I want to work through these ideas using the example of this summer’s “Brexit” vote. Data on the referendum results can be found here and data on UK boundary lines here. There are many (infinite?) different potential ways of visualising this data spatially, but I am going to explain the messages I see in a few examples here.

First up, we have a simple rendering of the results using the district divisions by which the data was originally counted and parcelled up, in which the saturation of the yellows (remain) and blues (leave) shows the percentage lead each vote had in the districts which each side “won”:

[Map: referendum result by district, shaded by percentage lead]

Yellow and blue have been used as that seems to be the convention settled on by most of our media. This map shows which areas felt particularly strongly one way or the other about the question asked and works well in that regard. However, it also gives a somewhat misleading message, as some of the high value districts are of relatively low population density. As an alternative then, we can keep the same division into “leaver” and “remainer” districts, but instead use the shading to show population density:

[Map: referendum result by district, shaded by population density]

This map loses the nuance of showing how strong the vote was in either direction, but gains something by showing which districts have more people living in them. Most notable is the stark contrast between the two maps for the districts in eastern England around The Wash, which are of low population density (for the UK!) but voted very strongly for the UK to leave the EU.

We can also look at the result in much starker terms. The recent High Court decision has increased the likelihood of there being a Parliamentary vote on invoking Article 50, so I wanted to see which way the various constituencies fell in terms of “leave” or “remain”. This is not simple, however, as the results were reported using districts, which often do not match constituencies. As such, I reapportioned the vote from districts between constituencies on the basis of spatial area (e.g. if a constituency covered half a district, it would receive half of that district’s votes). This is imperfect, as population density is not uniform across any district, but it was the best I could do with the data to hand. The results show that, if Parliament does get to vote on Article 50 and MPs vote as their constituents voted, then “Leave” will comfortably win (Northern Ireland has not been included, but does not have enough MPs to make a difference either way):

[Map: referendum result reapportioned to parliamentary constituencies]
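The area-based reapportionment described above can be sketched as follows. It assumes the district / constituency overlap areas have already been calculated (for instance with an intersect operation in a GIS) and are passed in as (district, constituency, area) rows; the data structures and names are hypothetical. Each district’s votes are split in proportion to the share of its area falling in each constituency, which is the “half the district, half the votes” rule, and it inherits the same weakness where population is unevenly spread.

```python
from collections import defaultdict


def reapportion(district_votes, overlaps):
    """Split each district's votes between zones in proportion to overlapping area.

    district_votes: {district: {"leave": n, "remain": n}}
    overlaps: (district, zone, overlap_area) rows; a district's rows are
    assumed to cover its whole area, so its shares sum to 1.
    """
    district_area = defaultdict(float)
    for district, _zone, area in overlaps:
        district_area[district] += area
    zone_votes = defaultdict(lambda: {"leave": 0.0, "remain": 0.0})
    for district, zone, area in overlaps:
        share = area / district_area[district]
        for side in ("leave", "remain"):
            zone_votes[zone][side] += district_votes[district][side] * share
    return dict(zone_votes)


def winner(votes):
    """Return the side with more reapportioned votes (ties go to remain here)."""
    return "leave" if votes["leave"] > votes["remain"] else "remain"
```

The same function works for any set of target zones, including a grid of hexagonal bins, provided the overlap areas are supplied.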

All of these maps work reasonably well at expressing one element of the data, but I wanted to come up with a visualization that produced a more complex picture of the results yet without abandoning geographic space (i.e. I did not want to use a cartogram):

[Map: referendum result in hexagonal bins, with population shown by hexagon outlines]

This final map reworks the results into hexagonal spatial bins, using the same method as when I reworked the results into constituencies (i.e. assignment by spatial area overlap). Here, the blue / yellow shading has returned to showing the strength of the result, but we can now also see data on population at the same time through the thickness / blackness of the lines around the hexagons. I feel that this map does a pretty good job of showing the distribution of the vote (spatially, strength-wise, and population-wise) whilst still allowing people to locate themselves reasonably well geographically (which would not be the case with a cartogram). Hexagons have been preferred over squares largely due to their visual appeal and because humans tend to see false straight lines in data binned into square-based grids.
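A flat-topped hexagonal grid of this kind can be generated with a little trigonometry: neighbouring columns sit 1.5 times the hexagon’s circumradius apart, rows sit √3 times the radius apart, and every other column is shifted by half a row. This is a sketch under those geometric conventions, not the code used for the map; the function names and the bounding-box interface are my own. The resulting polygons could then be intersected with the districts to get the overlap areas for reapportionment.

```python
import math


def hexagon(cx, cy, size):
    """Vertices of a flat-topped hexagon with circumradius `size`."""
    return [
        (cx + size * math.cos(math.radians(60 * k)),
         cy + size * math.sin(math.radians(60 * k)))
        for k in range(6)
    ]


def hex_grid(xmin, ymin, xmax, ymax, size):
    """Lay flat-topped hexagons over (and slightly beyond) a bounding box.

    Columns are 1.5 * size apart; odd columns are shifted up by half a row.
    """
    dx = 1.5 * size
    dy = math.sqrt(3) * size
    cells = []
    col = 0
    x = xmin
    while x <= xmax + dx:
        y = ymin + (dy / 2 if col % 2 else 0.0)
        while y <= ymax + dy:
            cells.append(hexagon(x, y, size))
            y += dy
        x += dx
        col += 1
    return cells
```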

Whatever you think of the referendum result, I hope that my worked example has helped to explain how making a map is not always a simple task. Careful thought about audience, message, and data structure needs to go into any visualisation if effective communication is to be achieved. I hope that my final map succeeds in that task!

Chris Green

References

Chen, M., L. Floridi, and R. Borgo. 2014. What is visualization really for? The Philosophy of Information Quality. Springer Synthese Library Volume 358, 75-93

Kent, A.J. 2005. Aesthetics: a lost cause in cartographic theory? Cartographic Journal 42(2), 182-188

All maps contain Ordnance Survey data © Crown copyright and database right 2016

CAA 2014, Paris

I have just returned from this year’s Computer Applications and Quantitative Methods in Archaeology (CAA) conference, which was held in Paris last week. Overall, the conference was a great success, despite a number of teething troubles (particularly with IT support [ironically?]).


I spoke on the Friday morning about using Heisenberg’s Uncertainty Principle as a metaphor for good cartographic practice.  I’ll try to write more about that at a later time.


One particularly impressive visualization of data that I saw was Lost Change, which maps PAS coins and their mint locations. Another very interesting paper I heard was about MicroPasts (another British Museum-backed venture), which is designed to allow archaeologists to access crowdsourced labour and crowdfunding. I also enjoyed Philip Verhagen’s paper, as his project is encountering many of the same data rationalization issues as our own (and he only has to work with a single source database, rather than the 70+ that we are trying to combine).


There is a storify of the conference tweets here: http://storify.com/EngLaID_Oxford/caa-2014-paris

Next year’s CAA will be in Siena, Italy.  They know how to pick places with good food and good cheer!

Chris Green

Extracting trends (VIII)

This is yet another short post about trend surfaces, following on from my previous posts (I)(II)(III)(IV)(V)(VI)(VII), but with a new dataset. After this, I think I have probably exhausted the possibilities for getting information out of our data using trend surface modelling, which is best thought of as an initial exploratory technique in any event.

This time, I have been looking at spatial trends present in English Heritage’s Excavation Index, which has been kindly supplied to us by Tim Evans at the ADS, who recently wrote an excellent journal article on the potential of the Index as a research tool. This is a record of excavations and investigations that have taken place in England since around the mid-nineteenth century. I do not think that it pretends in any way to be comprehensive, but it is another way of filling in gaps in our data, especially for archaeological work that took place before 1990.

In any event, here are the trend surfaces that I have created based upon the Excavation Index (to different scales [the values being records per sq.km], but the broad picture is the important thing):

12th power linear trend surface for all data in the Excavation Index.
12th power linear trend surface for EngLaID period data in the Excavation Index.
12th power linear trend surface for unspecified prehistoric data in the Excavation Index.
12th power linear trend surface for Bronze Age data in the Excavation Index.
12th power linear trend surface for Iron Age data in the Excavation Index.
12th power linear trend surface for Roman data in the Excavation Index.
12th power linear trend surface for early medieval data in the Excavation Index.
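For readers new to the technique, a polynomial trend surface is a least-squares fit of the mapped values to a polynomial in easting and northing, using every term x^i * y^j with i + j up to the chosen order. The sketch below does this in plain Python; it is an illustration, not the GIS tool used for these maps. It assumes the coordinates have been rescaled to roughly -1 to 1 and demonstrates a 2nd-order surface, since a 12th-order surface has 91 terms and solving its normal equations directly like this would be numerically fragile.

```python
def poly_terms(x, y, order):
    """All monomials x**i * y**j with i + j <= order (the constant term included)."""
    return [x ** i * y ** j for i in range(order + 1) for j in range(order + 1 - i)]


def solve(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_trend_surface(xs, ys, zs, order):
    """Least-squares polynomial trend surface z ~ f(x, y); returns f."""
    rows = [poly_terms(x, y, order) for x, y in zip(xs, ys)]
    k = len(rows[0])
    # Normal equations: (A^T A) coeffs = A^T z
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atz = [sum(r[i] * z for r, z in zip(rows, zs)) for i in range(k)]
    coeffs = solve(ata, atz)
    return lambda x, y: sum(c * t for c, t in zip(coeffs, poly_terms(x, y, order)))
```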

So, what can we see from looking at these maps?  Overall, the Index shows greatest density of work in the south, particularly around Bristol, London and Kent.  For the EngLaID period as a whole, the pattern is similar, but with the area around Dorset becoming more important.  The unspecified prehistoric is biased towards London and Kent, but there are too few of these records to say that this is particularly meaningful.  The Bronze Age stands out as very distinct from all other periods, with clear peaks in Wessex, eastern Yorkshire and the Peak District: my assumption is that this represents particular research projects undertaken by EH.  The Iron Age shows peaks north of London and stretching down to Kent and towards Wessex.  The Roman trend is similar to the overall pattern for all periods, which is not surprising due to the high numbers of Roman records in the database.  The early medieval peaks around Hampshire, Kent and London, with greater emphasis also on East Anglia than the other periods.

Overall, most of these trends are fairly similar to those seen with previous datasets, at least when considered on a broad brush basis.  The major exception is for the Bronze Age, where the high trend surface peaks previously seen in south west England are no longer as dramatic.  London is also standing out more strongly in the Index than it had in most previous datasets, I think (although this is less pertinent when comparing with the NRHE, as we did not receive NRHE data for London).

Chris Green

Extracting trends (VII)

Further to my previous post, I have now had another go at constructing trend surfaces for the four broad main periods covered by this project. This time, however, I have filtered out records that are explicitly related only to artefact findspots (for each period). This was an attempt to reduce the influence on the previous trends of differences between HERs in how much PAS material they include. The remaining records should, hopefully, thus primarily relate to sites with other archaeological evidence beyond just one or more artefacts.

Here are the results (to the same attribute scale as previous):

Trend surface for Bronze Age HER data, exc. findspots
Trend surface for Iron Age HER data, exc. findspots
Trend surface for Roman HER data, exc. findspots
Trend surface for early medieval HER data, exc. findspots

Compared with the previous surfaces, we can see a general reduction in trend peaks, especially over Norfolk and Yorkshire. The Bronze Age remains similar to before; the Iron Age also, albeit with much lower peaks; the Roman period shows increasing strength across Gloucestershire; the early medieval shows the most distinct reductions in eastern regions.

Chris Green

Extracting trends (VI) and national synthesis update

This post follows on from my previous posts on trend surface modelling (I)(II)(III)(IV)(V) and my posts on synthesis of multiple datasets using grid squares (I)(II)(III)(IV).

As our HER dataset is now nearly complete (only Merseyside is expected from now on; North Somerset and Bath & North East Somerset are unable to provide data), we are finally able to begin attempting to study the data which we have gathered on a nationwide scale.  Broad period classifications (Prehistoric; Bronze Age; Iron Age; Roman; early medieval; uncertain; “bad date” [i.e. outside our period]) were calculated for the HER data using a script (based upon the multitude of period designations applied by individual HERs or upon start / end dates) and the data was converted to shapefile format and merged into a single point layer.  This shapefile layer can then be very coarsely queried to produce distributions of records of different periods.
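A script of this kind can be sketched as a set of date rules. Everything specific here is an assumption for illustration: the period boundaries (c.1500 BC, 800 BC, AD 43, AD 410, and AD 1086), the use of negative numbers for BC years, and the rules for “Prehistoric”, “uncertain”, and “bad date”. It covers only the start / end date route, not the mapping of individual HERs’ period terms.

```python
# Hypothetical broad periods as (name, start, end); negative years are BC.
PERIODS = [
    ("Bronze Age", -1500, -800),
    ("Iron Age", -800, 43),
    ("Roman", 43, 410),
    ("early medieval", 410, 1086),
]
STUDY_START, STUDY_END = -1500, 1086


def broad_period(start, end):
    """Assign a record's start / end date range to a broad period (illustrative rules)."""
    if end < STUDY_START or start > STUDY_END:
        return "bad date"  # wholly outside the study period
    for name, lo, hi in PERIODS:
        if lo <= start and end <= hi:
            return name  # falls entirely within a single period
    if STUDY_START <= start and end <= 43:
        return "Prehistoric"  # spans Bronze and Iron Ages, but is pre-Roman
    return "uncertain"
```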

As an initial method for investigating this mass of data (around 400,000 records), I experimented with the production of a few trend surfaces.  First, one for all of the data received:

Trend surface for all EngLaId HER data

I think that there are two major factors at play in this trend.  The first is the general bias in English archaeology towards greater density of (probably) settlement and (certainly) fieldwork in the south and east of the country.  The second (possibly more dominant?) is the variation in recording methods used across the country.  Even where the same software is used, different HERs catalogue their data somewhat differently: some like to split everything up into individual periods and types, others like to collate into multi-period sites; some cast their nets wide to include as much data as possible (e.g. PAS data, MORPH data), others like to only include sites of certain and clear provenance.  This means that the density of data across the country is as much about modern practice as it is about activity in the ancient past.

We can then produce similar surfaces for our broad periods (all to the same numerical scale):

Trend surface for Bronze Age HER data
Trend surface for Iron Age HER data
Trend surface for Roman HER data
Trend surface for early medieval HER data

These four surfaces still reflect to some extent the differences seen in modern practice, but they are closer to the genuine distribution of past activity.  The Bronze Age surface seems to be biased towards uplands and towards Wessex.  The Iron Age surface has a clear bias towards the south east.  The Roman surface is biased towards lowland Britain but also towards the pockets of military activity in the north of England.  The early medieval surface is biased towards the eastern parts of England.

However, the distributions behind all of these trends are still heavily influenced by modern archaeological and CRM practices. This is only going to get worse when we begin to produce duplication in our dataset by building in English Heritage NRHE data and other datasets. As discussed in previous posts, one way in which to minimise these modern effects and reduce the influence of duplication is to collate data into 1 x 1 km grid cells. This requires the application of a thesaurus containing simplified monument terms and the step already undertaken of assigning standardised period terms. The result is a tessellation of 1 x 1 km grid squares across England recording the presence of different types of archaeological site for each of our broad periods, which we can then query and use to produce maps.
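The collation step can be sketched as building a set of (period, grid square, broad category) triples, so that duplicates collapse automatically when several datasets record the same kind of site in the same square. The thesaurus excerpt below, its “defence” and “funerary” categories, and the record layout are hypothetical placeholders, not EngLaId’s actual thesaurus.

```python
# Hypothetical thesaurus excerpt: narrow monument term -> broad categories.
THESAURUS = {
    "villa": {"domestic and civil"},
    "vicus": {"domestic and civil"},
    "hillfort": {"domestic and civil", "defence"},  # a term may sit in several categories
    "barrow": {"funerary"},
}


def collate(records, thesaurus, cell_size=1000):
    """Presence of broad categories per (period, 1 km cell); duplicates collapse.

    records: iterable of (easting, northing, period, term) in metres.
    Terms missing from the thesaurus are ignored.
    """
    presence = set()
    for easting, northing, period, term in records:
        cell = (int(easting // cell_size), int(northing // cell_size))
        for category in thesaurus.get(term.lower(), ()):
            presence.add((period, cell, category))
    return presence
```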

As an example, I constructed a few more trend surfaces, based upon the presence / absence of evidence for sites within our broad “domestic and civil” category. This category includes: town, burh, civitas capital, colonia, hamlet, village, vicus, canabae legionis, oppidum, hillfort, anything defined using the word “settlement”, midden, timber platform (several of these sub-types belong to more than one broad category). We can then look at how the underlying trends behind this category changed over time (these trend surfaces are logistic rather than linear, modelling the probability of a binary presence / absence outcome rather than density):

Trend surface (logistic) for synthesised data: Bronze Age “domestic and civil” category.
Trend surface (logistic) for synthesised data: Iron Age “domestic and civil” category.
Trend surface (logistic) for synthesised data: Roman “domestic and civil” category.
Trend surface (logistic) for synthesised data: early medieval “domestic and civil” category.
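The logistic variant swaps the least-squares fit for a logistic regression on the same polynomial terms, so the surface gives the probability that a square records presence rather than a density. Here is a minimal sketch, assuming presence / absence coded as 1 / 0 per grid square and coordinates rescaled to roughly -1 to 1; the fitting is plain batch gradient ascent with a fixed number of steps, chosen for readability rather than speed, and is not the method of any particular GIS tool.

```python
import math


def poly_terms(x, y, order):
    """All monomials x**i * y**j with i + j <= order (the constant term included)."""
    return [x ** i * y ** j for i in range(order + 1) for j in range(order + 1 - i)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_trend(xs, ys, present, order, steps=2000, rate=0.5):
    """Fit P(present) = sigmoid(polynomial in x, y) by batch gradient ascent."""
    rows = [poly_terms(x, y, order) for x, y in zip(xs, ys)]
    w = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for r, t in zip(rows, present):
            p = sigmoid(sum(wi * ri for wi, ri in zip(w, r)))
            for i, ri in enumerate(r):
                grad[i] += (t - p) * ri  # gradient of the log-likelihood
        w = [wi + rate * g / n for wi, g in zip(w, grad)]
    return lambda x, y: sigmoid(sum(wi * ti for wi, ti in zip(w, poly_terms(x, y, order))))
```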

There is still some bias in these trend surfaces from the amount of data recorded by different modern archaeological entities (e.g. Northamptonshire is a very “completist” HER, which partially accounts for it showing up so strongly in many of the trend surfaces seen in this blog post), but the patterns are still quite interesting.  The Bronze Age is heavily influenced by the very high number of records present on Dartmoor and Bodmin Moor.  The Iron Age is probably mostly interesting for the low probability area across the “waist” of England from Cheshire to Lincolnshire.  The Roman is pretty much how I would expect it: high likelihood in the lowland zone and around Hadrian’s Wall (this includes “native” sites [whatever that means!] of Roman period date).  The early medieval is fairly flat, showing settlement across the country with greatest probability in central and eastern England (the peaks in Devon possibly need further investigation).

All of this is just a very preliminary, very coarse analysis of what is a very large and detailed set of data.  Some interesting patterns are beginning to emerge, but these may diminish as we continue to work on our material.

Chris Green

Extracting trends (III)

Following again on from my previous two posts (1)(2), I have been experimenting further with constructing trend surfaces, this time for specific sub-sets of my downloaded AIP data for evaluations and post-determination / research results from 1990 to 2010.

First, I removed all of the data for investigations that had no results in terms of dated features, which results in a very similar trend surface to that for all of the data including investigations with no substantive positive evidence:

12th power trend surface for AIP data (excluding investigations with no positive results)

Then I constructed trend surfaces for the same data but filtered down to investigations producing results for each of EngLaId’s four main broad time periods:

12th power trend surface for AIP data (Bronze Age)
12th power trend surface for AIP data (Iron Age)
12th power trend surface for AIP data (Roman)
12th power trend surface for AIP data (early medieval)

These results all look quite interesting to me, especially as they all vary quite significantly from the overall trend for all periods (albeit this is less the case for the Roman data).  The Bronze Age data shows a very clear bias towards an arc across south-eastern England from Dorset through to Kent and up into parts of East Anglia (the dry bits essentially), with the exception of the South Downs and the Weald.  The Iron Age is very strongly biased towards the counties north of London up to Cambridgeshire, across to north-east Kent and along the south coast.  There is also more of a northern trend than in the Bronze Age, with quite a significant peak in East Yorkshire.  The Roman data is distinctly biased towards London, Kent, the south coast, East Yorkshire and the Severn estuary region.  There is a surprising lack of any significant peak in the Tyneside area, considering the significant peak there in the data for all periods and the presence of Hadrian’s Wall.  For the early medieval, there is a very clear bias towards eastern England around the Fens and towards Kent.

I particularly like these results as most of them differ so significantly from the overall trend for all periods, which suggests that these patterns are more likely to be due to genuine distributions of underlying archaeological data, not just due to patterns of modern fieldwork (albeit this will still remain a very significant factor). I am not sure any of the results are particularly surprising, interpretively, but they do confirm for me that we can extract spatial patterning from AIP data that is not just wholly biased towards areas of significant modern development.

Chris Green