OS terrain models

This week, in an attempt to avoid any substantive work, I have been playing around with the Ordnance Survey’s Digital Terrain Models (DTM) that are available for free as part of their OpenData archive to anybody who wishes to use them.  The spur for this was the launch in July of a new DTM onto the OpenData site.

Previously (and still today), the OS made available a dataset known as PANORAMA.  This was created using contour data surveyed in the 1970s.  In order to turn this into a rasterised DTM, some interpolation algorithm (I don’t know which) was used to estimate elevation values between contours, producing a continuous field (50m by 50m pixels) of elevation values for all of the UK.  The heights in PANORAMA are recorded as integers, i.e. to the nearest whole metre.

In July, the OS released a new product, known as Terrain 50.  This DTM was created using LiDAR data surveyed from the air and then averaged out to 50m by 50m grid cells.  A lot of data processing goes into turning raw LiDAR data into a terrain model, but this all takes place behind the scenes, so it is difficult to know exactly what has been done.  The heights in Terrain 50 are recorded as floating point numbers, so apparently convey more precision than PANORAMA.  However, due to the relatively coarse nature of the grid used (50m by 50m pixels), this does carry a degree of spurious precision (as we are inevitably dealing with averages).

This map shows both products for comparison:

Comparison of PANORAMA and Terrain 50 DTMs for an area of East Yorkshire / Humber.

Certain things stand out when you compare these images, and more obviously still when you look at the hillshade:

Comparison (hillshade) of PANORAMA and Terrain 50 DTMs for an area of East Yorkshire / Humber.

The main things to note are:

  • The contour origin and whole-number data model of PANORAMA produce a stepped plateau appearance, which is especially apparent in areas of gradual change in elevation.
  • PANORAMA produces a substantially smoother picture of change in elevation over space.
  • Terrain 50 appears much more accurate, but also “noisy”.
  • Human impacts on the landscape (e.g. quarrying) show up much more obviously in Terrain 50.

On the face of it, Terrain 50 looks a much more accurate representation of the terrain of the UK and, as such, would likely be most people’s first choice when choosing between these two DTMs.

As I have so far been working with the PANORAMA DTM, I wanted to test how different it was from Terrain 50 in order to see if I should go back and rerun some of my analyses with the newer product.  The simplest way to do this is to compare the elevation values recorded in each product for the same piece of terrain, i.e. subtract one grid from the other in the Raster Calculator in ArcGIS and then calculate some basic statistics on the result.

However, this is complicated somewhat by the fact that the two grids are not aligned directly on top of each other: the origin of a pixel in one is in the middle of a pixel in the other, i.e. they are offset by 25m east / west and 25m north / south.  To enable a direct comparison to be made, I reprocessed the PANORAMA DTM to split each cell into four and then aggregated sets of four cells (using the mean) on the same alignment as Terrain 50.  I expect this will have smoothed the surface somewhat, but hopefully not to the extent of making the comparison invalid (as PANORAMA already possessed a relatively smooth surface).
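The realignment step can be sketched in a few lines of Python.  This is a minimal illustration, not what I actually ran (that was done with ArcGIS tools).  It assumes each DTM is held as a list of rows of heights and that the target grid's cell corners sit at the centres of the source grid's cells.  The function names are hypothetical.  Splitting each cell into four and averaging 2×2 blocks on the offset alignment works out the same as taking the mean of each 2×2 block of original cells, and one row and one column are lost at the edges.

```python
from statistics import fmean


def split_cells(grid):
    """Split each cell into four half-size cells that inherit its value."""
    fine = []
    for row in grid:
        doubled = [value for value in row for _ in range(2)]
        fine.append(doubled)
        fine.append(list(doubled))
    return fine


def aggregate_offset(fine):
    """Average 2x2 blocks of the fine grid, starting one fine cell in.

    Starting at index 1 shifts block boundaries by half an original cell
    (25m for a 50m grid), matching the other product's alignment.
    """
    n_rows, n_cols = len(fine), len(fine[0])
    return [
        [
            fmean((fine[r][c], fine[r][c + 1], fine[r + 1][c], fine[r + 1][c + 1]))
            for c in range(1, n_cols - 1, 2)
        ]
        for r in range(1, n_rows - 1, 2)
    ]


# Usage: a 3x3 block of heights becomes a 2x2 block on the shifted alignment.
print(aggregate_offset(split_cells([[1, 2, 3], [4, 5, 6], [7, 8, 9]])))
```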

The results can be seen on this map:

Difference between PANORAMA and Terrain 50 cells.

White cells show little difference.  Yellow cells are slightly higher elevation in Terrain 50 and red cells are significantly higher.  Cyan cells are slightly higher elevation in PANORAMA and blue cells are significantly higher.  Certain things stand out on this map:

  • Differences between the two DTMs are greatest in upland areas.  This will at least partly be due to the need to draw contours legibly forcing cartographers to underplay the steepness of very steep slopes.
  • The sea cells are quite interesting in the way they vary.  This seems to be due to PANORAMA using a single value for sea cells across the whole dataset, whereas Terrain 50 seems to use a single value for sea cells on each 10km by 10km tile, but different values between tiles.
  • We can also see some differences being much greater on one side or other of the division between tiles, aligning with the 10km divisions of the OS grid.  This is presumably due to Terrain 50 data being processed on a tile by tile basis, more on which later.

Overall, however, the differences between the two DTMs are not great.  If we remove the negative sign from the difference layer (by squaring, then square rooting the result) and clip out sea cells, we can plot a histogram of the difference in elevation (across all 92 million cells):

Histogram of elevation difference between PANORAMA and Terrain 50.

From this graph, we can see that although there are cells with differences of up to nearly 230m, the vast majority of cells are within 5m of their counterpart’s elevation.  The mean difference is 1.91m and the standard deviation 2.26m; 75% of all values are within 2.5m of their counterpart.  As such, PANORAMA and Terrain 50 are actually very similar in the elevations they record.
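For anyone wanting to reproduce the comparison outside ArcGIS, here is a minimal Python sketch of the differencing and summary statistics.  It assumes both grids are already aligned and held as lists of rows.  It also assumes sea cells can be identified by PANORAMA's single sea value, passed in as a hypothetical `sea_value` parameter because the actual value is not given here.  Squaring and then square rooting a difference is simply its absolute value.  The sketch uses the population standard deviation, which is an assumption; over tens of millions of cells, the choice between population and sample forms makes no practical difference.

```python
import math
from statistics import fmean, pstdev


def difference_summary(panorama, terrain50, sea_value):
    """Summarise absolute height differences between two aligned grids.

    Cells where PANORAMA holds `sea_value` (a hypothetical sea flag) are skipped.
    """
    diffs = []
    for row_p, row_t in zip(panorama, terrain50):
        for p, t in zip(row_p, row_t):
            if p == sea_value:
                continue  # clip out sea cells
            d = t - p
            diffs.append(math.sqrt(d * d))  # squaring then square rooting == abs(d)
    return {
        "mean": fmean(diffs),
        "sd": pstdev(diffs),  # population standard deviation (assumption)
        "max": max(diffs),
        "share_within_2_5m": sum(d <= 2.5 for d in diffs) / len(diffs),
    }


print(difference_summary([[10, 0], [20, 30]], [[12, 5], [17, 30]], sea_value=0))
```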

We can also plot this difference layer on a map, with some interesting results:

Difference in elevation between PANORAMA and Terrain 50 for an area of Somerset (black = no difference, white = high difference).

Black cells on this map show no difference or minimal difference, shading up through grey to white for cells of relatively high difference in elevation between the two DTMs.  Certain features stand out, some of which I have annotated onto this map:

Difference in elevation between PANORAMA and Terrain 50 for an area of Somerset (black = no difference, white = high difference). Features annotated.

The motorway is clearly a feature that appears in Terrain 50 but not PANORAMA.  The contour lines are clearly an artifact of the origins of PANORAMA.  The reservoir is presumably a similar issue to the sea level variation.  The variation on the Mendips is presumably due to the “noisier”, more precise nature of Terrain 50 contrasting against the smoothed appearance of PANORAMA.

The appearance of the grid lines worries me somewhat though.  They were not apparent (to my eye) when looking at the raw data or hillshade layers for either dataset, so presumably they are the result of quite a subtle effect.  My assumption (as mentioned above) is that these arise from the LiDAR data behind Terrain 50 being processed as a series of tiles rather than as a single dataset: this is of course inevitable, as a continuous high resolution LiDAR dataset for all of the UK would be mind-bogglingly immense.  My fear is that any sensitive analyses of terrain using Terrain 50 might show up these grid edges in their results.  However, this is even more true of the 1m contour “cliff edges” that appear in PANORAMA.  At least grid lines will be obvious to the human eye if they do cause strange effects.

So, what does this all mean?  Well, I would argue that the generally minimal difference between elevations recorded for the same place in the two datasets means that previous analyses (especially coarse analyses) undertaken using PANORAMA should not be considered invalidated by the (presumably) more accurate new Terrain 50 DTM.  Also, the “noisy” nature of Terrain 50 and the presence therein of more features of human origin might mean that the smoother PANORAMA could still be the best choice of DTM for certain applications (especially in archaeology, where features like the M5 would not generally be a useful inclusion).

Chris Green

Geo-spatial visualization

I recently attended a Workshop on Challenges in Geo-spatial Visualization run by the OeRC at Pembroke College, here in Oxford.  The workshop was organised by Prof Min Chen and his colleagues in order to consider challenging problems in the visual analytics of spatial data and to discuss potential solutions.

Jason Dykes and Jo Wood of the giCentre at City University London presented particularly interesting ideas and visualizations, based around cartograms and visualising spatial ‘flows’.  They also emphasised the critical element of visual salience: this is the concept that (spatially) large objects tend to dominate on a map, whereas (interpretatively) important objects ought to be what our attention is drawn to.

I was also particularly taken with the ideas of Simon Walton (of the OeRC) regarding the importance of spatial frequency to visual perception.  For example, if we look at a Google Earth image of a city from space, we arguably tend to think that we distinguish between city and countryside based upon colour (i.e. greens vs greys), but we are in fact more influenced by the complexity of what we are seeing, with countryside being quite plain and cities complex.

Overall, the workshop was very engaging and challenged my thinking on how I might approach the spatial analysis of EngLaId’s datasets.  In particular, I think I am rather too wedded to the conventional map and, as such, have been experimenting with some alternative visualizations since the workshop.

One idea raised in discussion by Jo Wood was that of making graphs where one axis represents space (in some way) and the other an attribute associated with data located within that space.  It occurred to me that one common concept seen in much archaeological interpretation on the scale of England / Britain was that of a difference between the lowland zone of southeastern England and the highlands of the west and north.  Conceptually, we can thus think of this as a trend from south east to north west.

To organise our data so that it could be graphed along this axis, I first defined an (arbitrary) point off the south east of England and then created a Euclidean distance raster radiating out from this point:

Euclidean distance raster from point marked by X.

I then generalised this into 10km width bands and joined the results to the vector grid tessellation that I am using to analyse data on the scale of England:

10km distance bands from point marked by X.
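The banding itself is simple to express in code.  Below is a minimal Python sketch.  It assumes grid squares are represented by the British National Grid coordinates of their centroids and that each square takes the band of its centroid; how the raster bands were actually joined to the vector grid may differ.  The origin and centroid coordinates in the usage lines are hypothetical and do not mark the point used for the maps above.

```python
import math

BAND_WIDTH_M = 10_000  # 10km bands, as used for the maps above


def distance_band(easting, northing, origin):
    """Return the 10km distance band of a point measured from `origin`.

    Band 0 covers 0 to <10km, band 1 covers 10 to <20km, and so on.
    """
    ox, oy = origin
    distance = math.hypot(easting - ox, northing - oy)  # Euclidean distance
    return int(distance // BAND_WIDTH_M)


# Hypothetical origin off the south-east coast and a hypothetical square centroid.
ORIGIN = (680_000, 80_000)
print(distance_band(450_500, 300_500, ORIGIN))
```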

It is then possible to use this banding to plot other attributes recorded in the grid square layer as a graph, such as mean elevation or terrain ruggedness (TRI).  As our datasets are not yet quite complete, I do not currently have the ability to query these down to subsets based upon archaeological site type / period.  Therefore, I experimented with creating some graphs based upon the entire dataset, thus showing patterns along this SE-NW axis for England as a whole.

Graph of mean elevation of grid squares: x-axis = distance band; y-axis = mean elevation. Points are individual data; heat map shows clustering. Deformed England below x-axis to show approximate spatial element.
Graph of mean TRI of grid squares: x-axis = distance band; y-axis = mean TRI. Points are individual data; heat map shows clustering. Deformed England below x-axis to show approximate spatial element.

These two graphs (created in Veusz from a .csv table exported from ArcGIS) are constructed so that the distance bands run from left (SE) to right (NW), with the mean elevation / TRI being shown on the y-axis (with the TRI, the higher the number, the more rugged the terrain).  The dots show individual records and the ‘heat map’ behind shows the frequency / clustering of those dots.  The deformed England map below each graph is intended to show an approximation of where these bands fall spatially, although obviously this is an imperfect relationship.  These graphs both show how the English landscape becomes more elevated / rugged at its extremes as you head north or west from the south east, albeit with its main clustering remaining at fairly low elevations and at fairly low degrees of ruggedness.
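The table behind these graphs can be sketched as follows.  This minimal Python illustration is not the ArcGIS export I actually used.  It assumes each grid square is a dictionary with hypothetical keys `band`, `mean_elev` and `mean_tri`, and writes one CSV row per square (the dots).  It also counts squares per band and value bin, which is roughly what the heat map behind the dots summarises.  The 50m bin width in the usage lines is an arbitrary choice.

```python
import csv
import io
from collections import Counter

FIELDS = ("band", "mean_elev", "mean_tri")  # hypothetical column names


def write_band_table(squares, fh):
    """Write one CSV row per grid square: its band and attribute values."""
    writer = csv.writer(fh)
    writer.writerow(FIELDS)
    for sq in squares:
        writer.writerow([sq[name] for name in FIELDS])


def heat_counts(squares, attr, bin_width):
    """Count squares per (band, value bin), roughly what the heat map shows."""
    return Counter((sq["band"], int(sq[attr] // bin_width)) for sq in squares)


squares = [
    {"band": 0, "mean_elev": 12.0, "mean_tri": 1.0},
    {"band": 0, "mean_elev": 18.0, "mean_tri": 1.5},
    {"band": 3, "mean_elev": 250.0, "mean_tri": 4.0},
]
buf = io.StringIO()
write_band_table(squares, buf)
print(buf.getvalue())
print(heat_counts(squares, "mean_elev", 50))
```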

Graph of percentage obscuration of ground surface of grid squares: x-axis = distance band; y-axis = percentage. The red colouring is for “human” factors; the grey colouring superimposed includes soil / geological obscuration in addition.

This final graph shows, for each band, the frequency / clustering of the percentage of each grid square’s ground surface that is obscured from the air for the purposes of aerial photography.  The red shading shows ‘human’ factors only (see previous post), with the greyscale shading also including geological / soil type factors (see this post).  This graph is a little harder to read, so probably requires more thought.

If we compare these three graphs, we can see that the areas of the country most obscured by human activity (which in this instance includes woodland and lakes) cluster in the same bands as the areas of England which are predominantly of low elevation / ruggedness.  This suggests that there is a relationship between landscape morphology and human activity (as we would expect), with humans tending to prefer to settle in areas which are arguably easier to live in (i.e. lower, flatter terrain).

This is all very experimental at the moment and the conclusions reached are not yet particularly relevant to archaeological study, but it does suggest that a methodology such as this has the potential to elucidate patterns in our data.  Once we are able to query down this grid square dataset to only include cells with particular types of archaeological feature in them, we will be able to create many different graphs such as these and, as such, attempt to quantify the difference / similarity in the distributions of different archaeological features, based upon several attributes (i.e. elevation, TRI, ground obscuration).

Clearly, the banding chosen for this experiment reflects a particular concept of how distributions might vary across England, albeit one that is very common in archaeological interpretation (e.g. the three zones seen by Roberts & Wrathmell [2000 / 2002] and by Jeremy Taylor [2007] in their respective works): it is thus desirable to test different axes across the country to see whether different patterns might emerge.  It would also be possible to do something similar for bands created around all instances of a particular type of site, although this might be argued to be a little too processualist perhaps…

In conclusion, I do think the methodology outlined has potential for studying patterns in our data, but it will require a lot more thought and experimentation to be certain.

Chris Green

References:

Roberts, B. and S. Wrathmell. 2000. An Atlas of Rural Settlement in England. London: English Heritage.

Roberts, B. and S. Wrathmell. 2002. Region and Place. A Study of English Rural Settlement. London: English Heritage.

Taylor, J. 2007. Atlas of Roman Rural Settlement. London: English Heritage.

Aerial photography and ground obscuration (part 3)

I have previously written twice about factors affecting the possibility of discovering archaeological features using aerial photography (1) (2).  The original version of this took into account human factors (see post 1), based upon OS OpenData, specifically buildings, roads, railways, and woodlands / rivers / lakes.  (I count woodland as a human factor, as many woods are plantations and others have been left / managed by humans; rivers and lakes are obviously less human-influenced.)  Here is a reworked example, showing the percentage of the ground surface obscured by such factors:

Ground obscuration taking into account woodland and human structures. Relevant to aerial photography generally.

To add to this previous work, a few months back at our Project Advisory Board meeting, Jeremy Taylor of the University of Leicester told us about a paper which classified the different soil types of England and Wales according to their prospects for showing buried features (whether geological or archaeological) as crop marks (Evans 1990).  This paper grouped the different soil types defined by the 1983 Soil Survey of England and Wales into five categories:

  • Soils that show extensive crop marks;
  • Soils that show extensive crop marks in dry conditions;
  • Soils that show frequent crop marks over small areas;
  • Soils that rarely show crop marks;
  • Soils that never show crop marks.

Usefully, the 1983 soil survey maps have been digitised and are made available via the National Soil Resources Institute (NSRI) at Cranfield University.  For bona fide researchers, the data are available on payment of a processing fee (i.e. without paying royalties), so we obtained them to try to create a map of the Evans (1990) classification.  On receipt, I found that there were five soil types (924a, 924b, 952, 961, 962) in the NSRI data that did not appear in the Evans 1990 classification.  However, these were all types of industrial spoil heap or reconstructed ground surface, so I assumed that they would fall within the category of soils that never show crop marks.  As a result, I was able to reclassify the NSRI soils data into Evans’ five types:

Soil types of England and Wales graded according to possibility of showing crop marks. After Evans 1990.
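The reclassification amounts to a lookup table plus a fallback rule for the five unmatched codes.  A minimal Python sketch follows.  The entries in `EVANS_LOOKUP` are hypothetical placeholders, since the full mapping from 1983 soil associations to Evans’ categories is not reproduced here.  The category labels are my own shorthand for the five groups listed above.  Unknown codes raise an error rather than being silently assigned, so any further gaps in the lookup show up immediately.

```python
# Shorthand labels for the five Evans (1990) categories, most to least favourable.
EVANS_CATEGORIES = ("extensive", "extensive_when_dry", "frequent_small_areas", "rare", "never")

# HYPOTHETICAL placeholder entries: the real table would map every 1983
# soil association code to one of the categories above.
EVANS_LOOKUP = {
    "placeholder_A": "extensive",
    "placeholder_B": "rare",
}

# Spoil heaps / reconstructed ground missing from Evans 1990: treated as "never".
DISTURBED_GROUND = {"924a", "924b", "952", "961", "962"}


def evans_class(soil_code):
    """Return the Evans category for a soil code, or raise KeyError if unknown."""
    if soil_code in EVANS_LOOKUP:
        return EVANS_LOOKUP[soil_code]
    if soil_code in DISTURBED_GROUND:
        return "never"
    raise KeyError(f"no Evans category for soil code {soil_code!r}")


print(evans_class("952"))
```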

In respect of ground obscuration, I decided to only take the “never” category as being a factor.  I therefore combined this with the “human” factors and the peat / alluvial sub-soils from the British Geological Survey data (see post 2) to create a new map of ground obscuration:

Ground obscuration taking into account soil type, some sub-surface geology (peat / alluvium), woodland, and human structures. Relevant to crop marks.

This latest map is only really relevant to crop marks, as presumably earthworks would still show even in the more unsuitable soils (under the right lighting / weather conditions).  We can see, however, that large parts of England are not at all suitable for aerial prospection for crop marks.  I am not sure that the large peaty areas seen in the east would necessarily mask all crop marks, but then all models are imperfect.  It might be useful to include the “rare” crop mark soils in this map too, perhaps with a lower weighting to reflect their partial effect, but I decided to work for now on a stricter basis.
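Combining the layers comes down to a per-cell "is any obscuring factor present?" test, aggregated to a percentage for each grid square.  The Python sketch below counts sub-cells rather than overlaying polygons in ArcGIS, which is a simplifying assumption.  It assumes each grid square has been broken into equal-area sub-cells, each carrying hypothetical flags `human` and `peat_alluvium` plus its Evans soil category.  The optional `rare_weight` illustrates the lower-weighted "rare" category mentioned above.  It defaults to 0, matching the stricter approach taken here.

```python
def percent_obscured(sub_cells, rare_weight=0.0):
    """Percentage of a grid square's sub-cells obscured for crop-mark prospection.

    A sub-cell counts fully if any hard factor applies (human structures or
    woodland, peat / alluvium, or a "never" soil); "rare" soils contribute
    `rare_weight` (0 ignores them, as in the stricter map above).
    """
    sub_cells = list(sub_cells)
    if not sub_cells:
        raise ValueError("grid square has no sub-cells")
    score = 0.0
    for cell in sub_cells:
        if cell["human"] or cell["peat_alluvium"] or cell["soil"] == "never":
            score += 1.0
        elif cell["soil"] == "rare":
            score += rare_weight
    return 100.0 * score / len(sub_cells)


cells = [
    {"human": True, "peat_alluvium": False, "soil": "extensive"},
    {"human": False, "peat_alluvium": False, "soil": "never"},
    {"human": False, "peat_alluvium": False, "soil": "rare"},
    {"human": False, "peat_alluvium": False, "soil": "extensive"},
]
print(percent_obscured(cells), percent_obscured(cells, rare_weight=0.5))
```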

The point of all this is to assess the effect of ground obscuration on the patterning of archaeological discoveries.  As an example, we can compare the distribution of NMP projects against this map of ground obscuration:

NMP projects as of 2012 compared against ground obscuration (re. crop marks).

This shows a reasonably good correlation between areas in which NMP projects have been (or are being) undertaken and areas suitable for aerial prospection (for crop mark features).  Where the NMP projects cover areas unsuitable for crop mark discovery, these tend to be upland areas where thin soils ought to mean earthworks are fairly prominent (I think).  It also shows that although the NMP covers around 50% of England, much of the remaining 50% is not necessarily suitable for NMP type projects (as they are reliant on aerial photography), with the particular exception of the rural West Midlands and parts of East Anglia.  However, many of these less suitable areas might be suitable for LiDAR survey.

Further down the line, I will be comparing distributions of archaeological sites against these obscuration maps to see if I can discover any intrinsic biases towards areas suitable for aerial prospection within the data.

Chris Green

References:

Evans, R.  1990.  “Crop patterns recorded on aerial photographs of England and Wales: their type, extent and agricultural implications.”  Journal of Agricultural Science, Cambridge 115, 369-382.

More on landscape morphology and synthesis of multiple datasets

I have been thinking some more about measures of landscape morphology and how we might bring together our multiple datasets into a single synthesis.

During a small symposium we held last week on the subject of “scale”, I spoke briefly to Andrew Lowerre of English Heritage about my previous post on landscape morphology.  He pointed me towards some measures of landscape “bumpiness” that are more complex than simply using average slope.  Investigating further, I came across an apparent description of Riley’s Terrain Ruggedness Index (or TRI), which seemed like a good, relatively simple measure.  This is fairly simple to calculate in ArcGIS by following the instructions on the webpage just linked.  For England, using the OS OpenData 50m DEM, the result should look something like this:

“Riley’s” Terrain Ruggedness Index by 50m grid square

If you compare the values on this TRI map (between 0 and 18.9 metres) against the classification given on the webpage linked above, you would end up classifying all of England as “level”, which seems fairly ridiculous, but this is down to a few obvious factors.  First, a 50m cell size is quite small (Riley et al. used 1000 by 1000 m cells), so changes in terrain are going to be relatively small as well: changes in elevation of greater than 900 vertical metres across 50 horizontal metres are presumably only seen in landscapes like the Alps or Yosemite.  Second, the DEM was interpolated by the OS from their contour dataset, which inevitably will have resulted in some smoothing of landscape features.  Third (and most importantly), I have investigated further since and discovered that the maths / process are not actually correct for Riley’s TRI: when I found the original article of Riley et al. (1999), they are in fact calculating their TRI differently (theirs being the sum of the absolute [i.e. removing any negative sign] difference between a cell and all of its eight neighbours, whereas this result is the square root of the difference between the lowest and highest neighbouring cell value).  This is a good lesson to learn (that I should have learnt already) in not trusting random websites found as the result of a Google search!

However, visual examination suggests the result is somewhat robust on its own terms when compared against the implementation of TRI in GDAL.  GDAL uses the mean difference between a central pixel and its surrounding cells, based upon Wilson et al. 2007 (i.e. Riley’s sum divided by 8 to produce the mean); its output looks somewhat similar on the map, but has a range of 0 to 155m.  In other words, even if the numbers are wrong, the relationship between the values in two cells should still be approximately correct, e.g. a cell with a TRI of “8m” ought to be more rugged than a cell with a TRI of “4m”, even if the true values of their TRIs ought to be much higher.  I am not at all satisfied with this in the long term, but it will suffice for the purposes of the rest of this exercise.  The Wilson et al. / GDAL implementation of TRI seems the most sensible to use in the long run, probably:

GDAL Terrain Ruggedness Index by 50m grid square
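To make the two neighbourhood calculations discussed above concrete, here is a minimal Python sketch for a single interior cell of a DEM held as a list of rows.  One function sums the absolute differences between a cell and its eight neighbours, as Riley et al.’s measure is characterised above.  The other divides that sum by 8, giving the mean difference used by Wilson et al. 2007 and GDAL.  The function names are hypothetical.  Edge cells and no-data values are not handled; edge cells raise an error, because negative indices would otherwise wrap around silently in Python.

```python
def neighbour_abs_diffs(dem, r, c):
    """Absolute height differences between cell (r, c) and its eight neighbours."""
    if not (0 < r < len(dem) - 1 and 0 < c < len(dem[0]) - 1):
        raise ValueError("this sketch only handles interior cells")
    centre = dem[r][c]
    return [
        abs(dem[r + dr][c + dc] - centre)
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    ]


def tri_sum_abs(dem, r, c):
    """Sum of absolute differences to the eight neighbours."""
    return sum(neighbour_abs_diffs(dem, r, c))


def tri_mean_abs(dem, r, c):
    """Mean absolute difference (the sum divided by 8), as in Wilson et al. / GDAL."""
    return tri_sum_abs(dem, r, c) / 8


dem = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(tri_sum_abs(dem, 1, 1), tri_mean_abs(dem, 1, 1))
```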

I have also been experimenting more with our synthesis methodology.  This is based around the idea of applying a thesaurus to each of our input datasets to simplify their terminologies and then collating results by grid square.  We have been working at length on defining a thesaurus that strikes the right balance between simplicity and complexity, so that it is interpretively useful but also computationally feasible.  Our current thesaurus, based in part on groupings in the EH NMR thesaurus, is as follows:

01 – Agriculture and subsistence
  A – Coaxial field system
  B – Linear field system
  C – Aggregate field system
  D – Strip field
  E – Unspecified field system
  F – Linear earthwork
  G – Pit alignment
  H – Waterhole
  I – Corn drying oven
  J – Granary
  K – Cairn
  L – Fishpond
02 – Religious, Ritual and Funerary
  A – Inhumation burial
  B – Cremation burial
  C – Inhumation cemetery
  D – Cremation cemetery
  E – Barrow
  F – Cairn
  G – Temple
  H – Shrine / sanctuary
  I – Church
  J – Abbey / Monastery / Minster
  K – Standing stone
  L – Stone circle / cove
03 – Domestic and Civil
  A – Town / Small town
  B – Burh
  C – Civitas Capital / Colonia
  D – Hamlet / Village
  E – Vicus
  F – Canabae Legionis
  G – Oppidum
  H – Hillfort
  I – Unenclosed settlement
  J – Enclosed settlement
  K – Linear settlement
  L – Palisaded settlement
  M – Riverside settlement
  N – Dispersed settlement
  O – Nucleated settlement
  P – Road-side settlement
04 – Domestic architectural forms
  A – Villa
  B – Mansio
  C – Roundhouse
  D – Longhouse
  E – Farmstead
  F – Ringwork
  G – D-shaped enclosure
  H – Sub-rectangular enclosure
  I – Banjo enclosure
  J – Aisled building
  K – Other rectilinear building
  L – Burnt mound
  M – Grubenhaus
05 – Industrial
  A – Metal working site
  B – Bronze working site
  C – Iron working site
  D – Mineral extraction site
  E – Quarry
  F – Pottery manufacturing site
  G – Tile works
  H – Lime kiln
  I – Salt production site
  J – Mint
06 – Communication and Transport
  A – Road
  B – Trackway
  C – Hollow Way / Ridgeway
  D – Drove road
  E – Quay / Jetty / Harbour
  F – Bridge
  G – Canal
  H – Aqueduct
  I – Causeway
07 – Defensive
  A – Hillfort
  B – Fort (castellum)
  C – Fortress (castrum)
  D – Fortlet
  E – Burh
  F – Ringwork
  G – Dyke
08 – Other
  A – Mound
  B – Ditch
  C – Pit
  D – Find
  E – Hoard
  F – Metalwork deposit
  G – Watercraft

This is not a fixed list and we will undoubtedly continue to refine it over the coming months.  The result of applying the thesaurus to a dataset is a new field containing a string of values based on the list above.  The values are codes defining a particular period / type of site (sites can have multiple periods / types), e.g. RO04A would be a Roman villa site or IA07A would be an Iron Age hillfort.  When we then apply a tessellation of grid squares across our input datasets, we can collate results using these codes to remove duplication, resulting in a map showing the presence of each type of site by grid square across England.
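Parsing and collating these codes is simple string work.  The Python sketch below assumes each code takes the form of a two-letter period, a two-digit group and a one-letter type (as in RO04A), separated by spaces, commas or semicolons.  The separator, the grid square identifiers and the function names are hypothetical.  Collecting codes into a set per grid square is what removes the duplication.

```python
import re
from collections import defaultdict

# Two-letter period, two-digit group, one-letter type, e.g. RO04A or IA07A.
CODE_RE = re.compile(r"\b([A-Z]{2})(\d{2})([A-Z])\b")


def parse_codes(field):
    """Return the set of thesaurus codes found in one record's field."""
    return {m.group(0) for m in CODE_RE.finditer(field)}


def split_code(code):
    """Split a code into (period, group, type), e.g. 'RO04A' -> ('RO', '04', 'A')."""
    m = CODE_RE.fullmatch(code)
    if m is None:
        raise ValueError(f"not a thesaurus code: {code!r}")
    return m.groups()


def collate(records):
    """Merge codes by grid square; using sets removes duplicate site types."""
    presence = defaultdict(set)
    for square_id, field in records:
        presence[square_id] |= parse_codes(field)
    return dict(presence)


records = [("SU1234", "RO04A"), ("SU1234", "RO04A IA07A"), ("SU1235", "EM02C")]
print(collate(records))
```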

I have also been looking at different tessellations of grid squares.  Until now, I had been using 2 by 2 km squares, as that seemed a sensible resolution for looking at patterns on the scale of England, but my latest tests of this methodology have convinced me that this resolution is too coarse.  The problem is compounded by the fact that when a data point falls on the intersection between four grid squares, it is registered as falling within all four squares, with the result that some sites have undue prominence in the synthesised dataset.  I tried to find a way around this by using two overlapping tessellations, with the origin of each cell in set B at the central point of each cell in set A (as suggested to me at CAA by an audience member).  However, I soon came to realise that merging these two datasets (accepting the presence of a site type only if it was in both overlapping cells) would simply replicate the result of applying a 1 by 1 km grid square tessellation layer.

Therefore, to decrease the coarseness of my grid square layer, I have now started using a tessellation of 1 by 1 km grid squares.  The results are much more aesthetically pleasing.  As a large number of our points will fall on the origins of these squares (where data locations are only known to the nearest kilometre square), I have also offset this new 1 by 1 km grid square layer by 500m east and 500m north.  Therefore, these points that fall on the 1000m intervals will only be counted in a single grid square.  This does mean that they might be slightly misplaced (by up to 500m east and 500m north) if they fell somewhere towards the northeastern corner of their kilometre square in actuality, but this is a very minor spatial misrepresentation on the scale of all of England.
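The effect of the 500m offset can be checked with a short Python sketch.  It assumes British National Grid coordinates in metres and mimics an "intersects"-style join, in which a point lying exactly on a square's edge touches every square that shares that edge or corner.  The coordinates used are hypothetical.  On the unshifted grid, a point on a kilometre intersection touches four squares; on the shifted grid, it touches only one.

```python
import math

CELL_M = 1000


def touching_squares(easting, northing, offset=0, cell=CELL_M):
    """Return (column, row) indices of every square whose closed extent holds the point.

    This mimics an 'intersects' join: a point exactly on an edge touches the
    squares on both sides of it.  `offset` shifts the grid east and north.
    """

    def spans(value):
        k = (value - offset) / cell
        lower = math.floor(k)
        # A coordinate exactly on a boundary touches the squares either side.
        return {lower - 1, lower} if k == lower else {lower}

    return {(col, row) for col in spans(easting) for row in spans(northing)}


# A point recorded only to the nearest kilometre sits on a grid intersection.
print(len(touching_squares(412000, 356000)))              # unshifted grid
print(len(touching_squares(412000, 356000, offset=500)))  # grid shifted by 500m
```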

As a test, I have run this synthesis methodology on the data received to date from English Heritage.  This is the National Record of the Historic Environment (NRHE), which contains the former NMR records and records of sites found through the National Mapping Programme.  Once processed, it consists of point, line and polygon shapefiles with associated attribute data.  We currently have data for EH’s South West, South East, Eastern, and East Midlands regions.  Here are some example results, with filled squares showing the presence of each period / type of site, bearing in mind that the empty swathes of the country to the north and west for each distribution are due to the lack of data for now:

Distribution of Bronze Age stone circles by 1km grid square based upon EH NRHE data
Distribution of Iron Age hillforts by 1km grid square based upon EH NRHE data
Distribution of Roman villas by 1km grid square based upon EH NRHE data
Distribution of early medieval inhumation cemeteries by 1km grid square based upon EH NRHE data

Once the NRHE dataset is complete and once we also bring in the HER data that we are currently gathering for all of England, the distributions plotted will be much more complete, but I think that the result is beginning to show some promise.

Taking this further, we can then compare these distributions statistically against the various measures of landscape morphology calculated for each grid square, being the mean elevation and the mean TRI (i.e. the incorrect version originally calculated, see above).  Elevation has a maximum value of 903.8m and the TRI a maximum value of 9.95m (when aggregated out from 50m cells to 1000m cells).  Here are the results for each of the distributions shown above (bear in mind these are the results for 1km by 1km squares recording the presence of one or more of each of these types):

Bronze Age stone circles (150 results): mean elevation 235.9m ± 116.5m (1σ) (range 0.0 – 471.0m); mean TRI 3.2m ± 0.9m (1σ) (range 0.1 – 5.3m).

Iron Age hillforts (764 results): mean elevation 116.3m ± 69.5m (1σ) (range 0.0 – 439.6m); mean TRI 3.2m ± 1.1m (1σ) (range 0.0 – 7.6m).

Roman villas (2495 results): mean elevation 77.1m ± 53.8m (1σ) (range 0.0 – 360.5m); mean TRI 2.0m ± 0.9m (1σ) (range 0.0 – 6.5m).

Early medieval inhumation cemeteries (760 results): mean elevation 60.7m ± 53.8m (1σ) (range 0.0 – 234.0m); mean TRI 1.9m ± 1.0m (1σ) (range 0.1 – 5.4m).
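Summaries in this form are easy to generate once each grid square carries its attributes and thesaurus codes.  Here is a minimal Python sketch, assuming each square is a dictionary with hypothetical keys `codes`, `elev` and `tri`.  It uses the population standard deviation, which is an assumption, as the form used for the figures above is not stated.

```python
from statistics import fmean, pstdev


def summarise(values, unit="m"):
    """Format values as 'mean ± σ (1σ) (range min – max)'."""
    return (
        f"{fmean(values):.1f}{unit} ± {pstdev(values):.1f}{unit} (1σ) "
        f"(range {min(values):.1f} – {max(values):.1f}{unit})"
    )


def stats_for_code(squares, code, attr):
    """Summarise `attr` over grid squares recording at least one `code` site."""
    values = [sq[attr] for sq in squares if code in sq["codes"]]
    if not values:
        raise ValueError(f"no squares contain {code}")
    return len(values), summarise(values)


squares = [
    {"codes": {"RO04A"}, "elev": 50.0, "tri": 1.0},
    {"codes": {"RO04A", "IA07A"}, "elev": 70.0, "tri": 3.0},
    {"codes": {"IA07A"}, "elev": 200.0, "tri": 4.0},
]
print(stats_for_code(squares, "RO04A", "elev"))
```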

As these distributions are incomplete, we should not read too much into these results, but some patterns do seem obvious.  Hillforts (and stone circles) tend to be at higher elevations (logically) and on more rugged terrain, but have outliers right down to 0m elevation.  These tend to fall along the coasts and are thus more likely to be edge effects than genuine 0m OSD hillforts, e.g. where a monument polygon overlaps a grid square with almost nothing but sea in it (I can partly correct for this by clipping my DEM to the coastline at a later date).  One, however, is in northern Cambridgeshire (Stonea Camp): should such a site really be defined as a “hill” fort?  By contrast, villas and early medieval cemeteries tend towards lower, flatter landscapes.

These are generally logical and fairly obvious results that we might expect to see without calculating any statistical measures, but it is still a useful exercise to run these analyses to try to confirm our intuitive assumptions and to attempt to discover any unusual cases that do not match what we might have expected (such as “hillforts” at sea level).  When applied more extensively to more complete datasets for each of the thesaurus types defined above and for each period (where types exist in more than one period), we might discover some interesting patterns.

Obviously, this methodology remains a work in progress and I will continue to refine it over the coming months as more data comes in.  This includes revising our thesaurus as our research questions and the nature of our datasets become clearer, and deciding on a better measure of terrain ruggedness (probably the GDAL version).

Chris Green

Landscape morphology

(Apologies to any colour blind readers for the maps in this post, but these are variables that are quite hard to illustrate without using wide colour schemes.)

We have begun to think about some of the different variables that we will want to compare our distributions against once we reach the stage of analysing data for our national survey.  One of these variables will probably be the morphology of the landscape itself.  There is a fear in archaeology these days of being accused of “environmental determinism”, but this fear sometimes means that we ignore environmental variables that do have an impact on past human choices: Chris Gosden, our boss, suggested today that this was denying landscape its own agency.  As such, we do believe that this is a legitimate set of variables to take into consideration when studying distributions of archaeological sites.

We can plot and derive various morphological variables once we have an elevation model of England to hand.  Fortunately, OS OpenData can again help here: it includes a Digital Elevation Model (DEM) of the British Isles at a pixel resolution of 50 x 50 metres (interpolated, I believe, from contour mapping).  This is more than sufficiently detailed for nationwide or regional studies, although a higher resolution DEM would be preferable for more focused scales of study.

The DEM provides elevation data, which is the first characteristic to be studied.  From the DEM, we can also derive two further morphological variables using standard tools within ArcGIS: aspect and slope.  Aspect shows the predominant compass direction in which a cell faces.  Slope, as you might guess, shows the steepness of each cell, in degrees (or percent).
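For the curious, here is a minimal pure-Python sketch of the usual 3x3 finite-difference calculation of slope and aspect for a single cell (Horn's method, which I understand to be what ArcGIS documents for its Slope and Aspect tools).  The function name and window layout are my own assumptions, not any ArcGIS API; flat cells are given an aspect of -1, following the ArcGIS convention.

```python
import math


def slope_aspect(window, cellsize):
    """Slope (degrees) and aspect (compass degrees, -1 if flat) for the centre
    cell of a 3x3 elevation window, using Horn's finite-difference weights.

    window: three rows listed north to south, each row listed west to east.
    """
    (a, b, c), (d, _, f), (g, h, i) = window
    # Rate of change west-to-east and north-to-south
    dzdx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cellsize)
    dzdy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cellsize)
    slope = math.degrees(math.atan(math.hypot(dzdx, dzdy)))
    if dzdx == 0 and dzdy == 0:
        return slope, -1.0  # flat: no downslope direction
    angle = math.degrees(math.atan2(dzdy, -dzdx))
    # Convert from a mathematical angle to a compass bearing (0 = north)
    if angle < 0:
        aspect = 90.0 - angle
    elif angle > 90.0:
        aspect = 360.0 - angle + 90.0
    else:
        aspect = 90.0 - angle
    return slope, aspect


# A 50m window falling 5m per cell towards the east: roughly 5.7 degrees, facing east
s, a = slope_aspect([[10, 5, 0], [10, 5, 0], [10, 5, 0]], 50.0)
```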

These three variables are all at a 50m pixel resolution, but for our national survey we will be studying distributions at a 2000m pixel resolution.  Therefore, we need to consider: (a) whether there is any validity to studying these variables at this coarse resolution; and (b) how to generalise the data from 50m cells to 2000m cells.

Elevation is fairly uncontroversial, as it varies quite predictably across the landscape in most cases.  Therefore, we can simply use the mean elevation as an expression of the approximate elevation of each cell.  Slope is more problematic, as it can vary a great deal within a 2 by 2 km area.  However, it serves well as a proxy for the general “bumpiness” of a cell.  It is important to consider this in addition to elevation, as it helps distinguish between flatter (i.e. plateau) and more “bumpy” (i.e. mountain) uplands; more on this below.  Aspect is much more difficult to generalise: I will present the results below, but I am unconvinced that they have any great validity.
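Generalising from 50m to 2000m cells by the mean amounts to averaging 40 x 40 blocks of the fine grid.  This is a minimal pure-Python sketch of that block averaging, assuming the grid is held as a list of rows and that a NoData value should be skipped; in practice ArcGIS (or similar) does this for you, and the function name here is hypothetical.

```python
def block_mean(grid, factor, nodata=None):
    """Aggregate a fine grid to a coarser one by averaging factor x factor blocks.

    grid is a list of equal-length rows; cells equal to nodata are skipped,
    and blocks with no valid cells become None.  Partial blocks at the
    right and bottom edges are averaged over the cells they do contain.
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for r0 in range(0, rows, factor):
        out_row = []
        for c0 in range(0, cols, factor):
            vals = [
                grid[r][c]
                for r in range(r0, min(r0 + factor, rows))
                for c in range(c0, min(c0 + factor, cols))
                if grid[r][c] != nodata
            ]
            out_row.append(sum(vals) / len(vals) if vals else None)
        out.append(out_row)
    return out


FACTOR_50M_TO_2KM = 2000 // 50  # 40 fine cells along each side of a 2km cell
```

With a real 50m grid, block_mean(dem_rows, FACTOR_50M_TO_2KM, nodata=-9999) would give one mean value per 2km cell (dem_rows and the -9999 NoData value being placeholders).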

So, to begin with elevation, we can simply classify it into bands, convert the raster image into a point vector dataset, run the Identity tool in ArcGIS (which seems to be becoming my favourite) against our polygon grid squares (which we are using to plot our archaeological distributions), and then join the results to that grid square layer.  In this way, it becomes straightforward to test distributions statistically against elevation band: by comparing the statistical profile of a specific sub-set of site types (by period or generally) against the statistical profile of all sites, we can test whether any patterns seen are meaningful.  Here is an example set of elevation bands, showing that 2km cells still reveal useful patterns:

Mean elevation for 2km grid squares
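The comparison of a sub-set's profile against that of all sites could be done in several ways; one common option is a chi-square goodness-of-fit statistic across the elevation bands.  The sketch below is my own illustration, not a test we have settled on: the band breaks are hypothetical, and the resulting statistic would be compared against a chi-square table with (number of bands − 1) degrees of freedom.

```python
def band_counts(values, breaks):
    """Count values per band, given ascending upper break values.
    Values above the last break fall into a final open-ended band."""
    counts = [0] * (len(breaks) + 1)
    for v in values:
        for idx, upper in enumerate(breaks):
            if v <= upper:
                counts[idx] += 1
                break
        else:
            counts[-1] += 1
    return counts


def chi_square(subset_counts, all_counts):
    """Chi-square statistic comparing a sub-set's band profile with the
    profile expected from all sites (bands with zero expectation skipped)."""
    n_subset = sum(subset_counts)
    n_all = sum(all_counts)
    stat = 0.0
    for observed, total in zip(subset_counts, all_counts):
        expected = n_subset * total / n_all
        if expected > 0:
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical band breaks in metres: <=50, <=100, <=200, >200
BREAKS = [50, 100, 200]
```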

Moving on to slope, we can work in exactly the same way, again producing a mean value for slope in each cell.  As stated above, this result is less meaningful, but I still feel it has some useful validity in picking out the edges of major uplands and in differentiating between flat and “bumpy” areas of the landscape (the numbers themselves are not too important, more the variation between areas):

Mean slope for 2km grid squares

As stated, aspect is much more problematic, for several reasons.  Firstly, ArcGIS will derive an aspect for all but the flattest of cells, with the result that areas that would appear flat to the naked eye acquire an interpretatively meaningless aspect value.  However, we can construct a mask from the slope layer to reclassify the aspect of cells with less than a certain degree of slope (in this case, 3 degrees) as flat.  Secondly, because flat cells are classified as having an aspect of -1, generalising using the mean value becomes impossible.  We cannot reclassify these cells as NoData, as they would then be ignored.  Therefore, we have to reclassify the aspect layer into five categories (or nine if you include the intermediate directions), expressed numerically: flat (0), north (1), east (2), south (3), west (4).  We can then generalise to the median direction to produce our 2km aspect map, which we then link to our 2km vector cells as before and convert to natural language terms (flat, north, east, south, west).  Here is the result:

Median aspect for 2km grid squares
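The reclassification and median steps described above can be sketched in a few lines of Python.  This is an illustration under my own assumptions, not the ArcGIS workflow: each quadrant is centred on a cardinal direction (e.g. 315 to 45 degrees counts as north), and I use statistics.median_low so that the median of a block is always one of the actual codes rather than an in-between value such as 2.5.

```python
import statistics

FLAT, NORTH, EAST, SOUTH, WEST = 0, 1, 2, 3, 4
CLASS_NAMES = {FLAT: "flat", NORTH: "north", EAST: "east", SOUTH: "south", WEST: "west"}


def aspect_class(aspect_deg, slope_deg, min_slope=3.0):
    """Reclassify one cell's aspect to a flat/N/E/S/W code, masking gentle slopes.

    An aspect of -1 (flat in ArcGIS) or a slope under min_slope gives FLAT.
    """
    if aspect_deg < 0 or slope_deg < min_slope:
        return FLAT
    # Shift by 45 degrees so each 90-degree quadrant is centred on a cardinal direction
    return int(((aspect_deg + 45.0) % 360.0) // 90.0) + 1


def median_class(codes):
    """Generalise a block of codes to its (low) median code."""
    return statistics.median_low(codes)
```

One caveat worth bearing in mind: because these codes are categories rather than a true scale, the median depends on the arbitrary order chosen (flat, north, east, south, west); a modal class would be one alternative.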

As should be apparent, this result is rather messy and problematic.  The dominance of northerly and easterly aspects seems incorrect, and the overall pattern seems too incoherent to be convincing.  As such, I don’t believe that aspect can usefully be employed at this scale of survey.  However, it may prove more fruitful when approached at the case study level during the latter part of this project.

As a final, more complex, example, I tried combining slope and elevation into a composite model.  The idea was that in combination these two variables could help differentiate between relatively flat and relatively “bumpy” upland and lowland areas.  The resulting map is quite hard to read, but I will explain it below:

Slope and elevation combined for 2km grid squares

Ignore the white cells around the edges of England on this map: these result from my error in forgetting at which stage in the process I should clip the results to England.  On this map, slope is represented by colour (purple/blue = flat; green = gentle; yellow = steep; red = severe) and elevation by saturation (i.e. the brighter the colour, the greater the elevation).  This shows how you can use the HSV colour space to display two variables at once, albeit with results that are somewhat difficult to read.  However, I do think it is possible to draw certain conclusions from visual examination of this map: in particular, I like the way in which you can see a strong difference between landscapes that are truly mountainous (such as the Lake District and parts of the Pennines) and landscapes that are more plateau-like in character (such as Bodmin Moor, Dartmoor and other parts of the Pennines).  Of course, while visual examination of this map is quite difficult, it would be simple to derive statistical measures from it.
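To illustrate the HSV idea, here is a minimal sketch using Python's standard colorsys module: the slope class picks the hue and the (normalised) elevation sets the saturation, with value fixed at full brightness.  The exact hue numbers are my own approximations of the colours described above, and the class names are hypothetical labels.

```python
import colorsys

# Approximate hues (0-1 scale) for each slope class, as described in the post
SLOPE_HUES = {
    "flat": 0.70,    # purple/blue
    "gentle": 0.33,  # green
    "steep": 0.17,   # yellow
    "severe": 0.0,   # red
}


def slope_elevation_colour(slope_class, elevation, max_elevation):
    """Return an (r, g, b) tuple in 0-255: hue from slope class,
    saturation from elevation scaled between 0 and max_elevation."""
    saturation = max(0.0, min(1.0, elevation / max_elevation))
    r, g, b = colorsys.hsv_to_rgb(SLOPE_HUES[slope_class], saturation, 1.0)
    return round(r * 255), round(g * 255), round(b * 255)
```

Low-lying cells therefore fade towards white whatever their slope, which is one reason the combined map takes some effort to read.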

In conclusion, then, I believe that there is strong potential for comparing archaeological distributions at the scale of England against certain aspects of landscape morphology: certainly elevation, probably slope (especially in combination with elevation), but probably not aspect.  I may continue trying to produce a more useful result for aspect, but I don’t think the prospects are particularly strong.

Chris Green

Creating a GIS layer of major watercourses

I have been thinking this week about producing a GIS layer of major (modern) watercourses for use in our analysis.  The difficulty with this is that most data sources will provide far too much detail in this regard.  The two OS OpenData products with hydrological data in them are VectorMap and Meridian 2, both of which map a great many watercourses across the country.  This is all well and good when examining data in detail, but is rather confusing when looking at the bigger picture:

Input data
The Meridian 2 rivers layer

Fortunately, there is a simple method by which we can extract only the important watercourses from the Meridian 2 data.  The data features a “Name” field, which records the name of each watercourse plotted, but (usefully for my purposes) this is only filled in for major watercourses.  Therefore, we can use the attribute select tool to select all of the features in the dataset with a name (the query is: NOT "NAME" = ''):

Stage 1
Selecting all items with a name

However, this still includes canals, which are of no interest unless you are working on post-medieval archaeology.  Therefore, we change the option in the attribute select tool from “Create new selection” to “Remove from current selection” and deselect features with the word “Canal” in their name (the query is: "NAME" LIKE '%Canal%'):

Stage 2
And then deselecting all items with "Canal" in their name
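The two selection steps amount to a simple filter, sketched here in Python for anyone working outside ArcGIS.  The list-of-dictionaries structure is a simplified, hypothetical stand-in for the Meridian 2 attribute table.  Note that Python's in test is case-sensitive, whereas whether a LIKE query is case-sensitive depends on the underlying data source.

```python
def select_major_watercourses(features):
    """Keep features with a non-empty NAME, then drop any named '...Canal...'.

    features: list of dicts with a 'NAME' key (a hypothetical stand-in for
    Meridian 2 attribute rows); missing or blank names count as unnamed.
    """
    # Step 1: equivalent of NOT "NAME" = ''
    named = [f for f in features if (f.get("NAME") or "").strip()]
    # Step 2: equivalent of removing "NAME" LIKE '%Canal%' from the selection
    return [f for f in named if "Canal" not in f["NAME"]]
```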

We can then export these selected features to a new layer, which is the basis of our major watercourses layer.  There is one small difficulty that arises in that many small sections of interlinking watercourses do not appear to be named in Meridian 2.  I therefore spent a couple of hours working through the new layer and drawing these back in using the edit tools in ArcGIS, based upon the data in the original layer, in order to produce a map that looked more like a proper river network.  I also deleted some of the straight watercourses left in The Fens where they clearly did not take the place of previous natural watercourses.  I only did this cleaning task for England (due to the remit of our project), but it would not take long to do for the country as a whole.  Incidentally, some of these gaps were caused by there being reservoirs along the course of the rivers in question, which could easily be plotted on the map using the relevant Meridian 2 layer*: these were not plotted for our purposes due to being modern features.

The final result was as follows:

Output data
The result!

The result is not perfect, but I think it should serve its purpose quite well.  The biggest issue with this map from an archaeological perspective is that these are modern watercourses, which do not in all cases correspond directly to the form of past watercourses: one very obvious area where this is the case is The Fens.  The more recent your period of interest, the less problematic this is (as things will have changed less), but it is worth keeping in mind even for more modern archaeological work.  Creating a map of ancient watercourses is a far more difficult task…

Chris Green

* A process similar to this would not work as well for the lakes / reservoirs layer in Meridian 2, simply because most reservoirs are not called “Something Reservoir”, but are given more bucolic names like “Draycote Water”.  As such, it would be much harder to extract these modern features from the dataset than it was with canals.

Aerial photography and ground obscuration

Examination of aerial photography is one of the primary methods by which archaeologists have surveyed the landscape of England for new sites and for new information about known sites, in a process that continues to this day.  However, it is only possible to find buried archaeological features by this method under certain conditions.  One particular adverse condition that halts all aerial photographic survey work is the obscuration of the ground surface by human and natural features.  Woodlands / forests (LiDAR can see through these to some extent, but photography cannot), lakes, buildings, roads, railways, etc. can hide the ground surface and make the detection of surface and subsurface features impossible.

As a result, distributions of archaeological sites discovered through aerial prospection will inevitably be biased towards areas of open country, particularly arable and pasture lands.  If we wish to make quantitative statements about such distributions, we need a methodology by which to quantify the obscuration of the ground surface, in order to demonstrate which areas of apparent blankness on such a distribution map are, in fact, only blank due to the impossibility of aerial prospection.

When the Ordnance Survey made available some of its data under its OpenData initiative, it became possible to undertake this quantification of obscuration using some quite simple (albeit intensive) computational methods.  This is because the Vector Map product produced by the OS is organised thematically, making it quite simple to download and join together thematic map layers for the whole of the UK (although, as the current project is only concerned with England, the method discussed below has only been applied to England).  This forms a series of data layers that would have been very difficult to pull together prior to the OpenData initiative.

To build up a map of ground obscuration for England, the following OS OpenData layers were downloaded and joined together for several regions (European parliamentary constituencies) that together spanned the whole country*: buildings, water areas, forested areas (all polygons), roads, and railways (line data).  It would have been possible to include other layers (such as glasshouses), but it was decided that those listed above were sufficient to produce a good generalised map.  The spatial precision of these layers actually appears very good, especially for the resolution of analysis undertaken (see below).  Buffers were generated for the roads and railway lines, of varying width depending on the type of entity (based on a quick survey of a few entities of each type on Google Earth): 10m for most types of road and for narrow gauge railways; 15m for A roads and single track railways; 20m for trunk roads; and 25m for motorways and multi track railways.
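The buffer widths above can be captured as a simple lookup table, which also allows a rough check of how much ground a given length of road or railway obscures.  This is a sketch under stated assumptions: the feature-type keys are my own labels rather than Vector Map attribute values, end caps and overlaps are ignored, and since ArcGIS buffer distances are applied on each side of a line, the width_is_distance flag doubles the corridor width if the figures were entered that way.

```python
# Buffer widths in metres by feature type, as listed above (keys are my own labels)
BUFFER_WIDTHS_M = {
    "other road": 10,
    "narrow gauge railway": 10,
    "A road": 15,
    "single track railway": 15,
    "trunk road": 20,
    "motorway": 25,
    "multi track railway": 25,
}


def corridor_area(feature_type, length_m, width_is_distance=False):
    """Approximate obscured area (square metres) of a straight line feature.

    Ignores end caps and overlaps.  If width_is_distance is True, the table
    value is treated as a one-sided buffer distance, doubling the width.
    """
    width = BUFFER_WIDTHS_M[feature_type]
    if width_is_distance:
        width *= 2
    return length_m * width
```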

The buffer layers, buildings, water areas and forest layers were then joined together using the union tool in ArcGIS to create a polygon map of ground obscuration for each region.  A 1km by 1km polygon grid square layer was generated using Geospatial Modelling Environment and then reduced to the outline of England via a spatial overlap query.  The identity tool in ArcGIS was then used to calculate how the polygons in the obscuration layers overlapped with the grid polygons, and the area of each resulting overlap polygon was then calculated.  The attribute tables were exported for these output layers and joined together in Excel into one big table listing the ID number (CELLID) for the related grid square and the area of each obscuration polygon within that square.  A Python script was written which went through this table, adding together the total area of obscuration for each CELLID (this took around seven hours to process), and outputting a new table listing CELLID and total area of obscuration.
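The summing step can be sketched as follows.  This is not the original script: it is an illustration that assumes a CSV table with hypothetical column names CELLID and AREA, and accumulates totals in a dictionary so that the whole table is read only once.

```python
import csv
import io
from collections import defaultdict


def total_obscuration(rows):
    """Sum obscuration area per CELLID in a single pass over the table rows."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["CELLID"]] += float(row["AREA"])
    return dict(totals)


def write_totals(totals, out):
    """Write a CELLID / TOTAL_AREA table to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(["CELLID", "TOTAL_AREA"])
    for cell_id in sorted(totals):
        writer.writerow([cell_id, totals[cell_id]])


# Usage with a tiny made-up table (column names are assumptions)
sample = io.StringIO("CELLID,AREA\n101,250.5\n102,1000\n101,49.5\n")
totals = total_obscuration(csv.DictReader(sample))
```

For the real data, csv.DictReader would be pointed at the exported table file and write_totals at an output file opened with newline="".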

This output table was joined to the 1km by 1km grid square layer in ArcGIS based upon the CELLID.  We now knew the total area of obscuration for each kilometre grid square of England.  The percentage obscuration was calculated and this percentage figure was then used to create a 1km resolution raster layer showing what percentage of each cell’s ground surface area was obscured by buildings, woodland, water, roads and railways:

% obscuration of ground cover for 1km grid squares in England
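The percentage calculation itself is straightforward: the total obscured area divided by the area of the grid square.  A minimal sketch follows; the cap at 100% is my own defensive addition in case any overlapping polygons were counted twice, and the cell area can be passed explicitly for coastal squares if only their land area should count.

```python
CELL_AREA_M2 = 1000.0 * 1000.0  # one 1km by 1km grid square


def percent_obscured(obscured_area_m2, cell_area_m2=CELL_AREA_M2):
    """Percentage of a grid square's area that is obscured, capped at 100."""
    return min(100.0, 100.0 * obscured_area_m2 / cell_area_m2)
```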

Obviously, as with all models, this is not a perfect or perfected result, but I do believe that it provides a very useful quantification of the extent to which the ground surface of England is obscured to any aerial visual observer (the picture would be somewhat different for LiDAR prospection, as then I would not have included trees as a form of obscuration).  There are undoubtedly other types of obscuration feature that could also have been included (areas of alluvium or peat, perhaps) and there may be some types of included feature that can, in certain circumstances, be seen through.  It does, however, provide a good basis for quantifying the extent to which gaps in aerial prospection results for England have resulted from the impossibility of achieving results through that method.  In the context of this project, this is particularly relevant when dealing with English Heritage’s National Mapping Programme data, as this was constructed on the basis of aerial survey.

– Chris Green

* This division into regions was purely to ease the processing burden on ArcGIS and my computer.