Regionality & complexity

This post follows on in part from a post I wrote a couple of years ago on regionality. It also begins with an apology: the maps presented here will be very difficult for colour blind readers to understand, for which I am sorry. Unfortunately, the technique involved allows little control over colour (as it requires all three colour channels), so it is not possible (or at least very difficult) to make the maps more legible for colour blind readers. As such, I would not propose publishing these particular visualisations in any formal setting, but hopefully I can get away with it in a blog post!

Before we get to the maps themselves, I shall briefly describe the mapping technique involved, which is partly inspired by the work of a former colleague of mine at the University of Leicester, Martin Sterry (departmental webpage; academia.edu). Essentially, this method can be used to describe the relationship between three different spatial variables that can be mapped as density surfaces. First, we create density surfaces (KDE here) for each variable and then combine them into an RGB image using the Composite Bands tool in ArcGIS, with the first layer forming the red channel, the second the green channel, and the third the blue channel. However, RGB images (so-called “additive colours”, which work from black by adding light in the red, green, and blue channels) can be rather dark / muddy, so I then converted the images (using “Invert” in Photoshop) to CMY images instead (so-called “subtractive colours”, where one works from white by subtracting light in the cyan, magenta, and yellow channels: this is how colour printers work). To do so cleanly, one must set up one’s map document so that anything one wishes to be white in the final image is black in the map document and vice versa. The same applies to greys, which must be set to their inverse (e.g. a 30R 30G 30B grey, as seen below for Wales / Scotland / Man, should be set to 225R 225G 225B, being 255-30 in each case). This may sound somewhat complicated, but the end result is as follows (with a short code sketch of the combine-and-invert step after the list):

  • Cyan (turquoise) tones represent high values in Channel 1, e.g. “complex farmsteads” in the first example below.
  • Magenta tones represent high values in Channel 2, e.g. “enclosed farmsteads” in the first example below.
  • Yellow tones represent high values in Channel 3, e.g. “unenclosed farmsteads” in the first example below.
  • Blue tones represent high values in Channels 1 and 2.
  • Red tones represent high values in Channels 2 and 3.
  • Green tones represent high values in Channels 1 and 3.
  • Dark grey / black tones represent high values in all three Channels.
  • White or pale tones represent low values in all three Channels.
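
In code terms, the combine-and-invert step is very simple. Below is a minimal sketch of the idea using numpy and rasterio (an open-source stand-in for the ArcGIS Composite Bands / Photoshop “Invert” workflow described above, not the workflow itself), assuming three single-band KDE rasters that share the same grid:

```python
import numpy as np
import rasterio

def composite_cmy(kde_paths, out_path):
    """Stack three KDE surfaces as R, G, B, then invert so that high
    values display as cyan, magenta and yellow respectively."""
    bands = []
    for path in kde_paths:                      # one single-band raster per variable
        with rasterio.open(path) as src:
            band = src.read(1).astype("float64")
            profile = src.profile
        rng = band.max() - band.min()
        band = 255 * (band - band.min()) / rng if rng else np.zeros_like(band)
        bands.append(band)
    rgb = np.stack(bands)                       # channel 1 -> red, 2 -> green, 3 -> blue
    cmy = 255 - rgb                             # the "Invert" step: additive RGB to subtractive CMY
    profile.update(count=3, dtype="uint8", nodata=None)
    with rasterio.open(out_path, "w", **profile) as dst:
        dst.write(cmy.astype("uint8"))
```

Because the inverted values are still written as ordinary RGB bands, a high value in the first surface displays as cyan, a high value in the second as magenta, and a high value in the third as yellow, matching the list above.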

Here is a close up of the colour category zones for the first two examples below:

2 CMYK_RRSP

I began by examining the three main categories of Roman farmstead defined by the Roman Rural Settlement Project (RRSP) at Reading, using their excellent data, which is available online (Allen et al. 2015). As they defined only three specific categories, this is an ideal dataset to map in this way. For a first attempt, I made three KDE layers using a 10km kernel (or search window) to structure the size of the clusters in the resulting output, then combined them as described above. When plotted against the regions that the RRSP team defined based upon variation in their data (Smith et al. 2016: Chapter 1), we can see that there is a degree of agreement between the regions and the clustering of particular colours:

1 RRSP_psychedelia_v3_inc_regions

However, there is also clearly considerably more complexity to the data than a simple regional classification might suggest (as the RRSP team would certainly acknowledge, so this is not intended as a criticism in any way). If we construct a new model using a wider kernel (in this case 50km), we can get a really nice sense of regional variation in the data without the need to draw lines on a map:

3 RRSP_psychedelia_v2

There is some interesting structure in this model. For example, one can see a focus on enclosed farmsteads in the north and west; a focus on so-called complex farmsteads (largely alongside enclosed farmsteads) in parts of the southern and eastern midlands; and quite a different focus, on enclosed and unenclosed farmsteads, in the south east. The strong peak in enclosed farmsteads in south Yorkshire / the north midlands is also quite striking. Although it relies too much on good colour vision in a reader, I think this model and technique work quite well here, so I decided to apply it to another dataset: our own.

Before we get to the next stage, here is a close-up of the colour category zones for the next two maps (with RO = Roman; PR = Prehistoric; EM = early medieval):

5 CMYK_Englaid

Based on a technique that we published recently (Green et al. 2017), the following two maps are created from a measure of the “complexity” of our datasets: specifically, the number of different types of site / monument (based upon our thesaurus of types; see Portal to the Past) per 1x1km square. This measure was calculated for each square for each time period in our database, and density surfaces were then created for each time period (using a 5km kernel in this instance). A shortcoming of the mapping technique comes into play here: it can only map three categories at once. As such, we had to combine the Bronze Age and Iron Age models into a composite model for later prehistory. The three period-based complexity models were then combined into a single image as before:

4 complexity_psychedelia_global

There are various nice patterns in this dataset, including the clear strength of prehistory and the early medieval in the south western peninsula, the intense focus on major river valleys (partly due to the large gravel quarry excavations in those areas), and the appearance of Roman roads highlighted in magenta. The Roman period also looks quite dominant generally, with lots of pinks, blues, and reds visible on the map. There is also a very clear difference in intensity between eastern / southern England and northern / western England.
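
Stepping back to how the underlying measure is built: the per-square “complexity” score is simply a count of distinct types per grid cell. A minimal pandas sketch, assuming a table of records with hypothetical easting, northing, period and type columns:

```python
import pandas as pd

def complexity_per_km(records: pd.DataFrame) -> pd.DataFrame:
    """Number of distinct site / monument types per 1x1 km square, per period."""
    records = records.assign(
        km_east=records["easting"] // 1000,    # snap coordinates to their 1 km square
        km_north=records["northing"] // 1000,
    )
    return (records
            .groupby(["period", "km_east", "km_north"])["type"]
            .nunique()                          # count of distinct types, not of records
            .rename("complexity")
            .reset_index())
```

The resulting per-square counts are what then feed into the 5km-kernel density surfaces for each period.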

It is possible to lessen the effects of regional and period-based variation by constructing a series of larger-kernel density surfaces and using these to “correct” for regional variation in the period-based models. This produces a new model which reflects complexity on a more local scale. Essentially, the first model can be thought of as a model of “globally” scaled complexity (by which I mean the whole of the dataset, not the whole of the planet) and the new model as a model of locally scaled complexity:

6 complexity_psychedelia_local

This model also shows some interesting patterns. It is much less dominated by single periods in particular regions, with Roman dominance mostly along the Roman roads and Hadrian’s Wall. There are also some nice dark areas, which show high levels of local complexity across all three time periods. These cluster mostly along rivers again or around the large Roman towns, along with a similar cluster in southern Yorkshire / the north Midlands to that seen in the RRSP data.
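
The exact form of the “correction” is described only loosely above. One way of reading it, purely as a sketch of the idea and not necessarily the method actually used, is to express each period’s 5km surface as a ratio of a much broader surface built from the same data (the 50km figure below is an assumption for illustration):

```python
import numpy as np

def local_complexity(kde_5km: np.ndarray, kde_50km: np.ndarray, eps: float = 1e-9) -> np.ndarray:
    """Rescale a fine-grained complexity surface relative to its broad regional trend.

    One plausible reading of the 'correction' described in the text: cells that are
    complex for their region stand out, even where absolute values are low.
    """
    return kde_5km / (kde_50km + eps)           # eps avoids division by zero in empty regions
```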

As with all models of English archaeology, the images presented here represent a very complex data history, being influenced both by where more (and more archaeologically visible) activity took place in the past and by where more modern archaeological activity takes place in the present (largely driven by development). They also, as previously noted, come with considerable caveats with regard to legibility, due to the relatively large minority of people with restricted colour vision (c.8-10% of men, and maybe 1% of women). The technique is also restricted by its inability to map more than three variables, but more than three would probably overcomplicate matters even if it were possible. However, I hope that this post gives a sense of the variation and complexity in the English archaeological record, locally, regionally, and nationally.

EngLaId is now winding down, having officially ended at Christmas, so this will probably be the last substantive post on technique or data for a while. We will, however, announce here when any new publications come out, including our main books.

Chris Green

References:

Allen, M., T. Brindle, A. Smith, J.D. Richards, T. Evans, N. Holbrook, M. Fulford, N. Blick. 2015. The Rural Settlement of Roman Britain: an online resource. York: Archaeology Data Service. https://doi.org/10.5284/1030449

Green, C., C. Gosden, A. Cooper, T. Franconi, L. Ten Harkel, Z. Kamash & A. Lowerre. 2017. Understanding the spatial patterning of English archaeology: modelling mass data from England, 1500 BC to AD 1086. Archaeological Journal 174(1): p.244–280. http://www.tandfonline.com/doi/full/10.1080/00665983.2016.1230436

Smith, A., M. Allen, T. Brindle & M. Fulford. 2016. New Visions of the Countryside of Roman Britain. Volume 1: the Rural Settlement of Britain. Britannia Monograph Series No. 29. London: Society for the Promotion of Roman Studies.

OS terrain models

This week, in an attempt to avoid any substantive work, I have been playing around with the Ordnance Survey’s Digital Terrain Models (DTMs) that are available for free as part of their OpenData archive to anybody who wishes to use them. The spur for this was the launch in July of a new DTM on the OpenData site.

Previously (and still today), the OS made available a dataset known as PANORAMA.  This was created using contour data surveyed in the 1970s.  In order to turn this into a rasterised DTM, some interpolation algorithm (I don’t know which) was used to estimate elevation values between contours to result in a continuous field (50m by 50m pixels) of elevation values for all of the UK.  The heights in PANORAMA are recorded as integers, i.e. to the nearest whole metre.

In July, the OS released a new product, known as Terrain 50. This DTM was created using LiDAR data surveyed from the air and then averaged out to 50m by 50m grid cells. A lot of data processing goes into turning raw LiDAR data into a terrain model, but this all takes place behind the scenes, so it is difficult to know exactly what has been done. The heights in Terrain 50 are recorded as floating point numbers, so apparently convey more precision than PANORAMA. However, due to the relatively coarse nature of the grid used (50m by 50m pixels), this carries a degree of spurious precision (as we are inevitably dealing with averages).

This map shows both products for comparison (click to enlarge):

humber_dtm_compare
Comparison of PANORAMA and Terrain 50 DTMs for an area of East Yorkshire / Humber.

Certain things stand out when you compare these images, but more obviously when you look at the hillshade (click to enlarge):

humber_hillshade_compare
Comparison (hillshade) of PANORAMA and Terrain 50 DTMs for an area of East Yorkshire / Humber.

The main things to note are:

  • The contour origin and whole-number data model of PANORAMA produce a stepped, plateau-like appearance, which is especially apparent in areas of gradual change in elevation.
  • PANORAMA produces a substantially smoother picture of change in elevation over space.
  • Terrain 50 appears much more accurate, but also “noisy”.
  • Human impacts on the landscape (e.g. quarrying) show up much more obviously in Terrain 50.

On the face of it, Terrain 50 looks a much more accurate representation of the terrain of the UK and, as such, would likely be most people’s first choice between these two DTMs.

As I have so far been working with the PANORAMA DTM, I wanted to test how different it was from Terrain 50, in order to see if I should go back and rerun some of my analyses with the newer product. The simplest way to do this is to compare the elevation values recorded in each product for the same piece of terrain, i.e. subtract one grid from the other in the Raster Calculator in ArcGIS and then calculate some basic statistics on the result.

However, this is complicated somewhat by the fact that the two grids are not aligned directly on top of each other: the origin of a pixel in one is in the middle of a pixel in the other, i.e. they are offset by 25m east / west and 25m north / south.  To enable a direct comparison to be made, I reprocessed the PANORAMA DTM to split each cell into four and then aggregated sets of four cells (using the mean) on the same alignment as Terrain 50.  This will have resulted in some smoothing of the resulting surface, I expect, but hopefully not to the extent of making the comparison invalid (as PANORAMA already possessed a relatively smooth surface).
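
As a sketch of this realignment step (assuming the PANORAMA grid has been read into a 2-D numpy array; this is not the ArcGIS processing itself), each 50m cell is split into four 25m cells, the grid is shifted by one 25m cell, and the mean of each 2x2 block is taken:

```python
import numpy as np

def realign_panorama(panorama: np.ndarray) -> np.ndarray:
    """Resample PANORAMA onto the Terrain 50 alignment (offset by 25 m in x and y)."""
    # Split each 50 m cell into four 25 m cells, each inheriting its parent's value.
    fine = np.repeat(np.repeat(panorama, 2, axis=0), 2, axis=1)
    # Drop one 25 m cell from each edge so the 2x2 blocks straddle the old cell
    # boundaries, i.e. sit on the Terrain 50 grid.
    shifted = fine[1:-1, 1:-1]
    # Aggregate back to 50 m cells by taking the mean of each 2x2 block.
    rows, cols = shifted.shape[0] // 2, shifted.shape[1] // 2
    blocks = shifted[:rows * 2, :cols * 2].reshape(rows, 2, cols, 2)
    return blocks.mean(axis=(1, 3))
```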

The results can be seen on this map (click to enlarge):

dem_diff_rb
Difference between PANORAMA and Terrain 50 cells.

White cells show little difference. Yellow cells are slightly higher in elevation in Terrain 50 and red cells significantly higher; cyan cells are slightly higher in PANORAMA and blue cells significantly higher. Certain things stand out on this map:

  • Differences between the two DTMs are greatest in upland areas. This will be at least partly due to the need to draw contours legibly, which forces cartographers to underplay the steepness of very steep slopes.
  • The sea tiles are quite interesting in the way they vary.  This seems to be due to PANORAMA using a single value for sea cells across the whole dataset, whereas Terrain 50 seems to use a single value for sea cells on each 10km by 10km tile, but different values between tiles.
  • We can also see that some differences are much greater on one side or the other of divisions between tiles, aligning with 1000m divisions on the OS grid. This must be due to Terrain 50 data being processed on a tile-by-tile basis, more on which later.

Overall, however, the differences between the two DTMs are not great.  If we remove the negative sign from the difference layer (by squaring, then square rooting the result) and clip out sea cells, we can plot a histogram of the difference in elevation (across all 92 million cells):

dem_diff_hist_aligned
Histogram of elevation difference between PANORAMA and Terrain 50.

From this graph, we can see that although there are cells with differences of up to nearly 230m, the vast majority of cells are within 5m of the elevation of their counterpart. The mean difference is 1.91m and the standard deviation 2.26m; 75% of all values are within 2.5m of their counterpart. As such, PANORAMA and Terrain 50 are actually very similar in the elevations they record.
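
For reference, the statistics just quoted are straightforward to reproduce from the two aligned grids; a sketch, assuming numpy arrays with sea cells set to NaN:

```python
import numpy as np

def difference_stats(terrain50: np.ndarray, panorama_aligned: np.ndarray) -> dict:
    """Summarise absolute elevation differences between the two aligned DTMs."""
    abs_diff = np.abs(terrain50 - panorama_aligned)   # equivalent to squaring then square-rooting
    return {
        "mean": np.nanmean(abs_diff),
        "std": np.nanstd(abs_diff),
        "75th_percentile": np.nanpercentile(abs_diff, 75),
        "max": np.nanmax(abs_diff),
    }
```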

We can also plot this difference layer on a map, with some interesting results:

dem_diff_somerset
Difference in elevation between PANORAMA and Terrain 50 for an area of Somerset (black = no difference, white = high difference).

Black cells on this map show no difference or minimal difference, shading up through grey to white for cells of relatively high difference in elevation between the two DTMs.  Certain features stand out, some of which I have annotated onto this map:

dem_diff_somerset_annotated
Difference in elevation between PANORAMA and Terrain 50 for an area of Somerset (black = no difference, white = high difference). Features annotated.

The motorway is clearly a feature that appears in Terrain 50 but not PANORAMA. The contour lines are clearly an artifact of the origins of PANORAMA. The reservoir is presumably a similar issue to the sea level variation. The variation on the Mendips is presumably due to the “noisier”, more precise nature of Terrain 50 contrasting with the smoothed appearance of PANORAMA.

The appearance of the grid lines worries me somewhat, though. They were not apparent (to my eye) when looking at the raw data or hillshade layers for either dataset, so presumably they are the result of quite a subtle effect. My assumption (as mentioned above) is that these arise from the LiDAR data behind Terrain 50 being processed as a series of tiles rather than as a single dataset: this is of course inevitable, as a continuous high resolution LiDAR dataset for all of the UK would be mind-bogglingly immense. My fear is that any sensitive analyses of terrain using Terrain 50 might show up these grid edges in their results. However, this is even more true of the 1m contour “cliff edges” that appear in PANORAMA. At least grid lines will be obvious to the human eye if they do cause strange effects.

So, what does this all mean?  Well, I would argue that the generally minimal difference between elevations recorded for the same place in the two datasets means that previous analyses (especially coarse analyses) undertaken using PANORAMA should not be considered invalidated by the (presumably) more accurate new Terrain 50 DTM.  Also, the “noisy” nature of Terrain 50 and the presence therein of more features of human origin might mean that the smoother PANORAMA could still be the best choice of DTM for certain applications (especially in archaeology, where features like the M5 would not generally be a useful inclusion).

Chris Green

Processing raster NMP tiles (part 3)

We are now in receipt of all the NMP data (and associated NRHE data) currently possessed by English Heritage, alongside a couple of regions which were kindly supplied directly by the local HERs (Norfolk and Essex), and we would like to extend our thanks to Simon Crutchley, Lindsay Jones and Poppy Starkie for their work in pulling together these datasets for us.

I have previously discussed methodologies for processing the scanned (raster) maps which represent the results of the earlier NMP surveys (1) (2). I am reasonably satisfied with the polygon result, but one issue that I have discussed with Simon Crutchley is whether it is possible to convert areas of rig and furrow (drawn with a dotted outline) into polygons representing their extent (rather than individual polygons for each dot). Here is an example of the raster NMP:

1 raster
Raster NMP example

And here is the same data converted into polygons (with grid marks removed):

2 vectorised
Vectorised polygons generated

The first stage in converting the dotted outlines into filled polygons is to generate the line version of the same raster input data:

3 lines
Vectorised lines (red) generated, overlaid on polygons (click to enlarge)

We then create a 5m buffer around these lines (i.e. total width 10m):

4 mask
5 metre buffers around lines (red)

And then use this buffer layer to delete most linear features from the polygon version of the data (using the Erase tool in ArcGIS):

5 erase
Buffered areas erased from polygons

Most of the remaining objects are associated with areas of rig and furrow. However, we can further improve the result. First, we recalculate the area of each polygon and filter the layer down so that we are only dealing with polygons of between 3 and 30 square metres in extent:

6 selection 3-30m2
Erased result filtered down to polygons of 3 to 30 sq. metres

This removes a few remaining linear features that were not previously erased. Next, we generate centroids for each polygon (using the Feature to Point tool), run the Near tool on the result to get the distance from each resulting point to its nearest neighbour, and filter out those that are more than a certain distance from their nearest neighbour (blue dots), keeping the rest (red dots). In this instance, I chose 40 metres, but I think a smaller value would have been better (probably 30m):

7 centroids red near 40m
Centroids generated for each polygon, filtered down to those within 40 metres of another point (red); blue points are those eliminated

We now have a point layer which for the most part represents the vertices for creating our rig and furrow polygons.  We can run the Near tool again, this time asking it to give the spatial location of the nearest neighbouring point for each point, and use the Calculate Geometry tool to insert two fields into the layer giving the location of each origin point.  The XY to Line tool can then create lines between each point and its nearest neighbour:

8 generated lines
Lines generated from points to their nearest neighbour
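
As an aside, the centroid / Near / filtering / line-generation steps just described could be sketched outside ArcGIS with geopandas and scipy, roughly as follows (an illustrative sketch rather than the workflow above, assuming the 3-30 sq. metre filtered polygons from the previous step; the 40 metre threshold follows the text):

```python
import geopandas as gpd
import numpy as np
from scipy.spatial import cKDTree
from shapely.geometry import LineString

def dots_to_lines(polys: gpd.GeoDataFrame, max_dist: float = 40.0) -> gpd.GeoDataFrame:
    """Link each rig-and-furrow 'dot' polygon to its nearest neighbour."""
    pts = polys.geometry.centroid
    coords = np.column_stack([pts.x, pts.y])
    tree = cKDTree(coords)
    dists, idx = tree.query(coords, k=2)          # k=2: nearest neighbour other than itself
    keep = dists[:, 1] <= max_dist                # drop isolated dots
    lines = [LineString([coords[i], coords[j]])
             for i, j in zip(np.where(keep)[0], idx[keep, 1])]
    return gpd.GeoDataFrame(geometry=lines, crs=polys.crs)
```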

The ArcGIS result is getting fairly close to what we desire, but it has some considerable problems. First, the lines do not perfectly enclose the areas of rig and furrow, making it currently impossible to process this result directly into a polygon layer. It might be possible to fill some of these gaps using the Extend Line tool, but that is a very computationally intensive task and liable to still produce an imperfect result (I left it running for 24 hours on this relatively small dataset before I gave up and cancelled it). Second, in some instances, two parallel lines of dots can be closer to each other than the dots are within each line. In such cases, the generated lines are drawn as links between the two parallel lines rather than along each individual line.

As a result, if we were currently to try to convert this result into polygons, we would first need to fill in all of the gaps manually using editing tools and possibly also delete all of the small sections of line that have no association with rig and furrow (albeit we could ignore most of these, as they will not produce closed polygons in the result or, if they did, those polygons are likely to be small in area and thus easy to filter out). I do not think it is possible to improve upon the line generation result with out-of-the-box tools in ArcGIS, although it might be possible to develop a new tool to do so (perhaps with a directional bias in the nearest-neighbour search to encourage linearity?).

I shall keep thinking about how to improve this process.

Chris Green

The Archaeological Investigations Project (AIP)

As previously mentioned, the Archaeological Investigations Project (AIP), based at the University of Bournemouth, has been gathering brief details of developer-funded archaeology that has taken place in England since 1990.  They provide an excellent, useful resource for archaeological researchers, but unfortunately we understand that their funding has now come to an end, at least until early 2013.  We were particularly sad to hear that Ehren Milner was losing his job, as he has been very helpful to us in our work on the EngLaId project.  We wish Ehren all the best for the future and hope that the AIP is able to continue their excellent work at some point soon.

In the meantime, and until they are back up and running properly, I have written some Python code to convert the “xls” files downloadable from their website into a data format more suited to GIS analysis (i.e. a shapefile). This is complicated slightly by the fact that the downloads are not, in fact, xls files but tables coded in html. The code is rather EngLaId-specific in some of its content, but I thought I would share it here in case it is of use to anyone else using AIP data. The code should be available here:

https://github.com/drchrisgreen/Englaid_code_bits/blob/master/process_aip_web.py

It is probably a bit overcomplicated and not especially efficient (as I wrote it primarily for my own occasional use), but it might help anyone else trying to do similar things. I cannot promise that it will work 100% of the time, but it seems to have worked for me! The code is commented, so adaptation ought not to be too difficult…
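
The core trick, reading the HTML tables that masquerade as xls files, is worth noting on its own. A minimal sketch (not the linked script; the file name and column names are illustrative assumptions):

```python
import geopandas as gpd
import pandas as pd

# The AIP "xls" downloads are really HTML tables, so pandas can parse them directly.
tables = pd.read_html("aip_download.xls")      # hypothetical file name
aip = tables[0]

# Column names here are assumptions for illustration only.
gdf = gpd.GeoDataFrame(
    aip,
    geometry=gpd.points_from_xy(aip["EASTING"], aip["NORTHING"]),
    crs="EPSG:27700",                          # British National Grid
)
gdf.to_file("aip.shp")                         # shapefile for use in GIS
```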

Chris Green

Processing raster NMP tiles (part 2)

In my previous post on vectorising raster NMP tiles, I concluded that for certain purposes it would be more analytically useful to create a version that consisted of lines rather than polygons (in order to attempt to look at the topology of field systems in particular).  I said that I would look into using the Thin algorithm in GRASS GIS, but I then noticed that ArcGIS also has a Thin tool, so I decided to play with that instead!

After some experimentation, I came up with a process for performing this conversion, implemented as three iterative models in ArcGIS.  The first stage reclassifies the raster tiles so that white areas have a value of ‘NoData’ and then runs the Thin tool on each tile (to, as you might guess, thin all of the lines):

stage1
Model stage 1

The second stage then iterates through the results of the first, performs the same reclassification (as the Thin tool seems to output a binary result with values of 0 and 1), then converts each raster tile into line shapefiles:

stage2
Model stage 2

For the final stage, we then make sure the projection is defined correctly, trim any small lines that are likely to be artifacts of the process rather than real features (less than 3m in length), and then run the Smooth Line tool to try to improve the aesthetic quality of the result:

stage3
Model stage 3
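
For anyone without access to the ArcGIS Thin tool, the reclassify-and-thin idea can also be sketched with open-source tools. The following is a rough equivalent using rasterio and scikit-image’s skeletonize, not the ArcGIS model itself, and the raster-to-line conversion would still need a further step:

```python
import rasterio
from skimage.morphology import skeletonize

def thin_tile(in_path: str, out_path: str) -> None:
    """Reclassify a scanned NMP tile to binary and thin its line work to one pixel wide."""
    with rasterio.open(in_path) as src:
        band = src.read(1)
        profile = src.profile
    ink = band == 1                      # drawn areas are value 1; white becomes False (cf. NoData)
    thinned = skeletonize(ink)           # one-pixel-wide skeleton of the drawings
    profile.update(dtype="uint8", count=1, nodata=0)
    with rasterio.open(out_path, "w", **profile) as dst:
        dst.write(thinned.astype("uint8"), 1)
```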

The results of this ArcGIS process look quite good at first appearance, and removing grid marks / lines worked even better than with the polygon results (using the same methodology as previously described):

brickwork fields
Line version of vectorised raster NMP (brickwork fields in Nottinghamshire).

However, the results are significantly more problematic than when converting to polygons.  As an example, areas of rig and furrow are represented in the raster NMP drawings as dotted outlines with arrows showing the direction of the furrows.  These show up fine in the polygon version, but the dotted outlines disappear in the line version.*  Yet the arrows remain.  Therefore, if studying the line version in isolation, there is no way to tell that the lines representing the arrows were once arrows and are not drawings of archaeological features.  Something similar happens with text on the maps.

line_version_RandF
Comparison of line (red) and polygon (grey / black) vectorised versions of raster NMP: note the disappearance in the line version of the dotted outlines of rig and furrow areas.

As such, on balance, this line version is less useful than the polygon version. That said, for areas of field systems it will be simpler to use this version for the study of topological relationships. In order to do so, however, it is necessary to extract the field systems from other features and to fill small gaps in the drawings of what are clearly continuous features in reality (which presumably exist due to gaps in the crop marks / earthworks). The former could only be done manually, but I did make an attempt to extend the lines automatically to fill gaps. The tool took 15 hours to run for a small section of Nottinghamshire and, although successful in some cases, on the whole produced a very messy (and useless!) result:

messy_brickwork
The rather messy result of trying to automate filling gaps in lines (originals in black, extensions in red).

As such, if I am to use this data to study the topology of field systems, I will have to make all these edits manually: to eliminate small gaps in continuous features and to remove features that are not themselves part of the field systems.  I will experiment with this and report again at a later date.

Chris Green

*  Incidentally, although this result is problematic for most purposes, it may provide a route into converting these areas defined by dotted outlines into polygons covering the extent of the enclosed area, by creating buffers around the line result and erasing the buffered areas from the polygon result (and then processing this somehow into filled polygons: I’m not sure how!), but that is something that will require a lot more thought / experimentation.

Processing raster NMP tiles

One of the most important datasets with which we are working on this project is the English Heritage (EH) administered National Mapping Programme (NMP), which has been working all across England to map archaeological features seen in aerial photographs. This project has been ongoing for about 20 years and, as a result, the methodologies used by those undertaking the work have evolved over time. More recent NMP projects were undertaken in CAD software, with the result that the drawings produced exist as vector data, which is very flexible for GIS mapping purposes.

However, I have been thinking over the last few days about how to deal with the older areas of the NMP that exist as scanned raster images of line drawings. This raster NMP data is less flexible, in that raster imagery does not scale well when looking at broader regional patterns (compared to vector data) and it is difficult to display any background data behind raster tiles (making the white areas invisible tends to make the black lines fade, as obviously does making the layer partly transparent). I have also been talking recently via email with Helen Wickstead about some of her ideas for examining the topological relations in field system layouts, and this type of analysis would be impossible with raster scanned maps.

1 raster NMP
Example of raster NMP tiles

As a result, I decided to try to convert the raster NMP tiles received from EH to date into vector data. I came up with quite a simple system to do so, after a bit of trial and error. First, we reclassify the data in the raster tile so that the white areas have a value of ‘NoData’ and the black areas (i.e. the drawings) retain a value of 1 (for most tiles, the black areas already have a value of 1 and the white of 0, but this is reversed for most of Kent and some tiles in Lincolnshire). The next stage is then to use the Raster to Polygon tool to convert this result into a polygonal vector layer. Ideally we would want to convert them to lines rather than polygons, but the drawings are too ‘thick’ for this to work (more on which later).

The result includes a large amount of small polygons which add little to the overall picture other than making the layer render on screen slowly.  If we assume that these original drawings, which were done at a scale of 1:10,000, were undertaken with a minimum pen width of 0.1mm, then any polygon smaller than 3.14m² would be smaller than the head of a pen and, as such, unlikely to be deliberately plotted.  Therefore, we then calculate the area of the polygons in the layer, select those with an area of greater than 3m² and export that as our final result.  For the datasets processed to date, these small polygons amounted to about 20% of the total, which have been removed to no obvious visual effect.

Obviously, this is a lot of work to undertake when dealing with large sets of 5 x 5 km tiles for large swathes of England.  As such, I experimented with the Model Builder in ArcGIS to see if I could automate the process.  The process had to be split into two stages, due to the software getting confused by the new file names.  Stage 1 iterates through a folder of raster layers, reclassifies them and then converts them to polygons:

2 model stage 1
Stage 1 of the model to convert raster NMP tiles to vector data

For some reason, the Model Builder was not attaching projection information to the output layers, despite the fact that I had specified this under the tool’s instructions.  The projection needs to be defined to accurately calculate the area of extent of the polygons.  As such, Stage 2 iterates through the data output by Stage 1, defines the projection for each layer as British National Grid, and then calculates the area of extent of each polygon:

3 model stage 2
Stage 2 of the model to convert raster NMP tiles to vector data

We then merge all of the layers output into a new composite layer, use the Select by Attribute tool to select all of those polygons with an area of more than 3m² and export the results.  Here we can see the vectorised result for the area of raster NMP shown above:

4 vectorised result
Vectorised version of raster NMP tiles
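
For comparison, the same reclassify, polygonise and area-filter sequence could be sketched with open-source tools (rasterio, shapely and geopandas) rather than the ArcGIS models above; a rough sketch only, keeping the 3 sq. metre threshold from the text and assuming tiles where the drawings have the value 1:

```python
import geopandas as gpd
import rasterio
from rasterio.features import shapes
from shapely.geometry import shape

def vectorise_tile(path: str, min_area: float = 3.0) -> gpd.GeoDataFrame:
    """Turn the drawn (value 1) areas of a scanned NMP tile into polygons,
    dropping anything smaller than the given area in square metres."""
    with rasterio.open(path) as src:
        band = src.read(1)
        polygons = [
            shape(geom)
            for geom, value in shapes(band, mask=(band == 1), transform=src.transform)
            if value == 1
        ]
    gdf = gpd.GeoDataFrame(geometry=polygons, crs="EPSG:27700")  # British National Grid
    return gdf[gdf.area > min_area]
```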

I am reasonably satisfied with the vectorised result, but there are a few things I would still like to do to tidy up the layer: to remove the grid marks (or grid lines in the case of some NMP projects, e.g. Dartmoor and most of Kent) and, ideally, to remove the pieces of text on some of the images (particularly Dartmoor). I cannot (yet) think of an automated way to deal with the latter, but removing the grid marks / lines can be automated, albeit with a somewhat imperfect result.

5a
Removing grid marks: input data

We begin with a layer of grid lines that fall on each of the 1000m points east and north of the origin of the British National Grid, created using the Geospatial Modelling Environment software:

5b
Removing grid marks: 1000m grid lines

We then create buffers 5m either side of these grid lines (i.e. of 10m total width). This covers most of each grid mark / line (for areas of raster NMP that include drawn grid lines, the next two steps in the process are not needed; we would skip straight to the erasing):

5c
Removing grid marks: 5m buffers each side of grid lines (i.e. 10m total width)

We then create 45m circular buffers around the vertices where these grid lines intersect:

5d
Removing grid marks: 45m buffers around grid line vertices

Then we clip the line buffers by the vertex buffers to create a polygon layer that masks the grid marks:

5e
Removing grid marks: grid line buffers clipped to buffers around vertices to produce cross mark mask

Finally, we use the Erase tool in ArcGIS to remove the areas of the vectorised raster NMP layer covered by the grid mark (or line) mask (at this stage, I also clipped the layers to the coastline for coastal areas):

5f
Removing grid marks: the result after erasing polygons under cross mark mask

The result is clearly imperfect, but does look cleaner than the original result.  For some areas, bits of grid mark or line still appear within the resulting layer, where they slipped outside the mask, but these can be cleaned up manually where necessary.  The main problem with this automated approach is that it can cut out small areas of the drawings that relate to archaeological data where they overlap grid marks / lines, but these should be fairly minimal, and largely invisible at broader scales of rendering.

6 grid removed
Vectorised version of raster NMP tiles with grid marks removed
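
The mask construction just described can also be expressed quite compactly in code. Here is a rough shapely sketch (illustrative only; the 1000m spacing and the 5m and 45m buffer sizes follow the text):

```python
from shapely.geometry import LineString, Point
from shapely.ops import unary_union

def grid_mark_mask(xmin, ymin, xmax, ymax, spacing=1000, line_buf=5, cross_buf=45):
    """Build a mask covering the cross marks drawn at each 1 km grid intersection."""
    xs = range(int(xmin) // spacing * spacing, int(xmax) + spacing, spacing)
    ys = range(int(ymin) // spacing * spacing, int(ymax) + spacing, spacing)
    verticals = [LineString([(x, ymin), (x, ymax)]) for x in xs]
    horizontals = [LineString([(xmin, y), (xmax, y)]) for y in ys]
    line_buffers = unary_union([l.buffer(line_buf) for l in verticals + horizontals])
    vertex_buffers = unary_union([Point(x, y).buffer(cross_buf) for x in xs for y in ys])
    # Clip the line buffers to the circles around each intersection -> cross-shaped mask.
    return line_buffers.intersection(vertex_buffers)

# Erasing the mask from the vectorised NMP polygons (e.g. with geopandas) would then be:
# nmp["geometry"] = nmp.geometry.difference(grid_mark_mask(*nmp.total_bounds))
```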

By way of comparison, here is an area where vector and raster NMP data meet:

7 compare vector & raster
Comparison of vectorised raster NMP tiles (in black) against vector data (coloured)

We can see that the vector NMP areas have more attribute data attached to them, so that we can plot this data in different colours according to the morphology of the feature (i.e. bank, ditch, etc.), but the vectorised raster NMP areas are now much easier to plot alongside the vector areas, especially at these broader landscape scales.

I am quite pleased with this result, and it certainly helps greatly with the plotting of the areas of the NMP which exist as raster data, but it does not help greatly with the previously suggested idea of examining the topological relationships within field systems.  I think that there may be a way to produce a line (rather than polygon) result by processing the original data using the Thin algorithm in GRASS GIS, but this would need care and is probably only feasible for quite restricted areas.  I shall return to this idea at a later date!

Chris Green

ArcMap: generating squares to represent NGR precision

Often when dealing with spatial data recorded using Ordnance Survey (OS) National Grid References (NGRs for short), you will find that different objects within your data are recorded to differing levels of spatial precision.  This can lead to misrepresentative results if they are treated simply as point objects (especially for objects possessing only a very vague location) and, therefore, we need a method of representation that is truer to the actual spatial precision of the data objects.

To help explain, NGRs are formatted as follows (feel free to skip this paragraph if you understand NGRs): the OS divides the UK up into 100km by 100km grid squares, which are designated using two letters.  Therefore, an NGR consists of two letters followed by a string of numbers, the numbers representing the object’s location within the large grid square.  There should be an even number of these digits, with the first half representing the easting and the second half the northing.  However, the exact number of digits will vary according to how precisely located the object is.  For example, an NGR such as SQ2222233333 is in the square SQ at the location 22222 metres east, 33333 metres north; however, we would have to add zeroes to a shorter NGR, as the precision of the point is less well known, e.g. SP444555 is in the square SP at the location 44400, 55500.  For more on converting NGRs to numeric coordinates, see my previous post.

However, simply counting the digits is not always a sufficient measure of coordinate precision, as the person who entered the data may have tagged zeroes onto the end of the easting and northing.  For example, the NGR ST5500066000 appears to be accurate to the nearest metre, but it is unlikely that a point would fall exactly onto the 55000, 66000 point and, as such, it is much more likely that this point is recorded to the nearest kilometre.  I have a small Python script to perform this test, which can be downloaded here.
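
The trailing-zero test can be illustrated in a few lines of Python; this is a sketch of the idea, not the downloadable script itself:

```python
def ngr_precision(ngr: str) -> int:
    """Estimate the real precision (in metres) of an OS NGR,
    treating shared trailing zeroes as a sign of a coarser fix."""
    ngr = ngr.replace(" ", "").upper()
    digits = ngr[2:]                      # strip the two grid letters
    half = len(digits) // 2
    easting, northing = digits[:half], digits[half:]
    nominal = 10 ** (5 - half)            # e.g. 5 digits each -> nominally 1 m
    # Count trailing zeroes common to both coordinates.
    shared = 0
    while (shared < half
           and easting[half - 1 - shared] == "0"
           and northing[half - 1 - shared] == "0"):
        shared += 1
    return nominal * 10 ** shared
```

For ST5500066000 this returns 1000 (nearest kilometre), while SQ2222233333 returns 1.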

In any event, the end result of this is that an NGR that is apparently a point is, in fact, properly the origin point (southwest corner) of a square of width and height equal to its precision, e.g. the NGR ST5566 represents a square 1km by 1km spreading northeast from its location, whereas the NGR SQ2222233333 is a 1m by 1m square.  Therefore, if we want to represent these points properly in a GIS environment, we have to present them as squares of appropriate size.
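
In geometric terms, then, each NGR becomes a square whose side is its precision, spreading north-east from its origin point; for example (a shapely sketch with made-up coordinates):

```python
from shapely.geometry import box

def ngr_square(easting: float, northing: float, precision: float):
    """Square of side 'precision' with its south-west corner at the NGR's origin point."""
    return box(easting, northing, easting + precision, northing + precision)

# e.g. a 1 km-precision NGR whose numeric origin works out at (455000, 166000):
square = ngr_square(455000, 166000, 1000)
```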

So how can we automate creating this representation in ArcMap (version 10)?*  I will assume reasonable familiarity with the software.

If we have a point shapefile of our locations (use a duplicate to preserve the original), with their associated attribute data including their spatial precision**, we can create these squares in four relatively simple steps.  Here are our points on a map:

Points
The points...

First, we open up the attribute table for the layer.  Then we add a new field to the table, which we shall call MIDPOINT and which will be of the Float type (i.e. a floating point number).  Then we open up the Field Calculator for the MIDPOINT field and enter the calculation [Precision field] / 2.0 as follows:

Field Calculator (i)
Calculating the midpoint of our square.

This new field will be used to add on to the x and y coordinates of each point in the layer to move them to the centre of each square to be created.  We then open the Field Calculator for the Shape field and enter the code snippet seen here (make sure you have the Python parser selected at the top):

Field Calculator (II)
Moving the points.

The expression can be downloaded here.  This results in the points moving north-east to the midpoints of the squares we will soon create (notice that some points have moved a long way, yet others remain close to their origin):

Moved points
New point locations plotted in red against old in black.

Next we create buffers for our moved points using the Buffer tool in ArcToolbox.  You want to set it to use the MIDPOINT field as the buffer size (we use the MIDPOINT rather than the full precision field as the size will be the radius of the circle created, not the diameter) and you do not want it to dissolve the buffers where they overlap, as we want one object for each point.  The result should look something like this:

Buffers
Buffers of our points.

Finally, we use the Feature Envelope to Polygon tool in ArcToolbox, using the buffer layer as input.  This, helpfully, retrieves the envelopes that enclose each circle (normally used to make spatial queries faster etc.), which are equivalent to the squares that we were after in the first place.  Therefore, this layer is the result that we desired:

Final result
The final result of the process.

Any points that do not seem to have output squares in the image do, in fact, have them but they are too small to see at this scale (e.g. 1m or 10m precision).  Overall, this is a simple way to do this task in just a few minutes whilst avoiding any overly complex scripting.  I think I will work on a Python script to achieve this task in one step, but the method just described works perfectly well in the interim.

The final result means that we now have both a visual representation of the area to which each of our NGR “points” actually relates and a layer that can be used in further analyses when trying to understand our data: all of the attributes from the original layer should find their way automatically into the final output, so we can perform whatever analyses we wish on our new, more properly representative geographic objects.

Chris Green

* ArcMap is part of ESRI’s ArcGIS and is the GIS software that we are predominantly using in this project.

** If you did not record the precision of the NGRs whilst converting them, you could use the Field Calculator to work it out (in the same way as described above for the other calculations).  You would have to create fields called something like ‘PRECISION’ and ‘TRUEPRECIS’, and then use the expression code downloadable here and here in turn on the first and then the second field.