This is just a brief update to outline some of the latest developments to the project’s GIS element.
Web mapping: constructing an Open Source geostack
John, Xin and I have been thinking this month about how we might go about constructing the web-based mapping element seen as a vital component of the final website that we will produce as part of the project. Xin constructed a good initial plan and then I found a very useful tutorial and software package on OpenGeo. This OpenGeo Suite is well-documented and seems to work well as a route to providing interactive maps embedded in webpages.
The team installed the Suite on the OeRC’s server and used it to feed data to a test webpage (written based upon the OpenLayers API). The results were very promising and we hope to use this software / process as part of the construction of our final website output, where appropriate. What data and results the final website will map is still very much an open question!
Converting Ordnance Survey grid references
Whilst looking at and importing AIP data for field systems into the project’s GIS database, I wrote a small piece of Python code to convert Ordnance Survey National Grid References (NGRs) to numeric x and y coordinates for GIS implementation (in metres, not latitude / longitude). The code should be available for download here. It runs in Python (2.7) and would need expanding to capture NGRs from a source file and write the output x y coordinates to a results file if it is to be of real use for bulk conversion purposes. Apologies for the quality of my coding…
My previous (brief) online guide to this subject can be found here. Incidentally, my main tip for doing this manually is: make sure you do not confuse your easting and your northing!
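For illustration, the core of such a conversion can be sketched as below. This is not the project’s actual script (the function name and structure are my own), just a minimal self-contained version of the standard letter-pair arithmetic:

```python
def ngr_to_xy(ngr):
    """Convert an OS National Grid Reference (e.g. 'SU 372 155')
    to numeric easting / northing coordinates in metres."""
    s = ngr.replace(" ", "").upper()
    letters, digits = s[:2], s[2:]

    def letter_index(c):
        # The OS grid alphabet runs A-Z but skips 'I'.
        i = ord(c) - ord("A")
        return i - 1 if c > "I" else i

    l1, l2 = letter_index(letters[0]), letter_index(letters[1])
    # The first letter indexes the 500 km squares, the second the
    # 100 km squares within them (counted from the grid's SW origin).
    e100 = ((l1 - 2) % 5) * 5 + (l2 % 5)
    n100 = (19 - (l1 // 5) * 5) - (l2 // 5)

    half = len(digits) // 2
    scale = 10 ** (5 - half)  # e.g. 3-digit pairs give 100 m precision
    easting = e100 * 100000 + int(digits[:half]) * scale
    northing = n100 * 100000 + int(digits[half:]) * scale
    return easting, northing
```

So, for example, `ngr_to_xy("SU 372 155")` gives `(437200, 115500)` — and note the output order: easting first, northing second!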
Processing NMP data for GIS analysis
I have also been working on processing the National Mapping Programme (NMP) data for the south west and south east received from English Heritage.
It proved useful to copy the raster layers into a Raster Catalog object in ArcGIS. This makes possible various analytical methods that would otherwise be closed to raster data (such as the Select by Location tool), and also makes it easier to handle multiple tiles in one batch (including controlling the display of blank placeholder tiles at scales inappropriate for raster rendering, e.g. when zoomed out too far to show any detail). The Raster Catalog will be updated as further data is received.
With the vector data (in CAD format), it was found to be useful to convert the files into a single Geodatabase (.gdb) object in ArcGIS. This, again, makes it easier to handle multiple tiles at once (including maintaining symbology across tiles) and also makes it easier to output composites of multiple tiles to other formats (such as shapefiles). Again, this Geodatabase will be updated as further data is received.
The results look good and should form a strong basis for the research undertaken by the project (in conjunction with our other data sources, of course).
Thank you to all the HER professionals with whom the EngLaId project team are currently in dialogue about accessing HER data for the whole of England – your constructive input is very much appreciated. We are particularly grateful to David Evans, who sent through the HER dataset for South Gloucestershire this week. Our technical experts in eResources are now busy building the HER data which we have already been given into the project database. Thanks also to Steve Northcott, who has volunteered to produce a database of grey literature for Bodmin Moor, one of the project’s main case studies. This will list all grey reports available for sites producing evidence dating from 1500 BC to AD 1086 on Bodmin Moor, using data from the Archaeological Investigations Project (AIP) website (http://csweb.bournemouth.ac.uk/aip/aipintro.htm). It will also indicate which of these reports are currently available online in the Archaeology Data Service’s (ADS) grey literature library. This will be an invaluable resource when it comes to looking in detail at evidence from the case study areas. Once this task is completed for Bodmin Moor, we hope to produce similar databases for the other case study areas.
The EngLaId project would like to express a big Thank You to Peter Insole, Faye Glover and Chris Webster for providing the HER datasets for Bristol City, Wiltshire & Swindon and Somerset respectively! Of course many (belated) thanks must also go to the kind staff at English Heritage, for providing the first set of NMP data, and to the team of the Portable Antiquities Scheme for providing us with a bulk download of all their data. Special thanks as well to Guy Salkeld of the National Trust for sending their data in January!
Following on from my previous post on this subject, I have now produced a second version of the obscuration layer. On the suggestion of Graham Fairclough of English Heritage, this version includes the same obscuration factors as before (woodland, water, buildings / roads / railways), but also adds in areas of alluvial and peat sub-surface deposits. These types of deposit tend to obscure archaeological features that were present on the former land surface before they formed, due to their thickness. However, this is not as complete an obscuration as with the previous categories used, for several reasons:
1. Peaty soils across England are being eroded by agricultural / drainage practices, revealing their buried archaeological material.
2. Archaeological sites that were constructed after (or later on during) the formation of the deposits will not be (or will be less likely to be) obscured, i.e. the older a site is, the more likely it is to be obscured.
3. Peat / alluvium deposits may be thin enough for substantial buried archaeological features to show through the masking effect, especially if denuded by more modern intervention.
As such, this result should be viewed more critically than the previous one, in that some areas showing as highly obscured may, in fact, show some archaeological features from the air (notably the region around the Wash), especially when dealing with sites from more recent times. Also, as with all models, the result presented should not be taken as perfected. Here, then, is the map showing percentage obscuration for 1km x 1km grid squares across England (built environment, water, woodland, peat, alluvium):
The data for peat and alluvium deposits were taken from the British Geological Survey’s 1:625,000 geology dataset (superficial deposits) which they provide for free download and unrestricted usage (subject to providing appropriate acknowledgment) under their open data initiative. This data is provided at the perfect scale for a task such as the one undertaken (i.e. national patterning), but would be less useful for more intensive surveys. Together with the OS Open Data also used, however, it does demonstrate the excellent results that can be produced as a consequence of organisations opening up their data for free usage by researchers (and, by extension, the general public).
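The per-cell figure itself is simple arithmetic: once the obscuring layers have been unioned and intersected with the grid in GIS, the percentage for each square is just obscured area over cell area. A trivial sketch (function and parameter names are my own, not from the project scripts):

```python
def percent_obscured(obscured_area_m2, cell_size_m=1000):
    """Percentage of a square grid cell covered by the unioned
    obscuration layers (woodland, water, built-up, peat, alluvium).

    obscured_area_m2: area of the cell overlapped by any obscuring
    polygon, as measured in the GIS after the union/intersect step.
    """
    cell_area = float(cell_size_m) ** 2
    return 100.0 * obscured_area_m2 / cell_area
```

For a 1 km x 1 km cell, 250,000 square metres of overlap gives 25% obscuration.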
The EngLaId team is currently exploring the possibility of developing collaborations with Historypin (http://www.historypin.com).
Both projects share an interest in recording the past on a hitherto unprecedented scale through the use of innovative technology. Although Historypin works mainly with historical photographs and EngLaId focuses its attention on archaeological data, it was felt that both teams could learn from each other’s approaches and methodologies.
For that reason, the EngLaId team enjoyed a short presentation, given by Rebekkah Abraham, Historypin’s Content Manager, which set out the background to the Historypin initiative and described some of its current projects and collections from England and elsewhere in the world, including some wonderful collections of old excavation photographs. In return, Chris Gosden gave a brief presentation on the EngLaId project.
In the end, it was decided that a preliminary project might be developed in the first instance between Historypin and the EngLaId project artist, Miranda Creswell. Miranda’s involvement in the EngLaId project shares Historypin’s strong interest in public engagement.
Possibly the biggest challenge built into the EngLaId project is how we bring together and synthesise the diversely recorded datasets that we are using. Whilst some consistencies exist between the recording methods used by different data sources, there remains a considerable amount of diversity. English Heritage (EH), England’s 84 Historic Environment Records (HERs), the Portable Antiquities Scheme (PAS) and other data providers / researchers all keep their own separate databases, all recorded in different ways, and their entries overlap: some record the same objects, others different ones.
As a result, there is considerable duplication between different data sources, which is not at all easy to identify and remove. Where data objects have names, such as in the case of many larger sites, these can be used to assess duplication (assuming all datasets use the same names for an object), but this does not apply to the much more common case of objects with no assigned names.
Therefore, the best way to discover duplication and to attempt a synthesis between different datasets is to test for spatial similarity. In other words, if a Roman villa is present in the same space within two different datasets, we can assume that it is the same villa. However, this in turn is complicated by the fact that different data sources record to different levels of spatial precision and use different data types (e.g. points vs polygons). To get around this problem, I am experimenting with applying a tessellation of grid squares over the map, testing the input datasets for which objects fall within each square (recording their type and period), and aggregating across datasets to assess the presence or absence of each site type for each period.
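As a toy illustration of the square-assignment idea (the real work is done in ArcGIS; the function here is hypothetical), a point can be binned into a cell of a regular tessellation with simple integer division:

```python
def cell_id(x, y, cell_size=1000):
    """Return the (column, row) index of the tessellation square
    containing a point, for a grid anchored at the OS grid origin."""
    return (int(x // cell_size), int(y // cell_size))

# Two records of the 'same' Roman villa from different datasets,
# digitised to slightly different coordinates:
a = cell_id(437250, 115480)   # e.g. an NRHE point
b = cell_id(437199, 115501)   # e.g. an HER point nearby
```

Both points fall in the same 1 km square, so they register as a single presence rather than two sites.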
The first stage is to simplify the terms used in the input dataset down to a set of (currently) eight output terms (these are not yet fully defined, and the number of output terms will undoubtedly grow). This is partly so that the output text fields do not exceed the 254-character limit for ArcGIS shapefiles (I will be working on a solution to this, probably involving a move to the geodatabase format), and partly so that we can identify objects of similar type recorded using different terminologies. This is accomplished using a Python script.
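The simplification step works along these lines. The mapping below is purely illustrative (the project’s actual term list is, as noted, still being defined):

```python
# Hypothetical mapping from source thesaurus terms to broad output terms.
SIMPLIFY = {
    "villa": "settlement",
    "roundhouse": "settlement",
    "hut circle": "settlement",
    "field system": "field system",
    "enclosure": "enclosure",
    "barrow": "funerary",
    "cemetery": "funerary",
}

def simplify_terms(source_terms):
    """Collapse a record's thesaurus terms to the broad categories,
    dropping duplicates and unrecognised terms."""
    out = set()
    for term in source_terms:
        broad = SIMPLIFY.get(term.lower().strip())
        if broad:
            out.add(broad)
    return sorted(out)
```

So a record tagged “Villa” and “Roundhouse” in two different source vocabularies collapses to the single broad term “settlement”.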
The grid square tessellations were created using the tools provided as part of the Geospatial Modelling Environment software, which is free to download and use. So far, I have created tessellations at resolutions of 1km x 1km, 2km x 2km, and 5km x 5km to cover the different scales of analysis to be undertaken (and ultimately to give flexibility in the resolution of outputs for publishing purposes with regard to the varying requirements of our data providers). These were then cut down to the extent of England using a spatial query.
ArcGIS’s identity tool was then used to extract which input objects fell within which grid square (or squares in the case of large polygons and long lines). The attribute tables for these identity layers were then exported and run through another Python script to aggregate the entries for each grid square and to eliminate duplication for each grid square. The table output by the script (containing the cell identifier, a text string of periods, and a text string of types per period) was then joined to the grid square tessellation layer based upon the identifier for each cell. The result is a layer consisting of a series of grid squares, each of which carries a text string attribute recording the broad categories of site type (by period) falling within itself.
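The aggregation stage can be sketched as follows. The column layout and separators here are hypothetical stand-ins; the real script works on the exported ArcGIS attribute tables:

```python
from collections import defaultdict

def aggregate(rows):
    """Collapse identity-tool output rows of (cell_id, period, type)
    into one record per grid square, de-duplicating types within
    periods. Returns {cell_id: (period_string, type_string)}."""
    cells = defaultdict(lambda: defaultdict(set))
    for cell, period, site_type in rows:
        cells[cell][period].add(site_type)
    result = {}
    for cell, periods in cells.items():
        period_str = "; ".join(sorted(periods))
        type_str = "; ".join(
            "%s: %s" % (p, ", ".join(sorted(types)))
            for p, types in sorted(periods.items())
        )
        result[cell] = (period_str, type_str)
    return result
```

Duplicate rows for the same square (e.g. the same villa appearing twice) simply collapse into one entry, giving presence / absence rather than counts.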
This methodology means that we can bring together different datasets within a single schema. Input objects that overlap more than one output square can record their presence within several output squares* (assuming they are represented in the GIS as polygons / lines of appropriate extent). Querying the data to produce broad-scale maps of our different periods and/or categories of data is simple (using the ArcMap attribute query system’s ‘LIKE’ query, remembering to use appropriate wildcards [% for shapefiles] to catch the full set of terms within each text field**). The analysis can also be redone using different resolutions of grid tessellation, depending on the quality of input data and the spatial scale of research question considered (e.g. 1km x 1km or 2km x 2km or 5km x 5km squares).
So far, this methodology has only been tested using EH’s National Record of the Historic Environment (NRHE) data (as seen online at PastScape: the process described above is also capturing the relevant identifiers to link through the data to PastScape, with an eye on linked data output in our final website) and using an initial, rather arbitrary, set of simplification terms to produce test results, but it should be straightforward to extend this system to encompass the various other datasets that we are in the process of gathering. As an example of the output produced, here is a map of Roman settlement sites in the south west of England (settlement being defined here as entries containing any of the words: villa, house, settlement, hut, roundhouse, room, burh, town, barn, building, floor, mosaic; some of these terms obviously do not apply to the Roman period and the list will be subject to revision before final outputs are produced):
As can be seen, on the scale of a region the output is both clear and instructive. The result is one that shows presence or absence of a type of site within each cell, with no quantification given of how many of each type (as we ultimately will not know whether the total count is due to duplication or due to genuine multiplicity). This picture will only get better once we have fully defined the terms used in our simplification process and once we start building in more data from our other data sources.
I shall be presenting a paper on this subject at CAA in Southampton in March.
* Whether this is appropriate or whether they should fall only within the square within which the majority of the polygon falls is still open to debate. I feel that under a strict rationale of presence / absence, they should appear in all squares they overlap, but this could present a misleading picture in cases where, for example, a small site overlapped the junction of four large grid squares.