Fuzzy time II (14C and PAS)

Following on from a previous post (see that post, and also Green 2011, for more details on the methods discussed here), I have been experimenting further with applying fuzzy probability modelling to our data. We decided to expand the previous experiments, which had only used PAS data, to take in radiometric dates. Although our search was rather cursory, just taking in the CBA Index maintained by the ADS (and periodically updated by English Heritage) and a search of published dates from within the OxCal database (kindly conducted for us by Christopher Bronk Ramsey at RLAHA), we were able to create a database of over 5,000 radiocarbon (14C) dates that fell within (or partially overlapped) our time period of interest (for this exercise, 1500BC to AD1050).

I rewrote my fuzzy probability calculation scripts to enable them to use the full detail of the radiocarbon probabilities output by OxCal and then ran them on a series of timeslices across this new dataset. Initially, I used the sub-periods defined in the previous experiment, but it quickly became apparent that the sub-periods chosen for the Late Iron Age and Roman period were too narrow to produce high enough probabilities of dates falling within them to be of interest. So I defined a different set of sub-periods, which resulted in a higher average probability for dates through the LIA-Roman period:

  1. 1500 to 1151BC
  2. 1150 to 801BC
  3. 800 to 401BC
  4. 400 to 151BC
  5. 150BC to AD49
  6. AD50 to 199
  7. AD200 to 410
  8. AD411 to 649
  9. AD650 to 849
  10. AD850 to 1050
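
As a minimal sketch of how the probability of a single 14C date falling within one of these sub-periods might be calculated: the example below assumes the calibrated distribution is available as (calendar year, probability) pairs that sum to 1.0, with years BC stored as negative numbers. The function name and the toy distribution are illustrative assumptions, not the scripts actually used.

```python
def prob_in_subperiod(calibrated, start, end):
    """Sum the probability mass of a calibrated date within [start, end].

    calibrated: iterable of (calendar_year, probability) pairs, with the
    probabilities assumed to sum to 1.0. Years BC are negative.
    """
    return sum(p for year, p in calibrated if start <= year <= end)


# Hypothetical calibrated distribution in 50-year steps (not real data).
toy_date = [(-450, 0.1), (-400, 0.3), (-350, 0.4), (-300, 0.2)]

# Sub-period 4 from the list above: 400 to 151BC.
print(round(prob_in_subperiod(toy_date, -400, -151), 3))
```

A date whose distribution straddles a sub-period boundary simply contributes part of its probability to each side.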

The results were collated in ArcGIS and could then be mapped for each time-slice as follows:

Example of 14C date probabilities for 400 to 151BC

However, these maps are hard to read, because the relatively clustered nature of the distribution produces a lot of overlapping points, so that some low probability dates obscure higher probability dates within the same local area. To get around this, I collated the results using hexagonal bins, with the maximum probability of any date within a given bin being used to define the probability for that bin (maximum rather than summed values were used as 14C dates are not really discrete objects in the same way as finds and so multiple dates do not necessarily represent greater density of activity in the past):

Example of 14C dates collated by hexbin (max value) for 400 to 151BC

I then reran the probability calculations for PAS and other dated finds in our database using the new sub-periods and summed the results by hexagonal bin (summing was used rather than the maximum here as finds very much are discrete objects and, as such, more finds does imply more past activity, with certain caveats [modern archaeological / metal detecting practice being the most obvious one]):

Example of finds dates collated by hexbin (summed value) for 400 to 151BC
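
The two collation rules described above (maximum for 14C dates, sum for finds) can be sketched as follows. This assumes each record has already been assigned to a hexagonal bin in the GIS, so the bin IDs and probabilities here are purely hypothetical.

```python
from collections import defaultdict


def collate_by_bin(records, how):
    """Collate (bin_id, probability) records by bin.

    how="max" keeps the highest probability in each bin (the rule used
    for 14C dates); how="sum" adds the probabilities up (used for finds).
    """
    if how not in ("max", "sum"):
        raise ValueError("how must be 'max' or 'sum'")
    collated = defaultdict(float)
    for bin_id, probability in records:
        if how == "max":
            collated[bin_id] = max(collated[bin_id], probability)
        else:
            collated[bin_id] += probability
    return dict(collated)


# Hypothetical records: (hexagonal bin ID, probability for one time-slice).
c14_records = [("A", 0.2), ("A", 0.9), ("B", 0.4)]
finds_records = [("A", 0.5), ("A", 0.75), ("C", 0.25)]

print(collate_by_bin(c14_records, "max"))
print(collate_by_bin(finds_records, "sum"))
```

Note that a summed value can exceed 1.0, which is why it reads as a measure of density rather than as a probability in the strict sense.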

I then combined the two sets of results, using the maximum value across both datasets. As such, if the weighted finds probability within a cell was greater than 1.0, then it was preferred, but if less than 1.0 and less than the 14C probability within the cell, then the 14C probability was preferred. Although the finds dominate the results, the 14C does fill in some gaps and increase probabilities in some areas, especially in prehistory:

Example of 14C and finds dates collated by hexbin (max value) for 400 to 151BC
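
A minimal sketch of the combination step, taking the larger of the two values in each bin. The dictionaries are hypothetical, and any weighting applied to the finds values beforehand is not shown.

```python
def combine_bins(finds_summed, c14_max):
    """Take the larger of the two values for every bin present in either input."""
    bins = set(finds_summed) | set(c14_max)
    return {
        b: max(finds_summed.get(b, 0.0), c14_max.get(b, 0.0))
        for b in sorted(bins)
    }


# Hypothetical per-bin values for a single time-slice.
finds_summed = {"A": 1.25, "C": 0.25}
c14_max = {"A": 0.9, "B": 0.4, "C": 0.6}

print(combine_bins(finds_summed, c14_max))
```

In this toy case the finds win in bin A, while the 14C dates fill a gap in bin B and raise the value in bin C.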

The results for each time-slice can be viewed in the following animation:

Animation of combined finds and 14C date probabilities through time

What can we read into this? Well, firstly, it should be noted that this is just an experimental model and we shouldn’t read too much into it. There is a possible element of duplication in some of the finds data, as some PAS records are present in both our PAS dataset and our HER dataset (dependent upon local HER practice). Secondly, the 14C dates only add something quite subtle to the finds dates, as we have far more finds dates than 14C dates in our possession, but the subtle addition is, I feel, an important one.

However, subject to these caveats and the further element of uncertainty introduced by the affordance factors at play in the background (see previous posts: PAS; monuments), there are certain tentative archaeological conclusions that we could draw. The picture I see in the animation is one of relatively widespread activity in earlier prehistory, which intensified in the south and east in the Late Iron Age and especially through the Roman period, with late Roman and especially early medieval activity being particularly focused on the central / southern / eastern area of England (essentially Cyril Fox’s lowland Britain). Whether this remains the case as we build in more sources of evidence, remains to be seen.

Chris Green

References:

Green, C.T. 2011. Winding Dali’s clock: the construction of a fuzzy temporal-GIS for archaeology.  BAR International Series 2234.  Oxford: Archaeopress.

CAA 2014, Paris

I just returned from this year’s Computer Applications and Quantitative Methods in Archaeology (CAA) conference, which was held in Paris last week.  Overall, the conference was a great success, despite a number of teething troubles (particularly with IT support [ironically?]).

I spoke on the Friday morning about using Heisenberg’s Uncertainty Principle as a metaphor for good cartographic practice.  I’ll try to write more about that at a later time.

One particularly impressive visualization of data that I saw was Lost Change, which maps PAS coins and their mint locations.  Another very interesting paper I heard was about MicroPasts (another British Museum backed venture), which is designed to give archaeologists access to crowdsourced labour and crowdfunding.  I also enjoyed Philip Verhagen’s paper, as his project is encountering many of the same data rationalization issues as our own (and he only has to work with a single source database, rather than the 70+ that we are trying to combine).

There is a storify of the conference tweets here: http://storify.com/EngLaID_Oxford/caa-2014-paris

Next year’s CAA will be in Siena, Italy.  They know how to pick places with good food and good cheer!

Chris Green

PAS ‘affordances’

Building out of the context of Anwen’s recent work on her Isle of Wight case study, we have recently been playing around with sampling biases in the PAS.  This is in very large part based upon the pioneering work of Katie Robbins, who did her PhD and is doing a postdoc on the subject (see references below: Katie’s thesis is available online).

Katie discussed many different relevant factors in her work, but three stood out to us as being particularly suitable for spatial modelling on a national scale: land cover, obscuration, and proximity to known monuments.  Other factors, such as landowner permissions or proximity to detectorists’ houses, would be very difficult to map nationally without a great deal of work.

Land cover: Using a simple reclassification of LCM 2007 data (via Edina Digimap), around 69% of PAS findspots of our period fall upon arable land, c.21% on grassland, c.4% in suburban areas, just short of 3% in woodland, and c.1% in urban areas. Other land cover types each accounted for less than 1% of PAS findspots.  The affordance surface constructed for this category was given a weighting of 1.0 for arable cells, with each other type given a weighting relative to this (e.g. grassland was given a rating of 0.2133/0.6914 or 0.31).
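
The relative weighting can be sketched as below. The arable and grassland shares are the figures quoted above; the other shares are rounded from the approximate percentages in the text, and the function name is an illustrative assumption.

```python
# Share of PAS findspots by land cover class. Only the arable and
# grassland values are exact; the rest are rounded from the text.
shares = {
    "arable": 0.6914,
    "grassland": 0.2133,
    "suburban": 0.04,
    "woodland": 0.03,
    "urban": 0.01,
}


def relative_weights(shares, reference="arable"):
    """Express each class's share relative to the reference class (weight 1.0)."""
    ref = shares[reference]
    return {cover: share / ref for cover, share in shares.items()}


weights = relative_weights(shares)
print(round(weights["grassland"], 2))  # 0.31
```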

Obscuration: Various other factors should completely block out the possibility of finding artefacts through metal detecting (although other finding methods might still result in discovery, such as finding something sitting on a molehill whilst on a walk). Easily mappable elements that fall within this category are: scheduled monuments (via EH), Forestry Commission land (via the Forestry Commission), ancient woodland, country parks, local nature reserves, national parks, RAMSAR sites, SSSIs (all via Natural England), and built up areas (via OS OpenData).  The affordance surface was constructed by combining shapefiles for all of these elements, calculating the percentage obscuration of 1 by 1km grid cells and then constructing a kriged surface from the centroids of that data with 100x100m cells.  This was then reclassified so that 0.0 was high obscuration (i.e. low affordance) and 1.0 was low (i.e. high affordance).  Incidentally, the South Downs National Park is the one National Park with a relatively high number of PAS finds, as this was only founded in 2011, but I decided not to correct for this at this time.

Proximity to monuments: I undertook a simple spatial concurrence test of 1 by 1km grid cells (via our latest synthesis iteration: see this post for discussion of methodology) of presence of finds against presence of “monuments” (in the broadest sense) of each broad monument class for each of our period categories (e.g. Roman finds vs Roman agriculture and subsistence).  The major areas of concurrence between (broadly) contemporary finds and monuments were with Roman monuments of most types and early medieval monuments of a funerary nature.  Centroids of grid cells containing Roman monuments of most types or early medieval funerary monuments were used to construct a kernel density estimate layer, which was then tested against the PAS distribution for our period.  However, the relationship was not particularly strong, therefore this layer was reclassified so that any value above the first quantile of the surface was given an affordance value of 1.0, with values below that being classified relative to the first quantile.
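
One way to read the reclassification just described is as a simple capped ratio: values at or above the threshold become 1.0 and lower values are scaled relative to it. The sketch below takes the threshold as an argument rather than computing it, since how the quantile was derived is not spelled out here. The function name and sample values are hypothetical.

```python
def reclassify(values, threshold):
    """Scale density values relative to a threshold, capping at 1.0."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return [min(v / threshold, 1.0) for v in values]


# Hypothetical kernel density values and threshold.
print(reclassify([0.0, 0.5, 1.0, 4.0], threshold=2.0))
```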

The relationship between these three derived affordance surfaces and the relevant PAS data was then graphed to see how valid the model appeared.  Each line produces something close to the expected pattern.

Comparison of different PAS affordances, inc. mean of three coloured lines.

Combining the three input factors into a mean averaged model produces a very strong result in terms of spatial patterning.  Looking at the black combined line on the graph, we can see that c.60% of PAS records have an affordance (‘bias’ on the axis title) value of over 0.8 and that c.90% exceed 0.6.  This is a strong pattern, showing that areas of high affordance on our map are much more likely to feature PAS finds than areas with low affordance.

Plotting individual findspots onto the map of this surface shows that most fall within high affordance areas.  We can also see this quite clearly if we plot a kernel density estimate of PAS finds (Bronze Age to early medieval) over the affordance surface (red is low affordance, blue is high), although the interpolation does result in some false overlaps with small areas of low affordance (particularly in East Anglia):

Main distribution of PAS finds of our period (Bronze Age to early medieval) over PAS affordances surface.

Two things stand out from this map: (a) that finds cluster in areas of high affordance; and (b) that there are areas of high affordance with few finds.  (a) is an excellent result as it shows that the model is teaching us something valid.  (b) can be explained in several possible ways (most likely a combination of all): differences in detecting practice / differences in reporting practice / the presence of other biases feeding into affordance but not included in the model.

There are some areas of “double jeopardy” feeding into this model, particularly between the obscuration and land cover layers (e.g. buildings appear in both land cover as urban / suburban and in obscuration; most national parks are of an upland / wild character in land cover).  However, as the pattern seems robust, I am not too worried about this for now.  A more developed model might, instead of the mean average of the three surfaces, be the mean average of the land cover and monument surfaces multiplied by the obscuration surface.  I will experiment with this later, perhaps.
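
The difference between the current model and the possible alternative can be sketched for a single grid cell as below. The function names and cell values are hypothetical, and all three inputs are assumed to be scaled from 0.0 to 1.0 as described above.

```python
def mean_model(land_cover, obscuration, monuments):
    """Current approach: mean average of the three affordance surfaces."""
    return (land_cover + obscuration + monuments) / 3


def multiplied_model(land_cover, obscuration, monuments):
    """Possible alternative: mean of land cover and monuments, times obscuration."""
    return ((land_cover + monuments) / 2) * obscuration


# Hypothetical cell: arable land close to monuments, but wholly obscured
# (e.g. inside a scheduled area), so its obscuration affordance is 0.0.
cell = (1.0, 0.0, 1.0)
print(round(mean_model(*cell), 2))
print(multiplied_model(*cell))
```

Under the mean, a wholly obscured cell still scores fairly highly, whereas multiplying by the obscuration surface lets it act as a mask that drops such cells to zero.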

As such, although our model is clearly not perfect (but then, no model ever will be), it does help us to understand something of the underlying affordances helping to shape the distribution of PAS data.  The next stage in this analysis will be to use the affordance surface to try to smooth out variation caused by this factor in our PAS distributions.

Chris Green

References:

Robbins, Katherine.  2013a.  From past to present: understanding the impact of sampling bias on data recorded by the Portable Antiquities Scheme. University of Southampton, Archaeology, Doctoral Thesis.

Robbins, Katherine.  2013b.  “Balancing the scales: exploring the variable effects of collection bias on data collected by the Portable Antiquities Scheme.” Landscapes 14(1), pp.54-72.

Extracting trends (VII)

Further to my previous post, I have now had another go at constructing trend surfaces for the four broad main periods covered by this project.  This time, however, I have filtered out records that are explicitly related only to artefact findspots (for each period).  This was in an attempt to downplay the influence in the previous trends from differential inclusion of PAS material between HERs.  The remaining records should, hopefully, thus primarily relate to sites with other archaeological evidence beyond just one or more artefacts.

Here are the results (to the same attribute scale as previous):

Trend surface for Bronze Age HER data, exc. findspots
Trend surface for Iron Age HER data, exc. findspots
Trend surface for Roman HER data, exc. findspots
Trend surface for early medieval HER data, exc. findspots

Comparing to the previous surfaces, we can see a general reduction in trend peaks, especially over Norfolk and Yorkshire.  The Bronze Age remains similar to previous; the Iron Age also, albeit with much lower peaks; the Roman period shows an increasing strength across Gloucestershire; the early medieval shows the most distinct reductions in eastern regions.

Chris Green

CAA 2013 and more on PAS fuzziness

I have just got back from the 2013 Computer Applications and Quantitative Methods in Archaeology (CAA) conference in Perth, Australia.  The conference was held in the University Club at the University of Western Australia:

UWA

UWA is in western Perth, close to the estuary of the aptly named Swan River:

Swan River

The conference overall was a fun one, with particularly interesting presentations by Oxford’s own John Pouncett and by his boyhood mentor, Dominic Powlesland.  I presented a paper in John and Gary Lock’s session on spatial scale, about how different scales inter-operate in the context of Englaid data.  I will summarise it on here at some point in the future, once we’ve thought through our ideas a bit more.

After the conference, I explored some distinctly non-English landscapes:

Pinnacles Desert

Moving on from my holiday snaps, I have been thinking a little more about temporal fuzziness with regards to PAS data (see previous post).  This time, I built in the data contained in the early medieval coin corpus at the Fitzwilliam Museum, Cambridge (EMC), to provide extra detail for the post-Roman period.

Using the “standard” time brackets discussed in the previous post, I then divided the data up according to (some of) our broad object type categories.  These are what we call “soft” categories, so that certain types of object can appear in more than one category (e.g. axes are categorised as both weapons and tools).  We can then produce graphs of the summed probabilities for each type, showing change in their deposition over time (x-axis is time, y-axis is summed probability):

Summed probability curves for combined PAS and EMC data, divided by broad type

Obvious things to note are the peaks in coinage deposition in the late Iron Age and the 4th century and the peaks in personal decorative items in the early Roman and during the earliest early medieval.  However, because of the vastly different amounts of objects found in each category and in each time period bracket, it is hard to pick out subtler patterning.  To do so, we can calculate the mean value and standard deviation for each category and then express the values in variation from the mean (in standard deviations) for each category (x-axis is again time, y-axis is summed probability in plus or minus standard deviations [0 is the mean, +1 is +1 st. dev., -1 is -1 st. dev., etc.]):

Summed probability curves for combined PAS and EMC data, divided by broad type, plotted by standard deviations from mean value
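
The standardisation behind this second graph can be sketched as follows. It assumes the population standard deviation (the text does not say whether the population or sample form was used), and the input series is a made-up example rather than project data.

```python
from statistics import mean, pstdev


def standardise(series):
    """Express each value as standard deviations above or below the series mean."""
    m = mean(series)
    sd = pstdev(series)  # population standard deviation: an assumption
    if sd == 0:
        return [0.0 for _ in series]
    return [(value - m) / sd for value in series]


# Hypothetical summed probabilities for one object category over time.
category = [2, 4, 4, 4, 5, 5, 7, 9]
print(standardise(category))
```

Because each category is scaled against its own mean and spread, a rare category and a very common one can be compared on the same axis.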

This graph then shows the same patterns we could draw out from the previous graph, but brings out various other details.  Most obvious is the huge peak in weapons and tools in the Bronze Age (especially later), but other patterns also come out (relatively high amount of tools during the Roman period; relatively high amount of weapons [i.e. average] in the earliest early medieval; etc.).

Similar graphs could be produced for regions of the country, rather than types, or for types within a region of the country.  These ideas still need further exploration, but I think they begin to show the power of using a fuzzy probability approach to the analysis of the temporality of our data.

Chris Green

Fuzzy time (and the PAS)

We’ve been thinking recently about why and how we might apply the concept of temporal fuzziness (uncertainty) to our data, particularly because it is a research interest of mine (see Green 2011 for more details).

The reason why dealing with temporal fuzziness is important is well illustrated by the following graph, based upon the work of Frédéric Trément.  The graph shows how a dating of this villa site based purely upon the well-dated finewares would disguise the fact that the villa was very active into the fifth and sixth centuries, which actually account for the greatest amount of coarseware pottery.  Thus, if you ignore the coarsewares because of their poor dating, you produce a false narrative of the history of activity on the site.

Comparison of fineware and coarseware dates from the villa site at Sivier, France (redrawn from Trément 2000 – Fig 9.16)

One way in which we can include less closely dated material in our analyses is to take account of temporal fuzziness.  In essence, this means defining a set of sub-periods and then calculating the probability (as a percentage in this simplest instance) of each object in the dataset falling within each sub-period.  This is essentially an adaptation of aoristic analysis, created for the study of crime patterns by Ratcliffe (his 2002 paper covers a more robust method than his previous work) and experimented with by various archaeologists.  Where appropriate, we can then sum these probabilities for each time-slice, to produce a model of changing deposition over time.
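
In its simplest form, the calculation described above can be sketched as below. This assumes probability is spread evenly across an object's date range, counts years inclusively, and stores years BC as negative numbers without correcting for the absence of a year zero. The function names and the example object are illustrative assumptions.

```python
def aoristic_probability(obj_start, obj_end, sp_start, sp_end):
    """Share of an object's date range (inclusive years) within a sub-period."""
    overlap = min(obj_end, sp_end) - max(obj_start, sp_start) + 1
    span = obj_end - obj_start + 1
    return max(overlap, 0) / span


def summed_probabilities(objects, subperiods):
    """Sum the probabilities of all objects for each sub-period in turn."""
    return [
        sum(aoristic_probability(s, e, sp_s, sp_e) for s, e in objects)
        for sp_s, sp_e in subperiods
    ]


# Hypothetical object dated AD 1 to 100, against two example sub-periods.
subperiods = [(-150, 49), (50, 199)]
print(summed_probabilities([(1, 100)], subperiods))  # [0.49, 0.51]
```

An object whose date range sits wholly inside one sub-period scores 1.0 there and 0.0 elsewhere, so closely dated material is handled by the same calculation.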

The most obvious dataset of ours to apply this fuzzy temporal analysis to is the PAS (Portable Antiquities Scheme) data.  This is because most PAS records represent a single object which has had start and end dates defined for it by the PAS team.  Some records need start and end dates adding (based upon the start and end periods, or in the absence of those, the broad periods) and some records need their start and end dates correcting (typically where they have been mistakenly reversed or where dates BC have not been given negative numbers), but all of this is possible to automate using Python scripts.  Once this data standardization has been completed, it is then possible to define a set of sub-periods and calculate the probability of each object falling within each sub-period (again, using a Python script).
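
A minimal sketch of the two corrections mentioned above, swapping reversed dates and negating BC dates that were entered as positive numbers. The `wholly_bc` flag is a hypothetical stand-in for whatever period information identifies a record as lying entirely before AD 1. This is not the script actually used.

```python
def standardise_dates(start, end, wholly_bc=False):
    """Return (start, end) with BC years negative and start <= end.

    wholly_bc: hypothetical flag, True where the record's period lies
    entirely before AD 1, so positive year values must really be BC.
    """
    if wholly_bc:
        start, end = -abs(start), -abs(end)
    if start > end:
        start, end = end, start
    return start, end


print(standardise_dates(800, 1500, wholly_bc=True))  # (-1500, -800)
print(standardise_dates(410, 43))                    # (43, 410)
```

Records that span the BC/AD boundary cannot be repaired this way and would need checking against their period fields individually.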

PAS: summed probability by century

The graph above shows the summed probabilities of PAS data when calculated and collated by centuries.  We can see here the general temporal profile of the PAS for our period, involving low levels of Bronze Age finds, increasing activity during the Iron Age, especially after the introduction of coinage, a massive increase through the Roman period, and then a return to lower levels of activity through the early medieval period.

PAS: summed probability by century: only objects of greater than 90% probability

The graph above then shows how the summed probabilities look if we only include objects with a greater than 90% chance of falling within each century.  Obviously, this example is a little fatuous, as it is not really very easy to date objects that precisely prior to the introduction of coinage, but it does make the point that only including very precisely dated material produces a biased temporal pattern.

PAS: summed probability by century: count of objects of greater than 0% probability

At the opposite extreme, the graph above shows the count of objects within each century that have a greater than 0% probability of falling within said century.  Thus, in this graph, if an object spans three centuries, it is counted equally in all three.  Naturally, this method then produces another biased temporal pattern, this time over-representing activity in each century.

As such, the first graph, which takes account of the probability of every object, is, in my opinion, the most honest representation of the temporal pattern.  However, as hinted above, century brackets are not really ideal, as objects can only very rarely be dated that precisely before coinage came into use.
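
The three ways of collating a single century discussed above can be compared in a short sketch. The probabilities are hypothetical values for five objects, and the function and key names are illustrative assumptions.

```python
def collate_century(probabilities):
    """Collate one century's object probabilities in three different ways."""
    return {
        # Every object contributes its probability (the first graph).
        "summed": sum(probabilities),
        # Only objects with a greater than 90% chance are included.
        "summed_over_90": sum(p for p in probabilities if p > 0.9),
        # Every object with any chance at all is counted as a whole find.
        "count_over_0": sum(1 for p in probabilities if p > 0),
    }


# Hypothetical probabilities of five objects falling within one century.
print(collate_century([1.0, 0.95, 0.5, 0.33, 0.0]))
```

The thresholded version discards the two loosely dated objects entirely, while the count treats them as if they certainly belonged to the century. The plain sum sits between the two.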

PAS: mean probability by century

This graph shows the mean probability (from 0.0 [0%] to 1.0 [100%], albeit the graph doesn’t scale that far) for all of the objects within each century bracket.  It shows that (on average) Middle Bronze Age and most Iron Age material is coarsely dated, that Later Bronze Age and Late Pre-Roman Iron Age material is better dated, that Roman material (particularly 4th century) is finely dated, and that most early medieval material falls somewhere between the prehistoric and Roman data in terms of its precision of dating.  The very low probabilities in the 5th century partially reflect the post-Roman transition, but are likely to be largely caused by the huge bulk of essentially 4th century Roman material that has been given an end date of 410 or 411.

The conclusion I draw from this graph is that we need to vary the width of our sub-periods over time to reflect the changing level of precision of dating within each period.  This ought to produce the most useful representations of temporal pattern, and is just as simple to calculate as fixed century blocks.

PAS: alternative sub-period groupings

This final graph, then, shows the summed probabilities for three different possible sets of sub-periods.  The y-axis is the summed probability and the x-axis is time from -1500 (1500 BC) to +1065 (AD 1065).

The orange line shows the same century brackets as before, which clearly is the worst model, as it reduces prehistory to a low-level trace yet also removes significant change in the Roman period (notably the sharp drops around AD 200 seen in both other lines).

The blue line shows a set of conventional sub-periods.  This shows a much more interesting temporal pattern than the century brackets.

The red line shows an alternative set of sub-periods, designed to break away from conventional date assignments and to take more account of the changing rate of dating precision over time.  This is probably my preferred model, but there is no reason to make hard and fast choices: we can continue to experiment with multiple sets of sub-periods for now.

In the context of our project, this is very much preliminary work, intended to test out some of the possibilities of working with fuzzy temporality using our datasets.  I have also begun experimenting with building the EMC (the Early Medieval Corpus of Coin Finds maintained at the Fitzwilliam Museum, University of Cambridge) into this dataset.  There is also potential for doing something similar with the HER data that we have gathered, albeit implementation is more complex due to the variable structure of that data.  Once we have our methodology nailed down, it will become possible to construct graphs like the final one above for different types of object or for different regions of England.  We could also create a series of maps showing changing probabilities over time, perhaps combined into animations.

Whether this proves fruitful, only time will tell, but I do believe that this type of analysis has great potential for helping to explore continuity and change in EngLaId data.

Chris Green

References:

Green, C.T. 2011. Winding Dali’s clock: the construction of a fuzzy temporal-GIS for archaeology.  BAR International Series 2234.  Oxford: Archaeopress.

Ratcliffe, J.H. 2002. “Aoristic signatures and the spatio-temporal analysis of high volume crime patterns.” Journal of Quantitative Criminology 18(1), pp. 23-43.

Trément, F. 2000. “Prospection et chronologie: de la quantification du temps au modèle de peuplement. Méthodes appliquées au secteur des étangs de Saint-Blaise (Bouches-du-Rhône, France).” In Francovich, R. and Patterson, H. (eds.) Extracting meaning from ploughsoil assemblages. Oxford: Oxbow, pp. 77-91.

Extracting trends (II)

Following on from my previous post about trend surfaces, I have now completed my proposed next step in extracting trends from the PAS data, by creating trend surfaces for the finds associated with each of our broad time periods.  To begin, here is a reminder of the trend surface for all PAS data:

12th power trend surface for PAS data (all periods)

Now by period:

12th power trend surface for PAS data (Bronze Age)
12th power trend surface for PAS data (Iron Age)
12th power trend surface for PAS data (Roman)
12th power trend surface for PAS data (early medieval)

Although these should all be treated as very rough models, certain things do stand out.  For the Bronze Age, it is obvious that the main peaks seem to be occurring in the far west of Cornwall, the Isle of Wight and Suffolk.  In the Iron Age, the peaks are low, but I do think I can perhaps see the main areas of circulation for coinage showing up in the blue tones.  The Roman picture largely replicates the pattern for all periods (which is unsurprising as Roman finds make up a large proportion of the PAS database), but with a notable lack of finds in the west country.  For the early medieval, the peaks seem to be around the east and south-east coast.

To stretch interpretation perhaps beyond where I should, I might suggest that the fact that the peaks in the Roman and early medieval periods are in similar locations to the overall dataset suggests that the dominant attribute governing where finds of these dates are found would be where metal detecting activity is most popular (of course, this is partly self-fulfilling, as people are bound to find metal detecting more appealing in areas with lots of metalwork in the soil).  However, the fact that the peaks in the prehistoric periods are somewhat differently located suggests to me that patterns in these data are more likely to be genuine representations of past behaviour.  Perhaps?

Chris Green

NB:  Scotland and Wales are marked as terra incognita on my maps as they are outside of EngLaId’s spatial remit.