## Abstract

If a change of use for industrial land is proposed in the UK, there is usually a requirement to demonstrate that the change of use will not result in the land becoming Contaminated Land, as defined under the Environmental Protection Act 1990. Under certain circumstances, this demonstration can be made by showing that the mean concentration of contaminants of potential concern is below a suitable assessment level appropriate to the proposed new use. How much sampling effort is required for this purpose? Using a relatively large dataset for arsenic in soil, an approach is developed to determine the number of measurements required for a clearance investigation to demonstrate absence of contamination. The approach minimizes the expectation of financial loss, taking into account both the actual cost of investigation and the possible cost of incorrectly determining that contamination is still present and undertaking unnecessary remediation. The abstract probabilities are thereby expressed in terms of money spent and money potentially saved.

**Supplementary material**: an Excel spreadsheet containing the full dataset, showing its sub-sampling, is available at https://doi.org/10.6084/m9.figshare.c.4415039

Investigation of potentially contaminated land is a commonplace activity, the objective being to determine whether measures are necessary to reduce risk to receptors. Contaminated land is defined in the UK by the Environmental Protection Act 1990 as amended (HMSO 1990), and by associated statutory guidance in England, Wales and Scotland (Scottish Executive 2006; DEFRA 2012). This paper is limited to consideration of a proposed new use, where UK practice is to require the positive demonstration of a low risk of contamination (CIEH 2008). There will be one of two outcomes: investigation demonstrates that the land is probably not contaminated in the context of proposed use and no further action is needed, or that the land probably is contaminated and either intervention of some kind is necessary or the proposed use must be changed.

A general framework for contaminated land investigation and management is set out in CLR11 (Environment Agency 2004). Good practice for implementing site characterization is provided by BSI (2011). Notably, these recommend that a preliminary investigation is conducted before the main phase of investigation is planned.

Investigation is generally time-consuming and often costly. There are obvious advantages in minimizing cost and time delay. It is also highly desirable that the investigation produces the ‘right answer’, i.e. that contaminated land is correctly identified for remediation, and uncontaminated land is correctly identified as requiring no further intervention. Absolute certainty in this is not possible with finite investigation effort, but the probability of error can be managed using statistical techniques to calculate the number of measurements (soil samples) needed.

For the situation considered here, the decision as to whether or not the land is contaminated will be made by comparing the concentration of the contaminant, averaged over a relevant area or volume, with an assessment level. If it cannot be demonstrated that the average concentration is less than the assessment level with the required degree of confidence, the land is considered to be contaminated. This is the ‘planning scenario’ discussed by CIEH (2008). The area or volume concerned will have been determined with reference to the conceptual model of contamination and the relevant contaminant linkages derived from this; guidance on doing this is provided by Environment Agency (2004). The average over an area is relevant only for a contaminant linkage that effectively samples the whole area in an unbiased way, for example a grazing animal, or the cultivation of crops for food.

Neither BSI (2011) nor Environment Agency (2000) provides guidance on designing a sampling scheme to provide the input to CIEH (2008); their emphasis is more on the design of a sufficiently dense grid to detect small areas of elevated concentration within a larger area. Suggested grid spacings cover the range of one sampling point every 10–50 m, giving sampling densities of 1/100–1/625 m^{2} for ‘main investigations’ and 1/625–1/2500 m^{2} for ‘exploratory investigations’.

Valuable guidance on the principles of investigation design is provided by the US Environmental Protection Agency using the idea of Data Quality Objectives (DQO) (US EPA 2006). US EPA (2006) presents a method for determining the number of measurements required to support decisions made with a specified degree of certainty. The calculated number of measurements depends on four variables, one of which may be specified by, or agreed with, regulators, and one of which may be estimated from available data. There is no clear rationale established for choosing values for the remaining two variables, although there is interaction between them. This paper presents an approach to resolving this, based on minimizing expectation of financial loss, providing a unique solution controlled by the expected cost of remediation should it be necessary, the likely cost of a ‘clearance’ investigation, and the results of a preliminary investigation. Both the remediation cost and the investigation cost are likely to be reasonably well known, especially the latter. A ‘clearance’ investigation here is one designed to determine whether the average contaminant concentration is below the assessment level. A clearance investigation is likely to be considered either where a preliminary investigation has suggested that contamination is absent, or to demonstrate success after remedial work has been undertaken.

In this case, the objective was to demonstrate that no further remedial action was necessary before an area of land (5.2 ha) at the Dounreay nuclear site, which is being decommissioned, could be signed-off as ‘suitable for any reasonable use’. Given the location of the area of land and the use of adjacent land, the most likely future use is the grazing of animals raised for meat. The area had never been used for any nuclear purpose, but was a contractors’ camp and laydown area in the 1970s. A number of investigations had been conducted over the previous 16 years for various purposes, with the result that a larger than normal dataset was available for arsenic (138 measurements, average density 1/375 m^{2}), although historical research had not suggested any grounds to suspect the presence of anthropogenic arsenic contamination. Arsenic is routinely analysed as part of a suite of toxic elements. The data are shown in Figure 1.

The large dataset enabled a robust demonstration to be made that the average arsenic concentration in the study area does not exceed the chosen assessment level (32 mg kg^{-1}). However, there is the obvious question – how many results would have been sufficient to demonstrate this, had an investigation for this specific purpose been conducted? The discussion will consider the size of a sample of measurements, here called the data-sample, i.e. how many measurements have been obtained. This is distinct from the soil aliquots (samples) on which measurements have been made.

## Designing a clearance investigation

### Theory

The starting position for the design of a clearance investigation is that there is already a small data-sample from a preliminary investigation, probably collected in a targeted way and therefore likely to be biased, but in any event comprising only a small number of measurements. The mean of this data-sample is less than the assessment level, although possibly not all individual results are, suggesting that the population arithmetic mean *μ* might be below the assessment level (*AL*). However, we cannot demonstrate this convincingly with the data available so far. In formal terms, the null hypothesis H_{0}:*µ* ≥ *AL* cannot be rejected in favour of the alternative hypothesis H_{A}:*µ* < *AL* with the required level of certainty (likely to be agreed with the regulator); by default, we assume that the land is contaminated and seek to show that this is unlikely to be true (CIEH 2008). The population mean is suspected to be truly no more than the assessment level, but this cannot be determined with the required certainty because there are too few measurements and the data-sample is possibly biased as a result of targeted sampling.

(Note that, if the mean of the data-sample from the preliminary investigation exceeds the assessment level, the most reasonable belief is that the site is contaminated, at least in part, and further work should first address this possible contamination, before proceeding to a clearance investigation. If all the measurements are below the assessment level, then clearly it is a reasonable belief that the site is not contaminated, but if there are only a few measurements and the selection was biased, is this sufficiently certain? More results generally produce more certainty.)

What is the best that can be done?

There are two hypothetical extreme options, each with an associated possibility of benefit or financial loss.

Option 1 – remediate without doing any more investigation.

If the area proves to have been contaminated (deduced from waste sentencing results), the cost of remediation was well spent, although it may have been possible to remediate less.

If the area proves to be uncontaminated (sentencing of waste demonstrates that it is all clean), the remediation cost was a total loss.

Option 2 – collect a very large amount of data, hoping to make a case for no remediation.

If the area proves to be contaminated, the investigation cost was a total financial loss (although it may yield some information to assist the design of remediation).

If the area is demonstrated to be clean, then the investigation cost was well spent because it saved the cost of remediation (although it may have been possible to make this case with less data and therefore less investigation cost).

Given that collecting data is expensive, albeit cheaper than remediation (should remediation be cheaper than data collection, then remediate!), what is the optimum amount of data? It is considered here that the optimum amount of data is that which produces a decision with the required degree of certainty and the lowest expectation of financial loss.

Investigation must proceed since there is insufficient information at this stage to justify a ‘no remediation’ decision, so this factor of the cost is certain and its magnitude is a function of investigative effort. It is at least possible to make a guess about the cost of remediation – the effect of uncertainty in this cost on decision-making can be assessed later. The possible degree of contamination needed for this estimate is known from the preliminary investigation. For the purposes of the discussion it will be assumed that the remediation cost does not depend on the amount of investigation made and is thus a constant. Note that any such remediation is likely to be targeted, rather than blanket remediation of a whole site.

Since remediation is much more expensive than a ‘reasonable amount’ of investigation, but a very large amount of investigation is also very expensive, it seems likely that there is some balance possible, a proportionate investigation that minimizes the expectation of financial loss.

The test is H_{0}:*µ* ≥ *AL* against H_{A}:*µ* < *AL*, where *AL* is the assessment level.

There are four possible outcomes:

- H_{0} is accepted correctly; conclude that the site is contaminated;
- H_{0} is rejected correctly; conclude that the site is not contaminated;
- H_{0} is rejected incorrectly; conclude that the site is not contaminated when it is. The probability of this (Type 1 error) is *α*;
- H_{0} is accepted incorrectly; conclude that the site is contaminated when it is not. The probability of this (Type 2 error) is *β*.

In this case, Type 1 error results in risk to the public and should clearly be controlled to a low value; Type 2 error results in an unnecessary cost to the developer, who may wish to control this. But how?

We define an interval Δ below the *AL* in which we are prepared to accept Type 2 error. In the US literature this interval is referred to as the ‘gray region’, shown in Figure 2. With *AL* = 32 and Δ = 10, we are prepared to accept a 10% chance of deciding to undertake remediation even if the true (unknowable) concentration is as low as 32 − 10 = 22 mg kg^{-1}, which is the background level of arsenic in the area investigated. In the UK ‘grey’ has the connotation of ‘unsatisfactory’, so this name may not be helpful: we are prepared to accept results in the ‘gray region’, even though there is a risk of undertaking remediation that is later found to have been unnecessary and a waste of effort.

If the standard deviation of the population estimated from the sample is *s*, then it can be shown that the required number of samples *n* is

*n* = *s*^{2}(*t*_{1−α, *n*−1} + *t*_{1−β, *n*−1})^{2}/Δ^{2}

where *t*_{1−α, *n*−1} is the one-tailed quantile of Student's *t*-distribution for *n*−1 degrees of freedom (US EPA 2006, equation A7). This is the *t*-distribution that also appears in CIEH (2008), and the same applicability condition applies: the underlying population must have a distribution that is reasonably close to normal.

This equation must be solved iteratively, since *n* appears on both sides. An approximation has been derived which can be solved directly:

*n* = *s*^{2}(*z*_{1−α} + *z*_{1−β})^{2}/Δ^{2} + *z*_{1−α}^{2}/2

where *z* is the corresponding quantile of the standard normal distribution (US EPA 2006, equation A8).
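The approximation is straightforward to script. The following is a minimal Python sketch using only the standard library; the parameter values are illustrative assumptions, not the site-specific figures discussed later:

```python
import math
from statistics import NormalDist

def required_n(s: float, delta: float, alpha: float, beta: float) -> int:
    """Approximate required data-sample size (US EPA 2006, equation A8):
    n = s^2 (z_{1-alpha} + z_{1-beta})^2 / delta^2 + z_{1-alpha}^2 / 2."""
    z = NormalDist().inv_cdf              # standard normal quantile function
    z_a, z_b = z(1 - alpha), z(1 - beta)
    n = s ** 2 * (z_a + z_b) ** 2 / delta ** 2 + z_a ** 2 / 2
    return math.ceil(n)                   # round up to whole samples

# Illustrative only: s = 11 mg/kg, gray-region width delta = 4,
# alpha = 0.05, beta = 0.1
print(required_n(11, 4, 0.05, 0.1))       # -> 67
```

Note how strongly the result depends on the ratio *s*/Δ: the dominant term is quadratic, so halving Δ roughly quadruples the required number of samples.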

It can be seen that the required number of samples is defined by four variables: *α*, *β*, *s* and Δ. The risk *α* of incorrectly deciding that the site is clean may be specified by, or agreed with, the regulator; a value of *α* = 0.05 is often used, i.e. a requirement for 95% confidence that the site is clean (CIEH 2008). The population standard deviation *s* can be estimated from site data, assuming that a suitably designed preliminary investigation has been conducted which captures the range of contaminant concentrations across the areas of interest. However, no obvious rationale has been provided for choosing *β* and Δ. Assignment may be arbitrary (see examples in US EPA 2006), or there is often experimentation with the two variables until the estimated number of samples seems (on the basis of various criteria) to be reasonable and affordable (see, for example, Ofungwu 2014). This is not a robust approach to designing an expensive piece of work with implications for public safety, and a more rational approach is presented below.

### Optimized approach

The optimum data-sample size is considered here to be the one that results in the lowest expectation of financial loss in making a satisfactory case to the regulator that the site is clean. To determine it, it is necessary to relate the expectation of financial loss to the data-sample size, and then to seek the data-sample size that minimizes it. The expectation of financial loss *E* is the sum of the cost of investigation *C* and the expectation cost of wasted remediation:

*E* = *C* + *βR*

where *R* is the estimated cost of remediation.

The cost of investigation *C* is expected to be a simple function of the data-sample size *n* (the number of measurement results), perhaps a fixed sum *a* for mobilization and reporting, and a cost *b* per measurement for soil aliquot collection and analysis, giving a total cost of investigation of

*C* = *a* + *bn*

The expectation cost of wasted remediation is the estimated cost of remediation, multiplied by the probability that it was unnecessary, which is *β*. It could be argued that this is unrealistic, since in any one case the remediation will either occur in full or not at all. However, it is the expectation of loss, not the actual loss, that is being considered here as an input to making a decision. House insurance premiums are calculated in a similar way: the premium is based on the insurer's expectation of having to pay for the house to be re-built, which is much less than the cost of re-building it.

This approach resembles that advocated by Ramsey *et al.* (2002) to determine the optimum level of uncertainty in single measurements used to identify the presence of contamination. This considered the expectation costs of incorrect decisions, compared with the increased cost of more accurate measurement, and showed that there is an optimum uncertainty. They concluded that it is generally not worth paying a high price for very accurate measurements.

The interval Δ must still be considered. It defines the lower boundary of the region in which there is acceptance that unnecessary remediation may occur. Given that the preliminary investigation provides a first-pass estimate of the mean concentration, this boundary should be set so that it excludes that estimate; otherwise there is a high risk of remediation being required, when we are seeking to confirm our suspicion that remediation is not required. Thus Δ = *AL* − *x̄*, where *AL* is the assessment level and *x̄* is the best estimate of the mean. It would be possible to experiment with other calculations for Δ; another possibility is to set it to exclude the background value of the contaminant, since remediating background is usually undesirable. The values of *s* and *α* are known, as discussed above. With *α*, Δ and *s* specified, it is possible to solve the approximate sample-size formula for *β* as a function of the data-sample size *n*. Given costs of investigation and remediation, it is then possible to calculate the expectation of financial loss as a function of data-sample size (see Fig. 3), and to find the data-sample size that minimizes the total of investigation cost plus expectation of loss. This is the financially optimum number of measurements, given the available information and the desire to avoid unnecessary remedial work.

The shape of the blue curve in Figure 3 and, therefore, the position of the minimum, depend on: the remediation cost, the fixed cost for mobilization and reporting, the per-sample cost of investigation, and the offset Δ. The risk of Type 1 error is specified, and an estimate of *s* is known from site data. Figure 3 is straightforward to program in Excel and therefore provides a simple tool to explore the sensitivity of the optimum data-sample size to the less well-known remediation cost. The values of Δ and *s* have been set on the basis of the best available information, which is the data-sample from the preliminary investigation. It is possible to explore the consequences of choosing a smaller value of Δ, a larger value of *s* or an increased remediation cost. The result will be an increased optimum data-sample size and increased investigation cost. The risk of unnecessary remediation will reduce, but by an amount that cannot be quantified. There comes a point where the additional investigation cost outweighs any plausible reduction in the risk of unnecessary remediation.

Figure 3 has been calculated using the simple linear function relating the cost of investigation to data-sample size given above, and using the approximate formula for data-sample size, for which explicit functions are available in Excel. This makes for an easy graphical display. The method is extendible in principle to more complex cost functions and to more complex data-sample size expressions, although these may then need to be solved iteratively.
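The same curve is equally easy to generate outside Excel. The sketch below (Python, standard library only) solves the approximate sample-size formula for *β* as a function of *n*, adds the linear investigation cost, and searches for the minimum; the cost figures are the notional ones used in the worked example below and are assumptions for demonstration, not recommendations:

```python
import math
from statistics import NormalDist

ND = NormalDist()

def beta_for_n(n: int, s: float, delta: float, alpha: float) -> float:
    """Invert the approximate sample-size formula to give the Type 2
    error probability beta achievable with n measurements."""
    z_a = ND.inv_cdf(1 - alpha)
    z_b = math.sqrt(max(n - z_a ** 2 / 2, 0.0)) * delta / s - z_a
    return 1 - ND.cdf(z_b)

def expected_loss(n: int, s: float, delta: float, alpha: float,
                  fixed: float, per_sample: float, remediation: float) -> float:
    """Investigation cost plus the expectation cost of wasted remediation."""
    return fixed + per_sample * n + remediation * beta_for_n(n, s, delta, alpha)

# Notional figures: 6000 fixed (mobilization + reporting), 100 per sample,
# 100 000 remediation; s = 11 mg/kg, delta = 4, alpha = 0.05.
losses = {n: expected_loss(n, 11, 4, 0.05, 6000, 100, 100_000)
          for n in range(10, 300)}
n_opt = min(losses, key=losses.get)
print(n_opt, round(losses[n_opt]))
```

With these figures the minimum falls at a data-sample size of about 100, with an expectation of loss of roughly £18 500, matching the worked example to within the accuracy of the approximation; raising the per-sample cost to £500 moves the optimum down to about 60, as described in the text.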

The data-sample size actually chosen is ultimately a management decision, which should include other considerations such as the nature of the source area and the contaminant linkages in the conceptual model. Using this approach, however, the decision is made in the informed context of the number of soil samples likely to demonstrate clearance given the available information and a desire to minimize the expectation of financial loss, and with an understanding of how this relates to what is already known. It is easier to appreciate abstract risks when they are presented in terms of money. It is notable that the curve in Figure 3 is asymmetrical: away from the optimum, the total cost climbs more steeply for fewer than the optimum number of samples than for more.

If more than one contaminant is suspected, the optimum number of results for each can be calculated, and a decision made to proceed with the largest number.

### Verification of approach

These ideas were tested with the arsenic dataset described above. The arithmetic mean concentration was 28 mg kg^{-1} and the estimated standard deviation of the population was 11 mg kg^{-1}. The distribution of values was close to symmetrical about the mean, but statistical testing showed that the data were not normally distributed (Fig. 4). No spatial trends were apparent in the dataset; it appeared to be a data-sample from a single population. There had been no particular reason to believe that anthropogenic arsenic contamination was present, but arsenic is analysed as part of a standard ‘toxic metals’ suite. A soil guideline value (SGV) for residential use of 32 mg kg^{-1} has been published for arsenic (Environment Agency 2009) and was used here as the lowest published number, although residential use of the area is unlikely given its remote location. Other data collected by Dounreay Site Restoration Ltd suggested that the wider regional soil arsenic concentration was 22 mg kg^{-1}, with variability associated with fault-zone mineralization containing arsenical pyrite (Milodowski *et al.* 2013). The background value is less than one standard deviation from the SGV. More widely, the British Geological Survey (Johnson *et al.* 2012) gives a limiting background arsenic concentration for England, excluding mineralized areas, of 32 mg kg^{-1}, so the results for the Dounreay area do not seem exceptional.

For demonstration purposes, we set *α* = 0.05 in line with common practice and, arbitrarily, *β* = 0.1. We set Δ = 10, being the difference between the assessment level (32) and the local background (22). In doing this, we are saying that we accept no more than a 10% chance that we will be required to remediate, even if the true concentration is as low as the regional background. Application of the sample-size formula determines that a data-sample size of 22 is necessary. A data-sample size of 22 was simulated by taking 18 pseudo-random data-samples of size 22 from the entire data-sample of 138 results (see the Supplementary material for the method and the complete dataset). For seven of them, the 95% UCL (upper confidence limit; see Heathcote 2018) calculated using the *t*-test method (CIEH 2008) exceeded the SGV and remediation would have been suggested (18 sub-samples is too few for a robust assessment of the error-rate, but suffices to show that it is high). This is not unexpected: in setting *β* = 0.1 we indicated acceptance of a probability of unnecessary remediation of 0.1 if the true concentration is 22 mg kg^{-1}, and we know from the large data-sample of 138 measurements that the likely true value is rather larger than this. We know from the 138 results that the population mean is estimated to be below 32, and it can be shown (*t*-test method) that this is with >99% certainty; therefore, on this basis, remediation is not justified. A data-sample size of 22 gives a relatively high probability of unnecessary remediation and is clearly too small, given what we actually know. However, since the 138 available data indicate that the mean is less than the assessment level with greater than 99% (*α* < 0.01) certainty, we could have met the requirement for 95% certainty with fewer data, avoiding cost.
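The sub-sampling exercise is easy to reproduce for any dataset. The sketch below uses synthetic normally distributed values as a stand-in for the 138 site measurements (the real data are in the Supplementary material), so the exceedance count is illustrative only and will not match the seven reported above:

```python
import math
import random
import statistics

T95_DF21 = 1.7207   # one-tailed Student's t quantile, 95%, 21 degrees of freedom

def ucl95(values: list, t_crit: float) -> float:
    """One-sided 95% upper confidence limit on the mean (t-method)."""
    n = len(values)
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return mean + t_crit * se

SGV = 32.0          # assessment level, mg/kg
random.seed(1)      # reproducible illustration
# Synthetic stand-in: mean c. 28 mg/kg, sd c. 11 mg/kg, as for the site dataset.
population = [random.gauss(28, 11) for _ in range(138)]

exceedances = sum(
    ucl95(random.sample(population, 22), T95_DF21) > SGV
    for _ in range(18)  # 18 pseudo-random sub-samples of size 22
)
print(f"{exceedances}/18 sub-samples would have failed clearance")
```

For a real application, `population` would be replaced with the measured dataset and the *t* quantile adjusted if a different sub-sample size is used.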

For illustrative purposes, the calculation was run assuming a mobilization cost of £1000, a soil sample + analysis cost of £100 (sample collection from trial-pits and analysis for ‘metals’) and reporting at £5000, with Δ set to 4 (the assessment level of 32, less the best estimate from available data of 28). For the purposes of this example, the remediation cost was set at a notional £100 000. The optimum number of samples is now 102, with an expectation of loss of *c.* £18 500 (Fig. 3). The optimum value of *β* is 0.02. Note that if *β* = 0.2 is used, as is often recommended, the required number of samples without changing other parameters is 48, but the expectation of loss increases to *c.* £30 800 because of the increased risk of unnecessary remediation. The saving in sampling cost is not worth the increased risk of having to remediate clean land. The high probability of deciding that remediation was necessary, had only 22 measurements been made, has been demonstrated above. If the analysis cost is higher, say £500 per sample (not unrealistic if radiochemical analysis is needed), the optimum sample size is smaller (60) and the expectation of loss is higher (*c.* £48 700). The optimum value of *β* is 0.13: the balance has shifted in favour of a slightly higher risk of unnecessary remediation because of the greater investigation cost. These results are shown in Figure 5.

The larger number of samples required by this approach, compared with the number calculated from the sample-size formula with arbitrary input data, is a result of taking note of the information already available from a preliminary investigation. This showed that the best-estimate arsenic concentration was close to the assessment level, and therefore that the greater certainty of avoiding unnecessary remediation resulting from a larger data-sample size was justified. Rather than choosing an arbitrary value for *β*, a value resulting in minimum expectation of loss was chosen, taking into account the competition between the cost of acquiring data and the desirability of avoiding unnecessary remediation. If there is no preliminary investigation before the clearance investigation, arbitrary decisions will be necessary and the probability of unnecessary remediation will be uncertain. In making a decision to proceed to a clearance investigation without a preliminary investigation, it may be useful to compare the cost saved by not undertaking the preliminary investigation with the remediation cost that might be incurred.

Once the clearance data-sample has been collected, there will be a revised estimate of *s*, and whether or not H_{0} can be rejected with the required degree of certainty can be reassessed. If H_{0} is rejected, then all is well. If H_{0} cannot be rejected, it may then be appropriate to consider how strong the support for remediation is. Using the new value for *s*, at what value of Δ do *β* and the associated risk of paying for unnecessary remediation become small enough to seem reasonable? Is the required data-sample size now unfeasibly large, in which case a different strategy will be needed? The decision to remediate or not will always be a management decision, possibly discussed with the regulator, informed by the statistical assessment but not dictated by it.

## Concluding comments

If a preliminary investigation has concluded that the mean concentration of a contaminant is probably below the assessment level, but this cannot be demonstrated with the required degree of certainty, then the data from the preliminary investigation should be used to calculate the size of data-sample needed for the clearance investigation. This can be optimized, taking into account the probability of incurring the cost of unnecessary remediation and the cost of acquiring data, to minimize the expectation of financial loss, given the mean and standard deviation from the preliminary investigation. Considering the abstract probabilities in terms of their financial consequences may assist in making management decisions. It is worth noting that the cost curves (Figs 3 and 5) are asymmetrical: the expectation of loss increases more rapidly for fewer than the optimum number of results than it does for more. It is important to remember that we are discussing probabilities; it is not possible to eliminate completely the possibility of making an error based on a statistical assessment, only to control the probability within known bounds.

Other approaches to making this decision are possible, including Bayesian approaches, but the basics remain the same: using the knowledge from the preliminary investigation and the limit of tolerable risk to the public, and taking into account both the cost of investigation and the expectation of loss from nugatory remediation.

The final decision on the investigation protocol is a management decision, perhaps discussed with the regulator, informed by the statistics but not dictated by them.

This investigation concluded that there was no significant arsenic contamination on the scale of the whole area considered and with no specific planned use. This does not preclude the possible need for further investigation if a specific sensitive use is proposed in the future for a small part of the area where higher than average arsenic concentrations were found during this investigation.

## Acknowledgements

The thinking presented in this paper was developed whilst the author was contracted to Dounreay Site Restoration Limited (DSRL) and helpful comments have been received from Richard Short of DSRL. This paper is published with the permission of DSRL and they are thanked for their assistance in preparing it. The author accepts full responsibility for the ideas presented.

## Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

*Scientific editing by Jonathan Smith; Nicola Harries*

- © 2019 The Author(s). Published by The Geological Society of London. All rights reserved