## Abstract

In the UK, the Environmental Protection Act requires land to be determined as contaminated if it contains concentrations of substances such that there is significant risk of significant harm to certain defined receptors. A key step in assessing the possibility of significant harm is the comparison of measured soil concentrations of potentially harmful substances with appropriate assessment levels considered to represent tolerable levels of risk. Guidance on making the comparison using statistical techniques was issued in 2008 by the Chartered Institute of Environmental Health (CIEH) and Contaminated Land: Applications in Real Environments (CL:AIRE) (the CL:AIRE–CIEH method). When making decisions on the basis of a finite number of measurements from a very large amount of soil, there is always the possibility of error. It has become apparent that the CL:AIRE–CIEH approach does not necessarily control the likelihood of determining that land is not contaminated when in fact it is contaminated, and this is more likely when using the Chebyshev test. Although there are methods to control such likelihood, there is no clear basis for the necessary specifications. The difficulties can be obviated by making a determination of contamination at 50% probability, which is consistent with the statutory guidance. Alternatively, a different approach may be taken to decision-making.

United Kingdom law (Environmental Protection Act 1990 as amended by the Environment Act 1995 and subsequent regulations) defines ‘contaminated land’ as ‘any land which appears to the local authority in whose area it is situated to be in such a condition, by reason of substances in, on or under the land, that significant harm is being caused or there is a significant possibility of such harm being caused’ (Section 78A (2) (a)). Such land is to be determined by the local authority as ‘contaminated land’ and the Act provides mechanisms for procuring its remediation. The process of identifying contaminated land is explained in more detail in statutory guidance documents issued by the various countries of the UK, which make it clear that contaminated land is defined in the context of the pathways and receptors resulting from the current use of the land. There is no specific offence either of owning contaminated land or of having caused land to become contaminated land. There are offences of causing pollution, but these are outside the scope of this paper.

If the use of land is proposed to be changed, it is possible that the new use will introduce new pathways and receptors, such that the land would be determined as contaminated in the context of the new use. The possibility of this undesirable outcome is considered in planning guidance notes issued under national land use planning legislation, and may result in a requirement for remediation before consent is given for the proposed new use under planning legislation, so that land capable of determination as contaminated is not created.

The assessment of the possibility of contamination is almost always made by comparing the concentration of a contaminant measured in a number of soil samples with an assessment level considered to represent the limit of tolerable risk. Strictly, this is an assessment of the possibility of harm. However, without at least the possibility of harm, land cannot be considered to be contaminated. The determination of such assessment levels is a matter of active debate (CIEH 2014; LQM–CIEH 2015) and is not discussed here. If the soil concentration exceeds the assessment level, harm may be significant; if below the assessment level, harm is not significant. However, it is noted that the assessment levels presented in the references were calculated for a low probability of harm; it is not clear that this is significant harm as envisaged by the Environmental Protection Act.

The various guidance documents issued by the authorities provide no guidance on comparing numerical measurements with an assessment level. However, a non-statutory guidance document has been issued by professional and industry bodies and this has wide acceptance (CIEH 2008). This will be referred to in this paper as the CL:AIRE–CIEH (Contaminated Land: Applications in Real Environments (CL:AIRE)–Chartered Institute of Environmental Health (CIEH)) guidance. It considers two decision scenarios: the determination of contaminated land under the provisions of the Environmental Protection Act, and the demonstration that a changed use of land will not result in a determination of contaminated land, under national planning legislation.

Given that a decision is being made about a parcel of land on the basis of a finite number of measurements, there is the possibility of making an incorrect decision in any particular case. The statistical techniques described in the CL:AIRE–CIEH guidance aim to control the likelihood of making an incorrect decision to an acceptably small value.

It is argued here that this aim is not clearly achieved in all circumstances, and a way forward is suggested.

## Making decisions

The legal test is ‘significant possibility of significant harm’. This is specified verbally rather than numerically and will in the end be a decision by the courts. It is not amenable to quantification. However, a significant possibility of significant harm implies that there must be at least a possibility of significant harm (DEFRA 2012, paragraph 4.9), and possibility of significant harm is more amenable to a quantitative approach. Several sets of assessment levels have been published for this purpose relevant to different land uses (such as residential, allotments or industrial estates; e.g. Environment Agency 2009). The measured concentration of a contaminant within the site of interest is compared with an assessment level appropriate to the actual or proposed use; if the measured value exceeds the assessment level then possibility of significant harm exists. Whether or not this possibility is such a significant possibility that action must be taken requires further assessment using additional non-numerical criteria including socio-economic factors, but these are not the subject of the present discussion.

Determining the measured concentration for a whole site exactly is typically impossible. Normal practice is to collect a number of measurements from aliquots of soil (typically in the range 10–100) that are considered to be ‘representative’ and to calculate the arithmetic mean of the results. (The word ‘aliquot’ is used here rather than the more usual ‘sample’, to avoid confusion with the statistical use of the word ‘sample’ used in this paper.) If the soil aliquots are appropriately chosen to be representative, the arithmetic mean concentration is an unbiased estimate of the true average soil concentration. It is not necessarily straightforward to choose an appropriate set of soil aliquots nor is this simple approach necessarily the appropriate way to determine the relevant concentration, as discussed by Nathanail (2004) and Ramsey (2004). A necessary requirement is that the soil aliquots are an unbiased sample from the volume of soil of interest; this can be achieved by random sampling, or more conveniently, approximated by gridded sampling from a random start. For the purpose of this discussion, it will be assumed that an appropriate set of measurements has been achieved that is representative of the volume of interest, the source in the hypothesized source–pathway–receptor contaminant linkage. The arithmetic mean of a finite number of measurements (a sample in the statistical sense) is merely an estimate of the true arithmetic mean of all possible measurements (the population); the latter cannot be known except by analysing the entire mass of soil in the area of interest, which is almost never practicable. However, it is known that the arithmetic mean of an unbiased sample of measurements is an unbiased estimate of the population mean, for any underlying probability distribution (Gilbert 1987). 
(It is noted here that the practice of calculating 10 to the power of the mean of the logarithms to base 10 of the measured concentrations is mathematically equivalent to calculating the geometric mean of the concentrations, and it is a theorem that the geometric mean is less than the arithmetic mean except in the trivial case where all the numbers are identical. The difference may be substantial for skewed distributions.)
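
The relationship between the two means can be checked with a short Python sketch. The concentrations below are hypothetical values chosen only to give a right-skewed dataset; they are not taken from any site investigation.

```python
import math
import statistics

# Hypothetical, right-skewed soil concentrations (arbitrary units); illustrative only.
concentrations = [1.0, 2.0, 4.0, 8.0, 100.0]

arithmetic_mean = statistics.fmean(concentrations)

# 10 to the power of the mean of the base-10 logarithms ...
mean_log10 = statistics.fmean([math.log10(c) for c in concentrations])
back_transformed = 10 ** mean_log10

# ... is the geometric mean under another name.
geometric_mean = statistics.geometric_mean(concentrations)

print(f"arithmetic mean:       {arithmetic_mean:.2f}")
print(f"back-transformed mean: {back_transformed:.2f}")
print(f"geometric mean:        {geometric_mean:.2f}")
```

For this dataset the back-transformed value agrees with the geometric mean and is well below the arithmetic mean, so comparing it with an assessment level would understate the average concentration.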

Given that only an estimate of the true concentration is being compared with the assessment level, there is the possibility of error, depending on whether the estimate is above or below the unknowable true value, and on the decision being made. Statistical techniques provide a means of controlling the possibility of making one kind of error and thus help inform the decision. Statistical techniques do not, however, make a decision; this is the preserve of the informed practitioner.

There are four possible outcomes, two correct and two incorrect:

correctly deciding (on the basis of an estimate) that something is true when in truth (unknowable) it is true;

correctly deciding (on the basis of an estimate) that something is false when in truth (unknowable) it is false;

incorrectly deciding (on the basis of an estimate) that something is false when in truth (unknowable) it is true (Type 1 error);

incorrectly deciding (on the basis of an estimate) that something is true when in truth (unknowable) it is false (Type 2 error).

A way of appreciating this might be in the context of acceptance testing. The customer is concerned to receive only acceptable product and therefore rejects everything unless it can be shown to be satisfactory, which guards against wrongly accepting unsatisfactory product (Type 1 error). The supplier is concerned that the customer should not mistakenly return product that is actually acceptable (Type 2 error), and assumes that all product is acceptable until the customer determines otherwise. In this case, the ‘customer’ is the regulator or general public, and the ‘supplier’ is the owner of the possibly contaminated land.

The probability of Type 1 error is conventionally given the symbol *α* and the probability of Type 2 error is *β*. The quantity 1 − *β* is called the power of a statistical test. A powerful statistical test has a low probability of Type 2 error.

The statistical techniques discussed in the CL:AIRE–CIEH guidance provide a means to control Type 1 error; Type 2 error is much less straightforward to control and is not considered in any detail in the CL:AIRE–CIEH guidance. However, incorrectly deciding that land is not contaminated, when in fact it is contaminated (a Type 2 error, depending on how the question is asked), might be considered to be a serious practical error.

The theory and estimation of Type 1 and Type 2 errors are discussed in many statistics texts; for example Upton & Cook (1996, chapter 16).

## Determinations within the planning framework

The planning framework is discussed first, as this turns out to be consistent with intuition and thus easier to explain.

Within the planning framework, the relevant test is that the land is suitable for the proposed use, whether or not the land has previously been determined as contaminated land. This requirement is contained in the following:

for Scotland: http://www.gov.scot/Publications/2000/10/pan33

for Wales: http://gov.wales/docs/desh/publications/160104ppw-chapter-13-en.pdf

Contaminated land legislation in Northern Ireland is still developing.

The various guidance documents can be summarized as requiring a positive demonstration that the land cannot be considered as contaminated land under the proposed use. We need to demonstrate the absence of significant harm by showing that the mean concentration of contaminant is below the assessment level with an adequate degree of certainty. Custom has been to accept 95% certainty (only 5% chance of error, *α* = 0.05) and this is suggested in the CL:AIRE–CIEH guidance. However, this degree of certainty has no formal basis in regulatory guidance.

In a formal decision-making framework, this can be achieved by proposing a null (baseline) hypothesis that significant harm may be caused (mean is equal to or exceeds assessment level), and then showing that the data do not support this hypothesis. The null hypothesis is then rejected in favour of the alternative, that significant harm will not be caused (mean is less than the assessment level). Statistical tests for this purpose are presented in the CL:AIRE–CIEH guidance.

Using this approach, the Type 1 error is concluding that significant harm could not be caused when in fact it could be. Type 1 error is controlled by the statistical testing process and its probability is no more than 5%. Type 2 error in this case is concluding that significant harm could be caused when in fact it could not. This is not controlled by the testing process in the CL:AIRE–CIEH guidance. It might be considered that this possible error is less important, as the result remains protective of human health. However, mis-determination of land as contaminated will either result in unnecessary remediation, which has a financial and possible environmental cost (substantial engineering works and waste disposal), or it might result in a decision not to proceed with the proposed development, which has its own adverse socio-economic consequences.

Calculation of the probability of Type 2 error as the result of a sampling exercise for a particular value of the actual (unknowable) sampling target mean is in general not straightforward, although there is an expression for the case where the results are normally distributed. To have a basis for controlling Type 2 error, it is necessary to identify a contaminant concentration below which it would be unreasonable to require remediation. This could lead to a specification of the form that there should be no more than a 5% chance (*α* = 0.05) of identifying the land not to be contaminated if the true average concentration exceeds 10 units (the assessment level) (control of Type 1 error), and no more than a 10% chance (*β* = 0.1) of identifying the land to be contaminated if the true average concentration is less than, say, five units (control of Type 2 error). The size of sample (i.e. number of results) necessary to achieve this can be calculated (US EPA 2006, appendix). There is no agreed protocol for choosing the number five that controls Type 2 error. A back-stop for the lower concentration value might be the natural background concentration of the contaminant in the area of the proposed development, if this can be determined. Clearly, the developer would wish to control the risk of remediating background. In this scenario, to decide that land is not contaminated for planning purposes, the mean of the measurements made will need to be less than the assessment level by a suitable margin identified by an upper confidence limit (see fig. X in the CL:AIRE–CIEH guidance). Determining the confidence limit is discussed below.
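
As a rough illustration of how such a specification translates into a number of results, the sketch below uses the textbook normal approximation for a one-sided test of a mean with known standard deviation. This is an assumption made here for illustration and is not necessarily the expression given in US EPA (2006); the function name `sample_size` and the standard deviation of 6 units are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size(sigma, assessment_level, lower_level, alpha=0.05, beta=0.10):
    """Approximate number of results for a one-sided test of the mean.

    Normal approximation with a known standard deviation sigma:
        n = ((z_{1-alpha} + z_{1-beta}) * sigma / delta) ** 2
    where delta is the gap between the assessment level and the lower
    concentration at which Type 2 error is to be controlled.
    """
    delta = assessment_level - lower_level
    if delta <= 0:
        raise ValueError("lower_level must be below assessment_level")
    z = NormalDist().inv_cdf
    n = ((z(1 - alpha) + z(1 - beta)) * sigma / delta) ** 2
    return math.ceil(n)


# Assessment level 10 and lower level 5 as in the text; sigma = 6 is hypothetical.
print(sample_size(sigma=6.0, assessment_level=10.0, lower_level=5.0))
```

Moving the lower level closer to the assessment level shrinks the gap and increases the number of results required, which is why the choice of that lower level matters so much.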

## Determinations under the Environmental Protection Act

The Environmental Protection Act provides the possibility of determining land to be contaminated. It follows from this that determination of contaminated land is a positive act; there is a presumption in law that land is not contaminated until proved otherwise (DEFRA 2012, paragraph 4.25). This is a necessary presumption, otherwise there would be a requirement to assess the status of all land whatever its history.

The process of determining contaminated land is prescribed in statutory guidance that differs slightly between the countries of the UK, but all rely on the same basic process, as follows.

(1) For England and Wales (DEFRA 2012; Welsh Government 2012), paragraph 4.4 requires the authority to be satisfied that it is more likely than not that harm to human health will be caused.

(2) For Scotland (Scottish Executive 2006), paragraph A.31 considers determination on grounds of human health in terms of ‘unacceptable intake or exposure’ to harmful substances. The same paragraph considers ‘more likely than not’ to be grounds for determination in respect of non-human receptors. Notably there is no consideration of probability in human health.

(3) For Northern Ireland, work is in progress but the process is expected to be similar to that for England and Wales.

The legal presumption that land is not contaminated leads naturally to a null (baseline) hypothesis for statistical testing that the concentration is below the assessment level (mean is less than or equal to assessment level). If the evidence does not support this hypothesis, we reject it in favour of the concentration being above the assessment level. Using this approach, the Type 1 error is that the land is determined to be contaminated when in fact it is not. Type 1 error is controlled by the statistical testing process. Type 2 error is that the land is determined not to be contaminated when in fact it is. This is not controlled by the testing process in the CL:AIRE–CIEH guidance. In this scenario, to determine that land is contaminated land under the Environmental Protection Act, it is necessary for the mean of the measurements to exceed the assessment level by a suitable margin identified by a lower confidence limit (see fig. X in the CL:AIRE–CIEH guidance). Determining the confidence limit is discussed below.

Returning to the supplier–customer analogy, this is analogous to the customer (regulator) assuming that all product is satisfactory until clearly proved otherwise. There is clearly a possibility using this approach that some unsatisfactory product (contaminated land) will be accepted as satisfactory (Type 2 error), and that this has not been controlled by the decision to reject only clearly defective product.

As the confidence with which a determination of possible harm is made is increased, the margin by which the average measured soil concentration exceeds the assessment level must increase. Thus it becomes more likely that a Type 2 error will be made: a decision that there is no possibility of significant harm when in fact there is. In this case the more certain decision is not conservative with respect to human health. In being, say, 95% certain rather than 90% certain that land truly is contaminated before determining it as such, the probability has increased that land that is contaminated will not be determined as such.
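
The effect can be sketched numerically. The Python fragment below assumes, for illustration only, that sample means are normally distributed with a known standard deviation; the function name `type2_probability` and the inputs (assessment level 10, standard deviation 2, 20 results, true mean 11) are illustrative choices rather than values from the guidance.

```python
from math import sqrt
from statistics import NormalDist


def type2_probability(true_mean, assessment_level, sigma, n, confidence):
    """P(land is not determined as contaminated | true_mean), normal approximation.

    Land is determined as contaminated only if the sample mean exceeds
    assessment_level + z_confidence * sigma / sqrt(n).
    """
    std_error = sigma / sqrt(n)
    threshold = assessment_level + NormalDist().inv_cdf(confidence) * std_error
    return NormalDist(mu=true_mean, sigma=std_error).cdf(threshold)


for confidence in (0.50, 0.90, 0.95):
    beta = type2_probability(11.0, 10.0, 2.0, 20, confidence)
    print(f"confidence {confidence:.2f}: Type 2 error probability {beta:.3f}")
```

Under these assumptions the probability of failing to determine land whose true mean is 11 units rises steadily as the required confidence moves from 50% to 95%.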

The statutory guidance for England and Wales explicitly requires land to be determined as contaminated if significant harm is more likely than not. Although the Scottish guidance is silent on the likelihood for human receptors, extension of the requirement for non-human receptors might lead to a conclusion that the test is ‘more likely than not’ for human receptors also. There is no requirement to prove the likelihood of significant harm ‘beyond reasonable doubt’.

If it has been shown with 95% certainty that the mean exceeds the assessment level, it is clearly likely that the possibility of harm exists. However, this is a more stringent test than that given in the English statutory guidance, that it is more likely than not, which might be considered equivalent to a 50% certainty. Thus there is the possibility that land would not be determined as contaminated although the mean of the sample exceeds the assessment level, because it does not exceed the assessment level by a sufficient margin to be 95% confident that the unknowable mean of the entire area of interest exceeds the assessment level. The possibility of Type 2 error, failing to determine land as contaminated when it is, is apparent and is not controlled by the statistical test.

To control the Type 2 error, it is necessary to provide a specification of the form that there should be no more than a 5% chance (*α* = 0.05) of identifying the land as contaminated if the true average concentration is less than 10 units (the assessment level) (control of Type 1 error), and no more than a 10% chance (*β* = 0.1) of identifying the land not to be contaminated if the true average concentration is more than, say, 15 units (control of Type 2 error). Again, there is no agreed protocol for choosing the number 15 that controls Type 2 error, and, as this exceeds the assessment level, which is the boundary of significant harm, it is not clear that such a higher level is consistent with the spirit of the legislative regime.

## Determining the confidence limit

The arithmetic mean of (suitable) data is an unbiased estimate of the population mean but has associated uncertainty, which can be expressed as confidence limits. The upper confidence limit of the mean (UCL) is a higher value, which we are reasonably certain exceeds the true value of the mean; for example, the 95% UCL estimated from a dataset is a value that will equal or exceed the true mean of the population of all possible measurements for 95% of possible datasets drawn from the population of all possible measurements. Only 5% of datasets taken from a given population will lead to an estimated 95% UCL that is less than the true mean. A lower confidence limit (LCL) can be expressed in a similar way; it is a value that we believe will be equal to or less than the true mean for a specified percentage of possible datasets drawn from the population. UCL and LCL can be expressed in terms of the arithmetic mean of the measurements $\bar{x}$ and the standard deviation of the population estimated from the measurements *s*:

$$\mathrm{UCL}_p = \bar{x} + u_p\,s \tag{1}$$

$$\mathrm{LCL}_p = \bar{x} - l_p\,s \tag{2}$$

The factors $u_p$ and $l_p$ are intended to achieve the desired level of confidence *p*.

The *t*-test and Chebyshev tests in the CL:AIRE–CIEH guidance are mathematically equivalent to comparing a suitably calculated UCL or LCL with the assessment level, using different methods to calculate *u* and *l*. The *t*-test is based on the Central Limit Theorem and makes the assumption that the distribution of sample means is normal. This is true for any distribution if the sample of results is sufficiently large. If the sample is insufficiently large, which is likely to be true if the results deviate substantially from a normal distribution, Chebyshev's Inequality (Tchébychev 1867) avoids the requirement that the sample means are normally distributed and is therefore valid whatever the distribution of results. The use of the inequality relies on an assumption that may not be true for the current application, that the sample standard deviation *s* is a good estimate of the population standard deviation *σ* (CIEH 2008). There are possible advantages to the confidence limit approach to the tests: there is significant literature on different methods of estimating UCL including for datasets containing values below detection (e.g. EPA 2013), and the UCL is calculated in measurement units rather than dimensionless numbers, thus allowing a better feel for the computed result. It may not be clear to a practitioner what the significance is of a calculated *t* value of *x* compared with *y* in the tables, but a calculated UCL of 10.001 mg kg⁻¹ compared with an assessment level of 10 mg kg⁻¹ might lead to a pragmatic decision, supported by other information on the remediation undertaken, the accuracy of the measurements made and the nature of the proposed development.

Values of $u_p$ and $l_p$ are calculated as

$$u_p = l_p = \frac{t_{n-1,\,p}}{\sqrt{n}} \tag{3}$$

$$u_p = l_p = \frac{\sqrt{1/(1-p) - 1}}{\sqrt{n}} \tag{4}$$

where *n* is the number of measurements and $t_{x,y}$ is the value of one-tailed Student's *t* for *x* degrees of freedom and probability *y* (looked up in tables and available as a function in Excel). Equation (3) can be found in most statistics texts; equation (4) derives directly from Chebyshev's Inequality (see Singh & Singh 2013, section 2.4.7).
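
The two factors can be computed without statistical tables. The sketch below is one possible implementation using only the Python standard library: it takes the *t*-based factor as one-tailed Student's *t* for *n* − 1 degrees of freedom divided by √*n*, and the Chebyshev factor as √(1/(1 − *p*) − 1) divided by √*n*, which is the one-sided form assumed here. The function names are hypothetical, and the *t* quantile is found by numerical integration and bisection, so it is intended only for confidence levels between 0.5 and about 0.99.

```python
import math


def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df):
    """CDF by Simpson's rule on [0, |x|], using the symmetry of the density."""
    a = abs(x)
    if a == 0.0:
        return 0.5
    steps = 2 * max(100, math.ceil(a * 50))  # even, with step width <= 0.01
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def t_quantile(p, df):
    """One-tailed t value for confidence p (0.5 <= p < ~0.99), by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def factor_t(n, p):
    """Multiplier of s based on Student's t with n - 1 degrees of freedom."""
    return t_quantile(p, n - 1) / math.sqrt(n)


def factor_chebyshev(n, p):
    """Multiplier of s based on the one-sided Chebyshev inequality."""
    return math.sqrt(1 / (1 - p) - 1) / math.sqrt(n)


for n in (5, 10, 20, 30):
    print(n, round(factor_t(n, 0.95), 3), round(factor_chebyshev(n, 0.95), 3))
```

For *n* = 20 and *p* = 0.95 the two factors come out at about 0.39 (*t*) and 0.97 (Chebyshev), so the Chebyshev margin is roughly two and a half times wider for the same data.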

The values of the two expressions are compared in Figure 1, for different values of *n* and *p* (= 1 − *α*). It can be seen that the Chebyshev numerator is larger for most combinations of *n* and *p* with *n* > 2; that is, with more than two measurements. The denominator is the same for both equations.

By setting UCL or LCL in equations (1) and (2) to the assessment level, with $u_p$ and $l_p$ taken from equations (3) and (4), we can calculate the value of the sample mean that is on the threshold of determination, shown in Figure 2.

Figure 2 shows that, for the planning scenario, the critical value of the sample mean using the Chebyshev test approach is smaller than the value calculated from the *t*-distribution when there are more than two measurements, which is generally the case. Thus, using the less powerful Chebyshev test, the site has to be ‘cleaner’ to pass the test. As expected, the site has to be ‘cleaner’ to pass the test with 95% certainty than with 90% certainty. The more results that are available (assuming that more data do not change the sample mean and standard deviation significantly), the higher the measured mean site concentration before the test is failed. This is entirely intuitive; the benefit of having more data is less conservatism.

Figure 2 shows that, for the Environmental Protection Act scenario, the critical value of the sample mean using the Chebyshev test approach is larger than the value calculated from the *t*-distribution when there are more than two measurements, which is generally the case. Thus, using the less powerful Chebyshev test, the site can be ‘dirtier’ and still pass the test; it is harder to ‘prove that it is contaminated’, as recognized by CIEH (2008). Using the less powerful test is not conservative. The more results that are available (assuming that more data do not change the sample mean and standard deviation significantly), the lower the measured mean site concentration that will pass the test. This is a perverse result; with just a small number of results a contaminated site is more likely to be determined as ‘clean’, so why collect more data to increase the likelihood of the site being determined as contaminated? It can be deduced from equation (2) that the calculated LCL can be negative if *s* is large (as it may be for log-normally distributed data). A negative LCL will always be less than a positive assessment level, so land will not be determined as contaminated, whatever the sample mean. This appears to be part of a wider difficulty of calculating LCL for skewed distributions and no solution has been noted in the literature. However, using this calculational approach at least reveals that there is a conceptual problem, which may not be apparent when making a simple test as described in the CL:AIRE–CIEH guidance.
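
The negative LCL is easy to reproduce. The following Python sketch uses a hypothetical right-skewed dataset (not from any real site) and computes the 95% LCL as the sample mean minus a factor times *s*/√*n*, using a tabulated *t* value and the one-sided Chebyshev factor √(1/(1 − *p*) − 1); both choices are assumptions made for illustration.

```python
import math
import statistics

# Hypothetical right-skewed dataset (arbitrary units); illustrative only.
data = [1, 1, 2, 2, 3, 3, 5, 8, 20, 120]
n = len(data)
mean = statistics.fmean(data)
s = statistics.stdev(data)  # sample standard deviation

T_9_95 = 1.833  # tabulated one-tailed Student's t, 9 degrees of freedom, p = 0.95
CHEBYSHEV_95 = math.sqrt(1 / (1 - 0.95) - 1)  # one-sided Chebyshev multiplier

lcl_t = mean - T_9_95 * s / math.sqrt(n)
lcl_chebyshev = mean - CHEBYSHEV_95 * s / math.sqrt(n)

print(f"mean = {mean:.1f}, s = {s:.1f}")
print(f"95% LCL (t-based):   {lcl_t:.1f}")
print(f"95% LCL (Chebyshev): {lcl_chebyshev:.1f}")
```

Here the sample mean is well above an assessment level of 10 units, yet both lower confidence limits are negative, so neither test would support a determination.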

In the ‘planning’ scenario, from any given dataset it is more likely that a decision made using the Chebyshev test will conclude that significant harm is possible and remediation is required. Thus using the less powerful Chebyshev test fails safe in terms of public protection.

To give a numerical example, let us suppose the assessment level is 10 units, there are 20 measurements and the calculated standard deviation of the population estimated from the data is two units. Then a decision that there is no significant harm would be supported with 95% certainty by a mean of the measurements below 9.2 units using the *t*-test, or below 8.1 units using the Chebyshev test. If the unknowable true mean was actually nine units and the mean of the measurements fell between these two thresholds, then a correct decision has resulted using the *t*-test, but a Type 2 error has occurred using the Chebyshev test. Which test is more appropriate requires further exploration of the data. However, it is clearly more demanding to demonstrate absence of significant harm using the less powerful Chebyshev test than using the *t*-test. This is protective of human health, but may have other undesirable consequences as discussed above.

In the Environmental Protection Act scenario, from any given dataset it is more likely that a decision will be made that there is no significant harm using the Chebyshev test than the *t*-test. This is not protective of human health and may not be what was intended by the legislation. Using the same input data as the planning scenario above, a determination of significant harm would require the mean of measurements to exceed 10.8 units using the *t*-test, or 11.9 units using the Chebyshev test. If the unknowable true mean was actually 11 units and the mean of the measurements fell between these two thresholds, then a correct decision has resulted using the *t*-test, but a Type 2 error has occurred using the Chebyshev test. Using the Chebyshev test does not fail safe in terms of public protection; it protects the supplier, who in this case is the landowner. It should be noted also that, whichever test is used, a determination of significant harm is not made until the sample mean exceeds the assessment level by some margin. It is not clear that this was the intent of the legislation.
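
The four thresholds quoted in the two numerical examples can be reproduced with a few lines of Python. The sketch assumes the margin is a multiplier times *s*/√*n*, with the multiplier taken as the tabulated one-tailed *t* value for 19 degrees of freedom or the one-sided Chebyshev value √(1/(1 − *p*) − 1); the helper name `critical_means` is hypothetical.

```python
import math

ASSESSMENT_LEVEL = 10.0  # units, from the example
S = 2.0                  # estimated population standard deviation
N = 20                   # number of measurements

T_19_95 = 1.729                               # tabulated one-tailed t, 19 d.f., p = 0.95
CHEBYSHEV_95 = math.sqrt(1 / (1 - 0.95) - 1)  # one-sided Chebyshev multiplier


def critical_means(level, s, n, multiplier):
    """Return (planning threshold, Environmental Protection Act threshold)."""
    margin = multiplier * s / math.sqrt(n)
    return level - margin, level + margin


planning_t, epa_t = critical_means(ASSESSMENT_LEVEL, S, N, T_19_95)
planning_c, epa_c = critical_means(ASSESSMENT_LEVEL, S, N, CHEBYSHEV_95)

print(f"planning: t-test {planning_t:.1f}, Chebyshev {planning_c:.1f}")
print(f"EPA:      t-test {epa_t:.1f}, Chebyshev {epa_c:.1f}")
```

Rounded to one decimal place, these are 9.2 and 8.1 units for the planning scenario and 10.8 and 11.9 units for the Environmental Protection Act scenario.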

## Commentary

In the case of determinations under the Environmental Protection Act, if the CL:AIRE–CIEH guidance is used to achieve 95% certainty of determination, there is a substantial probability that land that is in fact contaminated is not determined as such. This appears to be different from the intent of the law and associated statutory guidance. The probability of this error increases if the less powerful Chebyshev test is used, and as the number of results decreases. Methods to control Type 2 error require choosing another comparison number, larger than the assessment level that determines the boundary of significant harm. There is the additional difficulty that this approach requires the calculation of an LCL, which appears to be mathematically unsatisfactory if the data are skewed.

These problems are avoided by following the statutory guidance more literally and making a decision in the context of the Environmental Protection Act on the balance of probabilities (more likely than not); that is, at a probability level of 50% rather than 95%. Substituting a probability of 50% in equation (3) given above gives a UCL and an LCL that are identical and equal to the arithmetic mean; the $u_p$ and $l_p$ terms become zero for the *t*-test. Because the arithmetic mean of the suitable data is always an unbiased estimate of the true population mean, there is equal probability of it being above or below the true mean. There is equal probability of correctly and incorrectly determining significant harm or absence of significant harm. The landowner may argue that a 50% chance of incorrectly determining land to be contaminated is too high, and the regulator may equally argue that a 50% chance of incorrectly determining land not to be contaminated is too high. However, this position represents a balance of interests and avoids needing to ‘invent’ a number larger than the assessment level as a basis for distribution of risk. Guidance with statutory force in England states that the determination of contamination should be made on the basis of ‘more likely than not’; this does not clearly translate to a requirement to prove with high certainty the significance of harm. It also avoids the technical difficulty of calculating a 95% LCL, which does not appear to be a satisfactory process in terms of health protection. In the example above, a determination of significant harm at 50% probability would be supported if the mean of measurements (which is an unbiased estimate from the available data) exceeded 10 units.

The CL:AIRE–CIEH guidance discusses the possibility of making a decision of contamination at 51% probability, but it does not emphasize the desirability of doing so. The emphasis is on making a determination of contamination with 95% certainty. This is not protective of human health in the way that might be expected, because of the uncontrolled possibility of deciding incorrectly that land is not contaminated when in fact it is, and it is not obviously consistent with statutory guidance. It results in the possibility of contamination, on the balance of probabilities, not being recognized.

It can be seen from equation (3) that, if the *t* term is zero because we have set *p* = 0.5, the confidence limits are independent of the size of sample; there is no obvious ‘mathematical’ benefit in having more than one result. Clearly, few of us would be happy to proceed on the basis of a single result, which may not be representative of the whole. An alternative approach would be to collect a ‘reasonable number’ of results, and then to calculate both the upper and lower confidence limits. These are then compared with the assessment level and a judgement is made. Alternatively, equations (1)–(4) can be re-cast to calculate the probability that a sample mean is above or below the assessment level and a judgement can be made as to whether this probability is sufficiently high. However, because a judgement is now being made rather than a simple comparison, there is room for debate between landowner and regulator. Inspection of Figure 2 suggests that a ‘reasonable number’ of results is unlikely to be fewer than 10, and incremental benefits may become small after 30.
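
One way of re-casting the equations, offered here as a sketch rather than as the method of the guidance, is to solve for the confidence level *p* at which the LCL just equals the assessment level. For the *t*-based form this is the Student's *t* cumulative probability of (mean − assessment level)√*n*/*s* with *n* − 1 degrees of freedom; for the one-sided Chebyshev form it is 1 − 1/(1 + *k*²) with the same quantity as *k*. The function names are hypothetical and the inputs are illustrative.

```python
import math


def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df):
    """CDF by Simpson's rule on [0, |x|], using the symmetry of the density."""
    a = abs(x)
    if a == 0.0:
        return 0.5
    steps = 2 * max(100, math.ceil(a * 50))  # even, with step width <= 0.01
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def confidence_t(sample_mean, s, n, assessment_level):
    """Confidence p at which the t-based LCL equals the assessment level."""
    t_stat = (sample_mean - assessment_level) * math.sqrt(n) / s
    return t_cdf(t_stat, n - 1)


def confidence_chebyshev(sample_mean, s, n, assessment_level):
    """Same for the one-sided Chebyshev form; needs sample_mean >= assessment level."""
    k = (sample_mean - assessment_level) * math.sqrt(n) / s
    if k < 0:
        raise ValueError("sample mean is below the assessment level")
    return 1 - 1 / (1 + k * k)


# Illustrative inputs: mean 10.5, s = 2, 20 results, assessment level 10.
print(round(confidence_t(10.5, 2.0, 20, 10.0), 3))
print(round(confidence_chebyshev(10.5, 2.0, 20, 10.0), 3))
```

The result is a confidence level in the sense of the tests discussed above, not a direct probability statement about the site, so it informs rather than replaces the judgement to be made.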

In the planning scenario, guidance requires a positive demonstration that the land is suitable for use and it is reasonable to expect this decision to be made with high certainty, as public health is the issue. The decision can be made using the 95% UCL, for which many calculation methods are available. The Chebyshev test provides a conservative way of calculating this. Other methods have been described in the literature. However, it must be remembered that in being conservative there is a greater probability that land is rejected as contaminated when in fact it is not, and development could have proceeded safely or costly remediation could have been avoided.
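
A minimal sketch of the planning comparison follows, assuming the one-sided Chebyshev form UCL = mean + *k*·*s*/√*n* with *k* = √(1/α − 1), which gives *k* ≈ 4.36 at 95% confidence. The data, the assessment level of 10 units and the function names are hypothetical, and the *t* value of 1.833 is the standard tabulated one-sided 95% value for 9 degrees of freedom rather than a figure from this paper.

```python
import math
from statistics import fmean, stdev

ASSESSMENT_LEVEL = 10.0  # units; hypothetical
T_95_DF9 = 1.833         # tabulated one-sided 95% Student's t, 9 degrees of freedom


def chebyshev_multiplier(confidence=0.95):
    """One-sided Chebyshev multiplier k = sqrt(1/alpha - 1)."""
    alpha = 1 - confidence
    return math.sqrt(1 / alpha - 1)


def ucl(data, multiplier):
    """Upper confidence limit of the mean: mean + multiplier * s / sqrt(n)."""
    return fmean(data) + multiplier * stdev(data) / math.sqrt(len(data))


data = [4, 5, 6, 6, 7, 7, 8, 8, 9, 10]  # hypothetical measurements, n = 10
k = chebyshev_multiplier(0.95)
ucl_chebyshev = ucl(data, k)
ucl_t = ucl(data, T_95_DF9)

print(f"Chebyshev UCL = {ucl_chebyshev:.2f}, t-based UCL = {ucl_t:.2f}")
print("suitable for use (Chebyshev):", ucl_chebyshev < ASSESSMENT_LEVEL)
```

The Chebyshev limit sits above the *t*-based limit for the same data, which is the sense in which it is conservative: land passes the test only with a wider margin below the assessment level.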

The alternative of using the ‘planning’ approach for determinations under the Environmental Protection Act is not available, because the law does not permit it. It would be a presumption of contamination everywhere until proved otherwise, whereas the law presumes the opposite, as it must.

## Conclusions

The CL:AIRE–CIEH guidance presents a robust protocol for demonstrating whether potentially contaminated land is suitable for a proposed new use, within the land-use planning framework. A conservative approach perceived to protect human health increases the probability that unnecessary remediation will be required, or that a development will not proceed, both of which have associated societal costs. However, users need to be aware that the guidance does not protect against this.

Rather than using the *t*-test or Chebyshev test in the form detailed in the guidance, there are possible advantages to comparing a suitably calculated UCL with the assessment level for the ‘planning’ scenario. This is mathematically equivalent to the tests described in the guidance, but has the advantages of using real units and of accessing the full variety of methods available in the literature and the public domain software that is available to implement this method. There is still a requirement to use whatever method is chosen with appropriate data and in a responsible manner, as discussed by Nathanail (2004).

In discussing the possibility of identifying land as contaminated within the context of the Environmental Protection Act with 95% certainty, the CL:AIRE–CIEH guidance goes beyond the requirements of the statutory guidance and enters an area of potential difficulty. The test is implicitly dependent on calculating a lower confidence limit, and this behaves in a way that is not protective of human health. The more data are available the easier it is to prove significant harm, for the same (unknown) concentration in the sampling target. Reliable methods to determine lower confidence limits on the skewed data that often occur in contaminated land datasets appear not to have been presented in the literature. In any event, use of the Chebyshev test is no longer conservative with respect to human health. There is no legal necessity to demonstrate that it is 95% certain that a parcel of land is contaminated, on the basis of measurements. ‘On the balance of the evidence’ (i.e. 50% certainty) is sufficient as described by statutory guidance. This can be achieved by a simple comparison of the arithmetic mean of suitable data with an appropriate assessment level, avoiding the difficulty of calculating a lower confidence limit.
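
The behaviour of the lower confidence limit can be illustrated with a short sketch, assuming limits of the form mean − multiplier·*s*/√*n*. The mean, standard deviation and assessment level are hypothetical and are held fixed while only the number of results changes. The *t* values are standard tabulated one-sided 95% values, and the Chebyshev multiplier is assumed to be √(1/α − 1).

```python
import math

ASSESSMENT_LEVEL = 10.0  # units; hypothetical
MEAN, SD = 11.0, 2.8     # hypothetical summary statistics, fixed as n grows

# Tabulated one-sided 95% Student's t values, keyed by n (df = n - 1).
T_95 = {10: 1.833, 30: 1.699}
K_95 = math.sqrt(1 / 0.05 - 1)  # one-sided Chebyshev multiplier, about 4.36


def lcl(multiplier, n):
    """Lower confidence limit of the mean for n results."""
    return MEAN - multiplier * SD / math.sqrt(n)


for n in (10, 30):
    t_lcl, c_lcl = lcl(T_95[n], n), lcl(K_95, n)
    print(f"n = {n}: t-based LCL = {t_lcl:.2f} "
          f"(exceeds level: {t_lcl > ASSESSMENT_LEVEL}), "
          f"Chebyshev LCL = {c_lcl:.2f} "
          f"(exceeds level: {c_lcl > ASSESSMENT_LEVEL})")
```

With these invented figures the *t*-based LCL is below the assessment level for 10 results and above it for 30, although the mean and spread are unchanged. The Chebyshev LCL remains below the level in both cases, so the more ‘conservative’ test is the less likely to identify the land as contaminated.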

Alternatively, rather than relying on a simple comparison of either the upper or the lower confidence limit with an assessment level, both limits may be calculated (assuming that a suitable method exists), or a probability of exceedance may be calculated, and a view taken on the certainty of the conclusion that there is a possibility of harm. This approach may lead to debate between the landowner and regulator, as the decision is no longer a clear-cut simple comparison.

## Acknowledgements

The thinking expressed here developed in response to discussions with I. Teasdale of Sellafield Ltd and K. Gray of ESI Ltd, during the execution of a contract funded by the Nuclear Decommissioning Authority. These inputs are gratefully acknowledged, but the author accepts sole responsibility for the opinions expressed.

*Scientific editing by Stephen Buss; Nicola Harries*

© 2018 The Author(s). Published by The Geological Society of London. All rights reserved.