## Abstract

The use of 3D models to view complex and diverse geoscience datasets is now common practice for conceptual model evolution, communication to stakeholders, and testing hypotheses. When applying these models it is important to recognize that their ability to replicate the true situation is controlled by the data used to generate the model and by the model algorithms. For the models to be applied correctly, the model uncertainty needs to be identified and, where possible, quantified. A method to quantify the uncertainty associated with geological surfaces in a 3D model is presented and tested. Kernel density smoothing and resampling of borehole locations, together with expert user interaction, are used to provide an estimate of the uncertainty in a geological surface based on data quality, data density and geological complexity. The method is applied to a 3D geological model of shallow superficial deposits, where a sequence of river terrace gravels and alluvial deposits overlies mudstone bedrock. Outcomes indicate that the uncertainty model predictions match well with expert judgement.

The use of 3D geological models in many areas of geoscience has become widespread, largely as a result of improvements in computer processing power (Kessler & Mathers 2006; Kessler *et al*. 2007). These models have allowed the reconstruction and visualization of complex and diverse datasets at a range of scales, with the potential to adapt and adjust with new data and interpretation, and in some instances to generate logical 3D geometric shapes by triangulation (Robins *et al*. 2005). Much of the impetus and financial backing for these models has come from the hydrocarbon and metallic mineral exploration industries (Xu & Dowd 2003; Berg & Keefer 2004). More recent 3D studies include the use of 3D visualization for geosphere containment (Tirén *et al*. 1999) and as an analytical tool preparatory to numerical groundwater modelling (Robins *et al*. 2005). To date, these studies have all been primarily concerned with bedrock geology. More recent applications include the modelling of superficial deposits, which are spatially complex and demand resolution at a scale determined by the end user. In general, superficial deposits require more detail and accuracy to reconstruct, visualize and describe than bedrock (Berg & Keefer 2004; Wu *et al*. 2005). These superficial deposit models have been used in a range of applications, including improving understanding of complex depositional sequences (Perrin *et al*. 2005); urban planning (Bridge *et al*. 2004); engineering hazard assessments (Culshaw 2005; Price *et al*. 2007); groundwater issues (Thorleifson *et al*. 2005; Lelliott *et al*. 2006*a*; Olyphant *et al*. 2006; Price *et al*. 2006; Stone 2006); and forming the framework for site conceptual models (Lelliott *et al*. 2006*b*).

For a model to be applied successfully, whether it is in support of hydrogeological analysis or looking for engineering solutions, it has to represent the ‘real’ situation as accurately as possible. In the geological context, the ‘real’ situation is often unknown and the model represents an interpretation, based on limiting assumptions, of what is likely to occur between data points. Consequently, the effectiveness of the model depends largely on the quality of information used, and the amount and distribution of information available. Mann (1993) suggested four main categories of uncertainty in geology: (1) variability: the inherent natural variability that exists in geological objects; (2) measurement: uncertainty caused by imperfections in the measurement procedure; (3) sampling: uncertainty that arises from the process of making a measurement at a specific spatial location (i.e. is it representative?); (4) modelling: uncertainty associated with processing of the data to create the model.

For the model to be used in an effective and appropriate manner (e.g. by decision makers or other stakeholders) it is important to assess the confidence associated with the data, to give an idea of model uncertainty (Cave & Wood 2002). Uncertainty analysis of models has become recognized as an important process, and is increasingly being asked for by clients and statutory bodies (Steeds *et al*. 2000). However, uncertainty analysis is still not standard practice, partly because of the perception that it is difficult to perform and too subjective (Pappenberger & Beven 2006).

Many methods of uncertainty assessment in geoscience rely on a combination of mathematical, statistical and geostatistical models (Artus *et al*. 2006; Feyen & Caers 2006; Lee & Lee 2006; Tacher *et al*. 2006; Emery & Gonzalez 2007). Although they are mathematically rigorous, these methods can fail to take into account the subjective nature of the data used in the model (e.g. geological interpretation). In addition, users of geological modelling packages are wary because the results produced are not representative of their conceptual model of the processes controlling the uncertainty. There is a need for an approach that applies rules on assessment of uncertainty in a structured way but allows the modellers to interact and modify the results to take account of their expert opinion.

This paper aims to demonstrate a semi-quantitative uncertainty estimation procedure for 3D geological models that identifies all sources of uncertainty, both qualitative and quantitative, and takes these into account, where they make a significant contribution, in the final uncertainty estimate (Cave & Wood 2002). Key criteria for the uncertainty assessment are: that the method is simple (i.e. a complex understanding of mathematics and statistics is not required); that it is robust (i.e. a procedure that is not seriously disturbed by small violations of the assumptions on which it is based); and that it allows the user to interact with the results to take account of site-specific expert opinion. The uncertainty estimation procedure is tested on a geological model produced for the Source Area BioRemediation (SABRE) project site at an industrial complex in the UK. The site is characterized by a shallow sand and gravel aquifer overlying a low-permeability aquitard of the Mercia Mudstone Group. The remedial objective at the SABRE site is the effective and quantified treatment of a dense non-aqueous phase liquid (DNAPL) source area by accelerated anaerobic *in situ* bioremediation. One of the performance measures to establish the effectiveness of the remedial method is the difference between the DNAPL mass before and after remediation (USEPA 2003). Reconstructing the complexities of a DNAPL source zone, which results from small-scale subsurface spatial variability (heterogeneity), can introduce substantial uncertainty in the estimation of DNAPL mass. Furthermore, uncertain DNAPL mass estimates constrain our ability to design an effective remediation system and ultimately limit performance assessment.

Increasing availability and application of a range of methods to detect and delineate DNAPL source zones demands new methods of data integration, numerical assessment, visualization and communication. The development of the 3D geological model in this study provides a framework to explore the propagation of uncertainties, inherent in the reconstruction of the geology, in the estimation of DNAPL mass at this and other sites.

## Construction of the SABRE 3D geological model

The model was constructed using the Geological Survey and Investigation in 3D (GSI3D) software package (Hinze *et al*. 1999; Sobisch 2000; Kessler & Mathers 2004; Kessler *et al*. 2007). This software was chosen as it has been used extensively, and successfully, for reconstruction of superficial deposit geology (Culshaw 2005; Lelliott *et al*. 2006*a*).

The geological model was built from information taken from 29 intrusive investigation geological logs over an area of *c*. 4200 m^{2} (Fig. 1). Geological surfaces were defined by 35 cross-sections that included every site investigation point. The ground surface of the model was constrained by elevation measurements made at investigation points using theodolite measurements and differential global positioning system (GPS). The 3D geological model is shown in Figure 2, and the main elements of the model are shown in a cross-section in Figure 3. A summary of the type and number of site investigation points that fully penetrated each deposit, together with a description of each deposit, is given in Table 1.

The data covered three stages of site investigation, each performed using different intrusive methods and carried out by different contractors. The first site investigation was carried out in 2001 using cable percussion drilling with bulk sample collection and trial pit excavation. The second and third stages of investigation were undertaken as part of a more detailed research study in 2005. The second stage comprised sonic bore drilling, and the third stage cable percussion drilling with standard penetration testing (SPT). The primary purpose of the site investigations was to identify the distribution of contamination in the soil and groundwater. As a consequence of this, the borehole distribution is irregular and focused towards contaminant source delineation (Fig. 2). A 3D geological model was selected to provide a method to visualize the geology and the contrasts between the different lithostratigraphies at the site; the latter is fundamental to conceptualizing likely contaminant migration paths. The model also allowed the flexibility to add new data when available, to visualize complex data in a 3D format, and to identify specific features by exaggerating the vertical scale. For the model to be used in an appropriate manner for decision making (i.e. calculating the volume of the gravel affected in the contaminant source zone) it was fundamental to assess the uncertainty associated with each of the modelled geological surfaces.

## Sources of uncertainty

The first, and possibly most important, step to quantify uncertainty is the identification of the sources of uncertainty associated with the model and to consider their relationships. Cause and effect diagrams, otherwise known as Ishikawa or fishbone diagrams (Kindlarski 1984), provide a structural framework with which to explore these relationships. This method is particularly suited to brainstorming sessions that encourage project members to think laterally about uncertainties that may be inherent in their areas of work, and how these factors can interact to influence one another. This allows all the possible sources of uncertainty, including subjective and qualitative sources, to be considered, even if they are not deemed significant enough to be included in the overall uncertainty assessment. The factors that contribute to the overall uncertainty of the geological model surfaces (Fig. 4) can be broadly grouped into: data density (the density of boreholes used to construct the model); data quality (quality of the data used to construct the model, including borehole elevation, sample type, drill method and logging quality); geological complexity (variability of the geology across the site); and modelling software. Once the factors that contribute to uncertainty have been identified, judgement can be used to decide if these factors need to be accounted for in the overall assessment. For instance, the elevation of the boreholes was measured using two survey methods, theodolite and differential GPS. Of these two methods, differential GPS is likely to have a lower uncertainty as it is generally more accurate (most points had a cumulative uncertainty in the *x*, *y* and *z* directions of 30 mm), whereas the theodolite can have a vertical error of up to 100 mm. However, in this instance the vertical error of the theodolite is likely to be significantly less, as the boreholes are distributed over a small spatial extent on a relatively flat site. Consequently, borehole elevation has not been considered as part of the uncertainty assessment. Similarly, the logging quality is also not considered, as boreholes were logged by experienced geologists to a recognized standard (BS5930, British Standards Institution 1999).

## Method for the estimation of uncertainty

All of the contributory factors to uncertainty as specified in the fishbone diagram (Fig. 4) are held in the geologist's mind as a conceptual model of the true 3D geology. The uncertainty at any given point is a measure of how confident the geologist is in the information available for defining a 3D position. It is possible for the modelled surface to be very close to the true spatial position (a small spatial error) while the geologist estimates the uncertainty as much larger because of low confidence in the information used to estimate that position. The aim is to elicit the expert's opinion of uncertainty and establish a practical approach that can be used to derive a semi-quantitative estimate at any point in space defined by the model. In this instance, the principal sources of uncertainty have been identified as the data density, data quality and geological complexity.

The linguistic rules that express the relationship between these factors and the overall uncertainty can be summarized as follows: (1) if the data density is high then the uncertainty is lower (and vice versa); (2) if the data quality is high then the uncertainty is lower (and vice versa); (3) if the geological complexity is low then the uncertainty is lower (and vice versa). The following section explores methods for converting these conceptual ideas into measurable quantities that can be calculated at any *xyz* location.

## Data density

If a borehole intersects a geological surface then we can be certain of the surface's location at the borehole (within the measurement uncertainty of the borehole log). With increasing radial distance from the borehole, uncertainty in the position of the surface increases. This behaviour can be modelled by centring a Gaussian function on the borehole (Fig. 5); the borehole's ‘radius of influence’ can then be controlled by changing the standard deviation of the Gaussian function.

Kernel density smoothing (Bowman & Azzalini 1997) can be used to estimate the uncertainty from more than one borehole by adding together the Gaussian outputs for each borehole (Fig. 5), and the procedure extends naturally into two dimensions. The sum of these Gaussian functions is a surrogate for uncertainty: high density represents low uncertainty and low density represents high uncertainty. In this study, the estimated radius of influence for each borehole is 15 m, based on sensitivity analysis of expert judgement (see ‘Input parameter sensitivity’ section). The density can be calculated on a grid of points for a given arrangement of boreholes. By scaling the kernel density results to the highest calculated density, and equating the highest density to the lowest uncertainty (i.e. the measurement uncertainty in the borehole log) and the lowest density to the highest uncertainty (estimated to be ±1 m based on sensitivity analysis), the uncertainty related to the density of boreholes can be estimated at any *xy* location on the modelled surface.
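The density-to-uncertainty mapping can be sketched as follows. The 15 m radius of influence and ±1 m maximum uncertainty are the values quoted above; the 0.05 m minimum (standing in for the borehole log measurement uncertainty), the function name and the example coordinates are illustrative assumptions.

```python
import numpy as np

def density_uncertainty(borehole_xy, grid_xy, radius=15.0,
                        min_unc=0.05, max_unc=1.0):
    """Data-density uncertainty (m) at each grid point.

    Sums a 2D Gaussian kernel (standard deviation = `radius`, the
    borehole 'radius of influence') centred on each borehole, then
    rescales so the highest density maps to `min_unc` (the assumed
    borehole measurement uncertainty) and zero density to `max_unc`.
    """
    borehole_xy = np.asarray(borehole_xy, dtype=float)
    grid_xy = np.asarray(grid_xy, dtype=float)
    # Squared distances, shape (n_grid_points, n_boreholes).
    d2 = ((grid_xy[:, None, :] - borehole_xy[None, :, :]) ** 2).sum(axis=2)
    density = np.exp(-d2 / (2.0 * radius ** 2)).sum(axis=1)
    scaled = density / density.max()  # 1 at the densest point
    return max_unc - scaled * (max_unc - min_unc)

# A grid point at a borehole gets low uncertainty; a remote one ~1 m.
print(density_uncertainty([(0, 0), (10, 5)], [(0, 0), (60, 60)]))
```

Because the output is scaled to the highest density actually observed, at least one grid point always reaches the minimum uncertainty; the choice of kernel width controls how quickly confidence decays away from the data.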

## Data quality

In the SABRE study the borehole data are derived from three types of drilling method, each giving a different quality of data, mainly as a result of differences in measurement uncertainty in the borehole log. In this instance, the uncertainty increases in the order sonic drilling < excavation < cable percussion. The data density method can produce an estimated uncertainty grid for each of the three investigation methods. These grids can then be compared at each point on the interpolated grid and, by taking the lowest of the three uncertainties at any given point, combined into a single estimate that takes into account both data density and data quality.
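A minimal sketch of this combination rule, assuming three per-method uncertainty grids already produced by the data density procedure (the values below are invented, not site data):

```python
import numpy as np

# Hypothetical density-uncertainty grids (metres) on the same
# interpolation grid, one per drilling method; sonic gives the lowest
# measurement uncertainty, cable percussion the highest.
unc_sonic = np.array([0.10, 0.60, 1.00])
unc_excavation = np.array([0.50, 0.30, 1.00])
unc_cable_percussion = np.array([0.80, 0.90, 0.40])

# Keep the lowest of the three estimates at each grid point, so the
# best-quality nearby data control the combined density/quality term.
combined = np.minimum.reduce([unc_sonic, unc_excavation,
                              unc_cable_percussion])
print(combined)  # [0.1 0.3 0.4]
```

Taking the pointwise minimum means a single high-quality borehole nearby outweighs any number of poorer-quality records at the same location.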

## Geological complexity

The *xyz* coordinates of boreholes intersecting the geological surface can be used to create an approximation of the true geological surface by fitting a surface through these points. In areas where the surface is featureless and flat only a few widely spaced boreholes are required to give a reasonably accurate representation of the surface. In areas where there are rapid changes in depth over a relatively small area (i.e. more geologically complex areas) a higher density of boreholes is required. The uncertainty arising from changes in complexity over a surface and its relationship to boreholes can be estimated using a computer-intensive approach, as follows.

1. Resample (with replacement) the *xyz* coordinates of the geological surface in the boreholes used in the model.
2. Fit a surface to the resampled borehole coordinates on a given grid.
3. Carry out steps (1) and (2) many times (e.g. 500–1000), recording the surface position on the interpolated grid for each iteration.
4. Measure the 95% confidence limits on the interpolated depths at each grid point across all iterations, to give an estimate of uncertainty.
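The resampling procedure can be sketched as below. A nearest-neighbour interpolant stands in for the model's own surface-fitting algorithm, and the function name and example coordinates are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def complexity_uncertainty(xyz, grid_xy, n_iter=500):
    """Bootstrap estimate of geological-complexity uncertainty.

    Resamples the borehole (x, y, z) surface picks with replacement,
    fits a surface to each resample (here a nearest-neighbour
    interpolant as a stand-in for the model's gridding algorithm),
    records the depth at every grid point, and returns the half-width
    of the 2.5-97.5 percentile envelope as the uncertainty.
    """
    xyz = np.asarray(xyz, dtype=float)
    grid_xy = np.asarray(grid_xy, dtype=float)
    fits = np.empty((n_iter, len(grid_xy)))
    for i in range(n_iter):
        s = xyz[rng.integers(0, len(xyz), len(xyz))]  # resample
        # Squared distances from each grid point to each sampled pick.
        d2 = ((grid_xy[:, None, :] - s[None, :, :2]) ** 2).sum(axis=2)
        fits[i] = s[d2.argmin(axis=1), 2]  # nearest-neighbour depth
    lo, hi = np.percentile(fits, [2.5, 97.5], axis=0)
    return (hi - lo) / 2.0

# A flat surface (all picks at 5 m) yields zero complexity uncertainty.
flat = [[0, 0, 5], [10, 0, 5], [0, 10, 5], [10, 10, 5]]
print(complexity_uncertainty(flat, [[5, 5]], n_iter=200))  # [0.]
```

Where the picks disagree locally, resamples drop different boreholes and the fitted depths spread out, so the envelope widens exactly in the geologically complex areas.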

## Combined uncertainty

The identified factors contributing to uncertainty (geological complexity, data density and data quality) were combined by taking the square root of the sum of the squares of the individual sources of uncertainty (Ellison *et al*. 2000) to produce an overall uncertainty for the modelled surface. This empirical approach produces a combined uncertainty that fits with the intuitive expert judgement rules and allows prediction of uncertainty at any point on our interpolated surface. This approach was applied to each surface in the model. Given that the uncertainty model for a given surface uses the *xyz* coordinates of the boreholes or other raw data points and some expert judgement, it may be possible to use these inputs and the uncertainty output as an ‘expert system’ to predict uncertainty in other situations without the intervention of the expert. To test this possibility, the uncertainty results from one surface were converted into an expert model. The model was then applied to other surfaces and compared with the results of the empirical combination of data density and geological complexity already described.
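The root-sum-of-squares combination can be illustrated as follows; the two input grids are invented values in metres, not the SABRE results (here the data density term is taken to already incorporate data quality, as in the combination described above):

```python
import numpy as np

# Illustrative per-source uncertainty grids (m) at three grid points.
u_density = np.array([0.9, 0.3, 0.1])     # data density (incl. quality)
u_complexity = np.array([0.4, 0.8, 0.1])  # geological complexity

# Combine in quadrature: the square root of the sum of the squares
# of the individual sources (Ellison et al. 2000).
u_combined = np.sqrt(u_density ** 2 + u_complexity ** 2)
print(u_combined.round(3))  # [0.985 0.854 0.141]
```

Combining in quadrature means the largest single source dominates the total, matching the intuitive rule that the overall uncertainty is high wherever either boreholes are sparse or the surface is complex.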

Using a method called the adaptive neuro-fuzzy inference system (ANFIS) (Jang 1993), fuzzy models can be set up that ‘learn’ from expert judgement data and predict the quality score of the parameters under study in the same manner as the expert (e.g. Ju & Ryul 2006). The ANFIS function within the Fuzzy Logic Toolbox of the MATLAB programming language was used to set up the expert model using the data from the upper alluvium surface. The data density for each type of borehole and the geological uncertainty were used as model inputs, and the system was trained on the empirical estimate of uncertainty. The data were split in half, using one half to train the model and the other as an independent validation dataset. Using subtractive clustering (Chiu 1994), eight fuzzy rules were produced for the original model. Acceptable agreement between the model and the training data was reached after 20 training cycles (root mean square error of 0.003 m), with the validation data indistinguishable from the predicted data at the model root mean square error. This provides the basis for an automated system for estimating uncertainty.

The system requires the *xyz* coordinates of the boreholes, the quality of each borehole (i.e. drilling method for each borehole), and the *xy* grid at which the uncertainty is to be estimated. The software can calculate data densities for each type of borehole and measure the geological complexity of the surface using the resampling routine, so that these parameters can be fed into the trained expert system that produces the final uncertainty estimate. There is a good agreement between uncertainty estimates produced by the empirical method and the fuzzy expert judgement model for the lower alluvium, river terrace gravel and Mercia Mudstone Group (Fig. 6). This exercise shows that the ANFIS procedure can be used successfully to learn the empirical relationships and apply them to a new dataset. In this instance, however, the geology is relatively simple and there are relatively few boreholes in the model; therefore there is not much to be gained from the ANFIS modelling, as the empirical process is easy to carry out. For situations with larger numbers of boreholes (hundreds or many hundreds) the kernel density and the resampling algorithms will be computationally intensive. In that case, it would be more efficient to measure the uncertainty using the empirical method on a representative subset of the boreholes and use an ANFIS model trained on these data to predict uncertainty for the remaining data. It may be possible to build up a library of ANFIS models in the future for different geological modelling scenarios that would replace the empirical approach entirely.

## Input parameter sensitivity

The method for estimating uncertainty in this study combines the data density uncertainty for each borehole type with the geological complexity. The uncertainty from the geological complexity algorithm is dependent on measured parameters (borehole spacing and the measured depths to the geological surface). Consequently, geological complexity takes into account the ‘shape’ of the surface where boreholes are present, but assumes a flat surface, and therefore a false low uncertainty, where there are no boreholes. Data density uncertainty is based purely on the proximity, number and type of boreholes, and does not take into account the ‘shape’ of the surface. When combined, the data density complements the geological complexity by showing high uncertainty in areas where there are no boreholes. The aim of the sensitivity assessment is to ensure that the radius of influence (*i*) for the data density is not so small that the geological complexity dominates where there are no boreholes and gives false low uncertainty values. Expert judgement is also used to determine the likely maximum uncertainty (*m*) associated with the data density. For example, a geologist familiar with the site under study and its geological setting would make a judgement of how confidently the depth of a specified geological surface could be estimated in areas of the site furthest from a borehole. The *m* value is ascribed to the greatest uncertainty for borehole density (i.e. where there are the fewest boreholes).

The effect of varying both *i* and *m* over three different values expected to cover the likely data range is assessed. Using all combinations, a set of nine experiments was used to investigate the effects (Table 2).
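The experimental design can be sketched as a simple parameter grid. The *m* values below are those discussed in the text; the *i* values other than the adopted 15 m are assumed for illustration (Table 2 holds the actual set).

```python
import itertools

# Three trial values for each parameter: the radius of influence i (m)
# and the maximum data-density uncertainty m (m). The i values of 5
# and 30 are assumed for illustration.
i_values = [5.0, 15.0, 30.0]
m_values = [0.5, 1.0, 2.0]

# All combinations of the trial values give the nine experiments.
experiments = list(itertools.product(i_values, m_values))
print(len(experiments))  # 9
```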

The combined geological complexity and data density uncertainty data have been plotted as histograms (Fig. 7) and spatial uncertainty plots (Fig. 8) for each of the nine experiments to assess the best conditions. Examination of these plots shows that the scenarios where the *m* value is <2 give similar spatial and frequency distributions of uncertainty. Because the overall estimated uncertainty does not show marked changes, the sensitivity of the method to changes in the *i* and *m* parameters is relatively low. At *m* values of 1 and 0.5 all of the histograms show a mode value around 1 m. The four scenarios in the bottom left of Figures 7 and 8 show very similar spatial distributions and uncertainty histograms; this result shows that, in these conditions, the uncertainty is dominated by geological complexity. When the *m* value is 2, Figures 7 and 8 show that the uncertainties are skewed towards higher values with bimodal distributions. In these conditions the data density contribution to the overall uncertainty begins to dominate. For this study, where the site is relatively small and the geology is relatively well behaved, expert judgement suggests that the local variation in geological complexity (surface shape) plays a larger role than data density in the final uncertainty assessment. Therefore, an *i* value of 15 and an *m* value of 1 are thought to be the best representation of uncertainty and have been used subsequently for all surfaces.

## Application to the SABRE geological model

The uncertainty estimation method has been applied to each surface in the SABRE geological model to produce semi-quantitative uncertainty estimates. In this study the empirical estimation procedure was carried out on each modelled surface (upper alluvium, lower alluvium, river terrace gravel and Mercia Mudstone Group). The uncertainty outputs (Fig. 9) show low uncertainty in the spatial position of the modelled geological surface as cold colours, and high uncertainty as warm colours. The uncertainty has been semi-quantified with values ranging from 0.2 to 1.95 m. These values are not exact measures of uncertainty, but define a 95% confidence envelope within which the true geological surface is expected to lie.

The uncertainty plots indicate that the greatest uncertainty is associated with the upper alluvium surface, even though this unit is penetrated by the greatest number of site investigation points. The uncertainty of this surface is controlled primarily by the high degree of geological complexity resulting from the irregular incision of the overlying made ground into the upper alluvium during various phases of construction (buildings and services). The lower alluvium, river terrace gravel and Mercia Mudstone Group surfaces all show relatively low uncertainty estimates in the centre of each surface, where there is the greatest density of high-quality (i.e. sonic drilled) borehole data. This low uncertainty extends further to the west for the river terrace gravel and Mercia Mudstone Group surfaces, principally because of a lower degree of geological complexity in these areas. Areas of higher uncertainty on these two surfaces are at the edge of the model, or where a significant difference in depth to surface is measured, resulting in a ‘bulls-eye’ effect around a borehole. For example, in the south of the modelled area there is a change in Mercia Mudstone Group surface elevation from an average of 33.5 m aOD (above Ordnance Datum) to 32.5 m aOD at one borehole location. This is identified on the uncertainty surface for the Mercia Mudstone Group as an area of higher uncertainty surrounding this borehole.

The uncertainty estimates produced for each geological surface appear to match intuitive interpretation of where the greatest uncertainty in the model occurs. However, these uncertainty estimates require validation to confirm the assumptions used during the uncertainty estimation process. This validation can be carried out by drilling additional boreholes at the SABRE site.

The construction of a 3D model for the SABRE site has allowed visualization of the subsurface that can form the framework for further study (i.e. conceptual model evolution). An estimation of the uncertainty associated with the modelled geological surfaces can be used to identify where maximum uncertainty lies, and if this uncertainty is within required limits for investigation. If the criteria for uncertainty are exceeded then uncertainty diagrams can be used to guide future investigations, to reduce uncertainty. Additionally, the method used gives a clear visual impression of where the uncertainty in the model occurs, which is of particular importance when communicating to stakeholders or non-professionals.

## Conclusions

As 3D geological models are used increasingly in geoscience it is crucial that uncertainty is assessed and communicated so that the end user can understand the model limitations and to ensure that it is appropriate for their requirements. This is especially important where 3D reconstruction and visualization is used for decision making (i.e. conceptual model), communication to stakeholders, or for testing hypotheses.

The uncertainty of a model is not restricted to the algorithms that make up the model, but involves all the factors that feed into the model development, including subjective data. A simple method has been used to visualize the uncertainty associated with a modelled geological surface that accounts for both qualitative and quantitative terms. This uncertainty estimation method has been demonstrated on a geological model produced for the SABRE project, with resultant uncertainty estimates agreeing with intuitive expectations for the uncertainty of the model. Additional drilling at the SABRE site will test the hypothesis and allow model validation. Once validated, the uncertainty estimation method can be tested in larger, more geologically diverse and complex environments.

## Acknowledgements

Project SABRE is being undertaken by a collaborative team comprising Acetate Products Ltd, British Geological Survey, Celanese Acetate LLC., Archon Environmental, Chevron, CL:AIRE, DuPont, Environment Agency (England and Wales), ESI, General Electric, GeoSyntec/SiREM, Golder Associates, Honeywell, ICI, Scientifics, Shell, Terra Systems, University of Edinburgh, University of Sheffield and USEPA. SABRE is a UK DTI Bioremediation LINK project with specific funding contributions made by BBSRC, DTI, the Environment Agency, EPSRC and NERC. Additional funding has been provided by SERDP (US Department of Defense). The SABRE team gratefully acknowledges these supporting organizations and the in-kind and direct financial contributions made by each team member. The paper is published by permission of the Director, British Geological Survey (NERC).

- © 2009 Geological Society of London