Paper published in a book (Scientific congresses and symposiums)
Comparison of Kohonen's Self-Organizing Map algorithm and principal component analysis in the exploratory data analysis of a groundwater quality dataset
Peeters, Luk; Dassargues, Alain
2006
 

Files


Full Text
publi144-2007.pdf
Author preprint (1.54 MB)
Details



Keywords :
groundwater quality; exploratory data analysis; principal component analysis; Self-Organizing Map algorithm; Kohonen's Self-Organizing Map; SOMs
Abstract :
[en] Groundwater monitoring networks typically yield large, multivariate datasets. Analysis and interpretation of these datasets start with an exploratory data analysis in order to summarize the available data, extract useful information and formulate hypotheses for further research. Exploratory data analysis is mostly focussed on finding related variables and groupings of similar observations. Traditionally, multivariate statistical techniques such as principal component analysis (PCA) are used for this purpose. In PCA, a linear dimensionality reduction of the original, high-dimensional dataset is carried out in order to identify orthogonal directions (principal components) of maximum variance in the dataset, based on linear combinations of correlated variables. Projections of the original data onto the subspace defined by the principal components can be used to identify groups in the data and to reveal relationships between variables (Davis, 1986). In this study, principal component analysis is compared to Kohonen's self-organizing map (SOM) algorithm. The SOM algorithm is an artificial neural network technique designed to carry out a non-parametric regression process. It is mainly used to represent high-dimensional, nonlinearly related data items in a topology-preserving, often two-dimensional display, and to perform unsupervised classification and clustering (Kohonen, 1995). Both PCA and SOM are applied to a hydrochemical dataset from a monitoring network in two sandy, phreatic aquifers in Central Belgium. The monitoring network consists of 47 monitoring wells, each equipped with three filters at different depths, in which 14 variables are measured. The first aquifer, the Diest sands aquifer, is of Late Miocene age and consists of coarse, glauconiferous sands and sandstones (Laga et al., 2001).
The second aquifer, the Brussels sands aquifer, is of Middle Eocene age and is a heterogeneous formation consisting of an alternation of highly and poorly calcareous sands, locally silicified (Laga et al., 2001). Both techniques succeed in distinguishing between the two aquifers and reveal the relationships between variables. The main advantage of PCA is the mathematical quantification of the correlation between variables and the expression of the original data in the subspace defined by the principal components. The visualization of the SOM analysis, on the other hand, allows a straightforward interpretation of the dataset structure in which even non-linear relationships between variables can be identified. Additionally, the SOM algorithm can handle a limited number of missing values in the dataset, unlike PCA.
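The two techniques compared in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic stand-ins (not the monitoring dataset), all function names and parameter values (grid size, learning rate, neighbourhood width, iteration count) are assumptions chosen for illustration, and the missing-value handling shown for the SOM (skipping unobserved variables in the distance and in the update) is one common approach rather than a confirmed detail of the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in data (NOT the paper's dataset): two groups, five variables.
group_a = rng.normal(loc=0.0, scale=1.0, size=(60, 5))
group_b = rng.normal(loc=3.0, scale=1.0, size=(60, 5))
X = np.vstack([group_a, group_b])


def standardize(X):
    """Centre and scale each variable, ignoring NaN entries."""
    return (X - np.nanmean(X, axis=0)) / np.nanstd(X, axis=0, ddof=1)


def pca(Z, n_components=2):
    """PCA of standardized, complete data via singular value decomposition."""
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)      # fraction of variance per component
    components = Vt[:n_components]       # orthonormal directions (loadings)
    scores = Z @ components.T            # data projected onto the components
    return scores, components, explained[:n_components]


def best_matching_unit(weights, x):
    """Index of the closest map unit, using only the observed variables of x."""
    mask = ~np.isnan(x)
    return int(np.argmin(np.sum((weights[:, mask] - x[mask]) ** 2, axis=1)))


def train_som(Z, rows=4, cols=4, n_iter=3000, lr0=0.5,
              sigma0=2.0, sigma_end=0.5, seed=0):
    """Minimal online SOM on a rectangular grid; NaN entries are skipped."""
    local_rng = np.random.default_rng(seed)
    n, d = Z.shape
    weights = local_rng.normal(size=(rows * cols, d))
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)],
                    dtype=float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                       # linearly decaying rate
        sigma = sigma0 * (sigma_end / sigma0) ** frac  # shrinking neighbourhood
        x = Z[local_rng.integers(n)]
        mask = ~np.isnan(x)
        bmu = best_matching_unit(weights, x)
        # Gaussian neighbourhood around the best-matching unit on the map grid
        h = np.exp(-np.sum((grid - grid[bmu]) ** 2, axis=1) / (2 * sigma**2))
        # Move units towards the sample, only along the observed variables
        weights[:, mask] -= lr * h[:, None] * (weights[:, mask] - x[mask])
    return weights, grid


# PCA needs complete rows
scores, components, explained = pca(standardize(X))

# The SOM sketch tolerates a limited number of missing entries
X_missing = X.copy()
X_missing.flat[rng.choice(X.size, size=30, replace=False)] = np.nan
Z_missing = standardize(X_missing)
weights, grid = train_som(Z_missing)
bmus = np.array([best_matching_unit(weights, x) for x in Z_missing])
```

Here `scores` plays the role of the projection onto the principal components, while `bmus` assigns each sample to a map unit; plotting either, coloured by group, is one way to check whether the two synthetic groups separate.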
Research center :
Aquapôle - ULiège
Disciplines :
Geological, petroleum & mining engineering
Author, co-author :
Peeters, Luk; Katholieke Universiteit Leuven - KUL > Geologie-Geografie > Hydrogeologie en Ingenieursgeologie
Dassargues, Alain; Université de Liège - ULiège > Département Argenco : Secteur GEO3 > Hydrogéologie & Géologie de l'environnement
Language :
English
Title :
Comparison of Kohonen's Self-Organizing Map algorithm and principal component analysis in the exploratory data analysis of a groundwater quality dataset
Publication date :
2006
Event name :
GeoENV2006
Event date :
2006
Audience :
International
Available on ORBi :
since 03 January 2009

