Scientific congresses and symposiums : Paper published in a book
Engineering, computing & technology : Geological, petroleum & mining engineering
Comparison of Kohonen's Self-Organizing Map algorithm and principal component analysis in the exploratory data analysis of a groundwater quality dataset
Peeters, Luk [Katholieke Universiteit Leuven - KUL > Geologie-Geografie > Hydrogeologie en Ingenieursgeologie > >]
Dassargues, Alain [Université de Liège - ULg > Département Argenco : Secteur GEO3 > Hydrogéologie & Géologie de l'environnement >]
[en] groundwater quality ; exploratory data analysis ; principal component analysis ; Self-Organizing Map algorithm ; Kohonen's Self-Organizing Map ; SOMs
[en] Groundwater monitoring networks typically yield large, multivariate datasets. Analysis and interpretation of these datasets start with an exploratory data analysis in order to summarize the available data, extract useful information and formulate hypotheses for further research. Exploratory data analysis is mostly focussed on finding related variables and groupings of similar observations.
Traditionally, multivariate statistical techniques like principal component analysis (PCA) are used for this purpose. In PCA, a linear dimensionality reduction of the original, high-dimensional dataset is carried out in order to identify orthogonal directions (principal components) of maximum variance in the dataset, based on linear combinations of correlated variables. Projections of the original data onto the subspace defined by the principal components can be used to identify groups in the data and to reveal relationships between variables (Davis, 1986).
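As a minimal sketch of the PCA step described above, assuming NumPy and a complete data matrix without missing values: the function name `pca`, the choice to standardize each variable before decomposition, and the synthetic data are illustrative assumptions and are not taken from the paper. The 141 x 14 shape only echoes the network layout (47 wells, three filters, 14 variables); the values themselves are random.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA of a complete (n_samples, n_variables) matrix via SVD.

    Columns are standardized first (an assumption of this sketch), so the
    decomposition is equivalent to working on the correlation matrix.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)       # fraction of variance per component
    components = Vt[:n_components]        # orthogonal directions (loadings)
    scores = Z @ components.T             # data projected onto the subspace
    return scores, components, explained[:n_components]

# Synthetic stand-in for a hydrochemical table: 141 samples x 14 variables,
# generated from two hidden factors plus noise so the variables are correlated.
rng = np.random.default_rng(0)
latent = rng.normal(size=(141, 2))
mixing = rng.normal(size=(2, 14))
X = latent @ mixing + 0.1 * rng.normal(size=(141, 14))

scores, components, explained = pca(X, n_components=2)
print("explained variance ratio:", np.round(explained, 3))
```

Plotting the two columns of `scores` against each other gives the kind of projection in which groups of observations can be looked for, while the rows of `components` show how strongly each variable contributes to each direction.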
In this study, principal component analysis is compared to Kohonen's self-organizing map (SOM) algorithm. The SOM-algorithm is an artificial neural network technique designed to carry out a non-parametric regression process that is mainly used to represent high-dimensional, nonlinearly related data items in a topology-preserving, often two-dimensional display, and to perform unsupervised classification and clustering (Kohonen, 1995).
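The SOM idea can be illustrated with a small online-training sketch in NumPy. This is a generic textbook-style variant, not the implementation used in the study: the rectangular 6 x 6 grid, the Gaussian neighbourhood, the linear decay schedules, and the names `train_som` and `best_matching_units` are all assumptions chosen for brevity, and the two synthetic groups are only a stand-in for real samples.

```python
import numpy as np

def train_som(X, rows=6, cols=6, n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
    """Train a small rectangular SOM with online updates (illustrative only)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Initialize each unit's weight vector from a randomly chosen sample.
    weights = X[rng.integers(0, n, size=rows * cols)].astype(float)
    # Fixed 2-D grid position of every unit; this defines the map topology.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                      # learning rate decays to 0
        sigma = sigma0 * (1.0 - frac) + 0.5 * frac   # radius shrinks to 0.5
        x = X[rng.integers(n)]
        # Best-matching unit: the unit whose weights are closest to the sample.
        bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
        # Gaussian neighbourhood measured on the grid, not in data space.
        grid_d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
        h = np.exp(-grid_d2 / (2.0 * sigma**2))
        weights += lr * h[:, None] * (x - weights)
    return weights, grid

def best_matching_units(X, weights):
    """Index of the closest map unit for every row of X."""
    d2 = ((X[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# Two synthetic, well-separated groups of samples in five variables.
rng = np.random.default_rng(1)
group_a = rng.normal(0.0, 0.5, size=(50, 5))
group_b = rng.normal(6.0, 0.5, size=(50, 5))
X = np.vstack([group_a, group_b])

weights, grid = train_som(X)
bmus = best_matching_units(X, weights)
qe = np.mean(np.linalg.norm(X - weights[bmus], axis=1))  # quantization error
print("mean quantization error:", round(float(qe), 3))
```

Because neighbouring units on the grid are updated together, samples that are similar in the original variable space tend to land on nearby units, which is what makes the two-dimensional display topology-preserving. Colouring the grid by one column of `weights` at a time gives a per-variable view of the map, in the spirit of the component planes commonly used with SOMs.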
Both PCA and SOM are applied to a hydrochemical dataset from a monitoring network in two sandy, phreatic aquifers in Central Belgium. The monitoring network consists of 47 monitoring wells, each equipped with three filters at different depths, in which 14 variables are measured. The first aquifer, the Diest sands aquifer, is of Late Miocene age and consists of coarse, glauconiferous sands and sandstones (Laga et al., 2001). The second aquifer, the Brussels sands aquifer, is of Middle Eocene age and is a heterogeneous formation consisting of an alternation of highly and poorly calcareous sands, locally silicified (Laga et al., 2001).
Both techniques succeed in distinguishing between the two aquifers and reveal the relationships between variables. The main advantage of PCA is the mathematical quantification of correlation between variables and the expression of the original data in the subspace defined by the principal components. The visualization of the SOM analysis, on the other hand, allows a straightforward interpretation of the dataset structure, in which even non-linear relationships between variables can be identified. Additionally, the SOM-algorithm can handle a limited number of missing values in the dataset, contrary to PCA.
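One common way a SOM tolerates missing values is to leave the missing variables out of the distance computation and of the weight update for that sample. The sketch below illustrates that idea only; whether the study used exactly this scheme is not stated in the abstract, and the function names `bmu_with_missing` and `update_with_missing` as well as the toy numbers are hypothetical. Classical PCA, by contrast, needs a complete data matrix, so incomplete samples typically have to be dropped or imputed beforehand.

```python
import numpy as np

def bmu_with_missing(x, weights):
    """Best-matching unit for a sample whose missing entries are NaN.

    The distance is computed over the observed variables only.
    """
    observed = ~np.isnan(x)
    if not observed.any():
        raise ValueError("sample has no observed variables")
    diff = weights[:, observed] - x[observed]
    return int(np.argmin(np.sum(diff**2, axis=1)))

def update_with_missing(x, weights, h, lr):
    """In-place SOM update that touches only the observed variables.

    h holds one neighbourhood weight per map unit, lr is the learning rate.
    """
    observed = ~np.isnan(x)
    weights[:, observed] += lr * h[:, None] * (x[observed] - weights[:, observed])

# Toy map with three units and three variables; the second variable of the
# sample is missing.
weights = np.array([[0.0, 0.0, 0.0],
                    [1.0, 1.0, 1.0],
                    [5.0, 5.0, 5.0]])
x = np.array([0.9, np.nan, 1.2])

bmu = bmu_with_missing(x, weights)
h = np.array([0.0, 1.0, 0.0])   # update the winning unit only, for clarity
update_with_missing(x, weights, h, lr=0.5)
print("winning unit:", bmu)
print("updated weights of winning unit:", weights[bmu])
```

The unobserved variable keeps its previous weight on every unit, so a sample with a gap still contributes the information it does carry. This only works while gaps remain limited, since a variable that is missing in most samples would rarely be updated.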
Aquapôle - AQUAPOLE
Researchers ; Professionals

File(s) associated to this reference

Fulltext file(s):

Open access
publi144-2007.pdf (Author preprint, 1.5 MB)
