[en] pattern recognition ; steel industry ; chemical compounds ; discriminant analysis
[en] This paper tests and compares several mathematical discrimination (pattern recognition) methods for identifying which of three air pollution sources produced a given emission. The work draws on two kinds of data: (1) standard gas chromatography/mass spectrometry analyses giving the concentrations of various chemical compounds observed at about a hundred sites in the iron and steel industry (coking plants, cold rolling mills, or coil-coating plants); and (2) odour levels, measured by the method of dilution to the threshold. Generally speaking, coil-coating plants emit more chemical compounds than the other two, but the olfactory threshold is about the same in all three cases. The summed concentrations of compounds in each chemical family (e.g. alkanes, alcohols, phenols, ...) are calculated and used in the mathematical treatment. Because the frequency histogram of the observed concentrations follows a log-normal distribution, the logarithm of the concentration is used as the working variable instead of the concentration itself. Stepwise multiple linear regression is used to express the dilution factor at the olfactory perception threshold as a linear combination of the logarithms of the concentrations. The results show that this factor can be estimated quite well from the concentrations of compounds belonging to only three chemical families. These families, however, differ from one emission source to another. An optimal odour description of the three types of industrial plant would therefore first require examining a larger set of compound families. Factor analysis is used to group observed variables that are linked together in the plane of the first two factors. The observation points are then superimposed on this plot. Some clusters of variables are observed. Applied to the whole set of variables, the method classifies the observations rather well: the three emission sources are well discriminated.
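As an illustration of the regression step described above, the following is a minimal sketch, not the procedure actually used in the paper. It assumes purely synthetic data: six hypothetical chemical families with log-normally distributed concentrations, and a dilution factor that depends on the logarithms of three of them (the coefficients are invented). Classical stepwise regression adds and removes variables according to F-tests; here that is replaced by a simpler greedy forward selection that stops after a fixed number of variables. The names `solve`, `fit_rss` and `forward_select` are illustrative helpers.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_rss(X, y, cols):
    """Least-squares fit of y on an intercept plus the chosen columns.

    Returns the residual sum of squares of the fit."""
    rows = [[1.0] + [x[c] for c in cols] for x in X]
    k = len(cols) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def forward_select(X, y, n_keep):
    """Greedy forward selection: at each step add the column that lowers the RSS most."""
    chosen = []
    remaining = list(range(len(X[0])))
    while len(chosen) < n_keep:
        best = min(remaining, key=lambda c: fit_rss(X, y, chosen + [c]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Synthetic data (assumption): 100 sites, 6 hypothetical chemical families.
random.seed(0)
conc = [[random.lognormvariate(0.0, 1.0) for _ in range(6)] for _ in range(100)]
log_conc = [[math.log10(c) for c in row] for row in conc]  # working variable

# Invented relationship: the dilution factor depends on families 0, 2 and 4 only.
dilution = [50 + 30 * r[0] - 20 * r[2] + 15 * r[4] + random.gauss(0, 1.0)
            for r in log_conc]

selected = forward_select(log_conc, dilution, 3)
print("selected family indices:", sorted(selected))
```

On data built this way, the selection is expected to recover the three families that actually drive the response; with real measurements the retained families would depend on the emission source, as the abstract notes.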
Cluster analysis, discriminant analysis and a neural network were also applied, but only with three variables common to all installations: the concentrations of alkylbenzenes, polycyclic aromatics and alkanes. Cluster analysis is not a "supervised method": it is free to form its own groups of observations that behave similarly with respect to the observed variables. Indeed, the results show that cluster analysis does not allow the emission source to be recognised. In discriminant analysis, the target group membership is provided for each set of variable values: it is a supervised method. The classification functions deduced in this way discriminate quite well between the three types of industrial plant and predict the origin of the emission. A total of 87 % of the observations are correctly classified on the basis of the classification functions. The neural network approach, with the backpropagation algorithm, is a so-called "pattern recognition technique" and is also a supervised method. It gives still better results than the statistical methods. After the learning phase, the network is able to identify correctly 99.5 % of observations from 95.5 % the training set. The concentrations of compounds belonging to three chemical families, combined with a pattern recognition technique based on either discriminant analysis or a neural network, prove efficient for recognising the source of odorous gaseous effluents. For the data set considered in this paper, the neural network shows better classification performance than discriminant analysis, but its learning phase is slower. However, applying either method to recognise further unknown patterns is quick and easy.
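To make the notion of "classification functions" concrete, here is a minimal sketch of linear discriminant analysis with a pooled covariance matrix, in which each source c gets a linear function w_c · x + b_c and an observation is assigned to the source with the largest value. It is only an illustration under stated assumptions: the data are synthetic, the class centres for the three log-concentrations (alkylbenzenes, polycyclic aromatics, alkanes) are invented, and the helper names `solve`, `fit_lda` and `predict` are hypothetical. It does not reproduce the paper's data or its reported accuracy.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_lda(X, labels):
    """Return {class: (w, b)}, the linear classification function of each class."""
    classes = sorted(set(labels))
    d = len(X[0])
    means = {}
    for c in classes:
        rows = [x for x, l in zip(X, labels) if l == c]
        means[c] = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
    # Pooled within-class covariance matrix.
    pooled = [[0.0] * d for _ in range(d)]
    for x, l in zip(X, labels):
        dev = [xi - mi for xi, mi in zip(x, means[l])]
        for i in range(d):
            for j in range(d):
                pooled[i][j] += dev[i] * dev[j]
    dof = len(X) - len(classes)
    pooled = [[v / dof for v in row] for row in pooled]
    model = {}
    for c in classes:
        w = solve(pooled, means[c])  # Sigma^-1 mu_c
        prior = labels.count(c) / len(labels)
        b = -0.5 * sum(wi * mi for wi, mi in zip(w, means[c])) + math.log(prior)
        model[c] = (w, b)
    return model

def predict(model, x):
    """Assign x to the class whose classification function is largest."""
    return max(model, key=lambda c: sum(wi * xi for wi, xi in zip(model[c][0], x))
               + model[c][1])

# Synthetic log-concentrations (assumption): invented centres for
# [alkylbenzenes, polycyclic aromatics, alkanes] at each type of plant.
random.seed(1)
centres = {
    "coking": [1.0, 2.0, 0.5],
    "cold_rolling": [0.5, 0.2, 2.0],
    "coil_coating": [2.5, 0.5, 1.5],
}
X, labels = [], []
for source, centre in centres.items():
    for _ in range(30):
        X.append([random.gauss(m, 0.3) for m in centre])
        labels.append(source)

model = fit_lda(X, labels)
hits = sum(predict(model, x) == l for x, l in zip(X, labels))
accuracy = hits / len(X)
print(f"training accuracy: {accuracy:.3f}")
```

Once the functions are fitted, classifying a new observation only requires evaluating three linear expressions, which is consistent with the remark that applying the method to unknown patterns is quick.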