References of "Geurts, Pierre"
Peer Reviewed
Une méthode générique pour la classification automatique d'images à partir des pixels [A generic method for automatic image classification from pixels]
Marée, Raphaël ULg; Geurts, Pierre ULg; Wehenkel, Louis ULg

in Revue des Nouvelles Technologies de l'Information (2003), 1

In this article, we evaluate a generic approach to automatic image classification. It relies on a recent learning method that builds ensembles of decision trees by selecting tests at random directly on the raw pixel values. We propose a variant, also generic, that artificially increases the sample size by extracting and classifying sub-windows of the images. Both approaches are evaluated and compared on four public databases covering common problems: recognition of handwritten digits, faces, 3D objects, and textures.
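
The sub-window idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of random window positions, and the majority vote over sub-window labels are assumptions made for illustration, and `classify_window` stands in for any trained classifier that works on raw pixel vectors.

```python
import random
from collections import Counter


def extract_subwindows(image, size, count, rng):
    """Cut `count` square sub-windows of side `size` at random positions.

    `image` is a 2-D list of pixel values (list of rows). Each window is
    flattened into a plain vector of raw pixel values.
    """
    height, width = len(image), len(image[0])
    windows = []
    for _ in range(count):
        top = rng.randrange(height - size + 1)
        left = rng.randrange(width - size + 1)
        windows.append([image[r][c]
                        for r in range(top, top + size)
                        for c in range(left, left + size)])
    return windows


def classify_image(image, classify_window, size, count, rng):
    """Label an image by majority vote over its sub-window labels (assumed rule)."""
    windows = extract_subwindows(image, size, count, rng)
    votes = Counter(classify_window(w) for w in windows)
    return votes.most_common(1)[0][0]
```

With a hypothetical window classifier such as `lambda w: "bright" if sum(w) / len(w) > 0.5 else "dark"`, calling `classify_image(image, clf, 2, 5, random.Random(0))` labels a whole image from five 2x2 windows.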

Peer Reviewed
Traitement de données volumineuses par ensemble d'arbres aléatoires [Processing large datasets with ensembles of random trees]
Geurts, Pierre ULg

in Revue des nouvelles technologies de l'information, Numéro spécial entreposage et fouille de données (2003), 1

This article presents a new learning method based on an ensemble of decision trees. In contrast to the traditional induction method, the trees of the ensemble are built by choosing the tests completely at random while each tree is grown. The method is compared with decision trees and with bagging on several classification problems. Thanks to the random choice of tests, the computing times of this algorithm are comparable to those of traditional trees. At the same time, the method proves much more accurate than single trees and often significantly better than bagging. These characteristics make the method particularly well suited to processing large databases.
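
As a rough illustration of choosing a test completely at random, the Python sketch below picks an attribute and a cut-point without scoring any candidate. The abstract does not say how the cut-point is drawn; sampling it uniformly between the smallest and largest observed values is an assumption, as are the function names.

```python
import random


def random_split(samples, rng):
    """Pick a test at random: one attribute index and one threshold.

    No candidate split is evaluated, which is what keeps the cost low.
    The uniform draw between min and max is an assumption of this sketch.
    """
    attribute = rng.randrange(len(samples[0]))
    values = [row[attribute] for row in samples]
    threshold = rng.uniform(min(values), max(values))
    return attribute, threshold


def partition(samples, attribute, threshold):
    """Send each sample to the left (< threshold) or right (>= threshold) branch."""
    left = [row for row in samples if row[attribute] < threshold]
    right = [row for row in samples if row[attribute] >= threshold]
    return left, right
```

Growing a tree would apply `random_split` and `partition` recursively, and an ensemble would repeat this with different random draws and aggregate the predictions.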

Contributions to decision tree induction: bias/variance tradeoff and time series classification
Geurts, Pierre ULg

Doctoral thesis (2002)

Because of the rapid progress of computer and information technology, large amounts of data are nowadays available in many domains. Automatic learning aims at developing algorithms able to produce synthetic high-level information, or models, from this data. Learning algorithms are generally evaluated according to three different criteria: interpretability (how well the model helps to understand the data), predictive accuracy (how well the model can predict unseen situations), and computational efficiency (how fast the algorithm is and how it scales to large databases). This thesis explores two issues in automatic learning: the improvement of the well-known decision tree induction method and the problem of learning classification models for time series data. Decision tree induction is an automatic learning algorithm which focuses on the modeling of input/output relationships. While this algorithm is among the fastest and most interpretable methods, its accuracy is not always competitive with respect to other algorithms. It is commonly admitted that this suboptimality is due to the excessive variance of this method. We first carry out an empirical study which shows quantitatively how important this variance is, i.e. how strongly decision trees depend on the random nature of the database used to infer them. These experiments confirm that this variance is detrimental not only from the point of view of accuracy but also from the point of view of interpretability. With the goal of improving both interpretability and accuracy, we consider three variance reduction techniques for decision trees. First, with the aim of improving mainly interpretability, we propose several methods which try to stabilize the parameters chosen during tree induction. While these methods succeed in reducing the variability of the parameters, they produce only a slight improvement in accuracy. Then, we consider perturb and combine algorithms (e.g. bagging, boosting), which consist in combining the predictions of several models obtained by randomizing the learning process in some way. Inspired by the high variance of the parameters defining a decision tree, we propose an extremely randomized decision tree induction algorithm, called extra-tree, which chooses all parameters at random during induction. The aggregation of several of these extra-trees gives an important reduction of variance, and this algorithm compares favorably in terms of accuracy and computational efficiency with both bagging and boosting. Because of the randomization of the parameters, the resulting method is also competitive with classical decision tree induction in terms of computational efficiency. In addition to these two approaches, we propose a "dual" perturb and combine algorithm which delays the perturbation to the prediction stage and hence requires only one model. In combination with decision trees, this method actually bridges the gap between single decision trees and perturb and combine algorithms. From the former it retains interpretability (by using only one model), and with perturb and combine algorithms it shares some of the accuracy (by reducing the variance). The second topic of the thesis is the problem of time series classification. The most direct way to solve this problem is to apply existing learning algorithms to low-level variables which correspond to the values of a time series at several time points. Experiments with the tree-based algorithms studied in the first part of the thesis show that this approach is limited. A variance reduction technique is then proposed specifically for this kind of data, which consists in aggregating the predictions given by a classification model for subsequences of the time series. Since this method does not provide interpretable models, we propose a second method which extends decision tree tests by allowing them to detect local shift-invariant properties, or patterns, in time series. The study proposed in this part of the thesis is only a first step in the domain, but our conclusions suggest directions for future work on handling complex types of data with automatic learning methods.

Peer Reviewed
Improving the bias/variance tradeoff of decision trees - towards soft tree induction
Geurts, Pierre ULg; Olaru, Cristina; Wehenkel, Louis ULg

in Engineering intelligent systems (2001), 9

One of the main difficulties with standard top down induction of decision trees comes from the high variance of these methods. High variance means that, for a given problem and sample size, the resulting tree is strongly dependent on the random nature of the particular sample used for training. Consequently, these algorithms tend to be suboptimal in terms of accuracy and interpretability. This paper analyses this problem in depth and proposes a new method, relying on threshold softening, able to significantly improve the bias/variance tradeoff of decision trees. The algorithm is validated on a number of benchmark problems and its relationship with fuzzy decision tree induction is discussed. This sheds some light on the success of fuzzy decision tree induction and improves our understanding of machine learning, in general.
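
One way to picture threshold softening is a test that sends a case partly to both branches when its value lies near the threshold. The Python sketch below is only an illustration: the piecewise-linear transition, the `width` parameter, and the function names are assumptions, not the formulation used in the paper.

```python
def soft_membership(x, threshold, width):
    """Degree, between 0 and 1, to which x belongs to the right branch.

    A hard test jumps from 0 to 1 at the threshold. Here the degree rises
    linearly over an interval of length `width` centred on the threshold
    (the linear shape is an assumption). A width of 0 gives the hard test.
    """
    if width <= 0:
        return 1.0 if x >= threshold else 0.0
    degree = 0.5 + (x - threshold) / width
    return min(1.0, max(0.0, degree))


def soft_stump_predict(x, threshold, width, left_value, right_value):
    """Blend the two leaf values according to the soft membership degree."""
    m = soft_membership(x, threshold, width)
    return (1.0 - m) * left_value + m * right_value
```

Because a small change in the threshold now changes the prediction only slightly, the output depends less sharply on the particular training sample, which is the intuition behind the variance reduction.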

Peer Reviewed
Pattern extraction for time-series classification
Geurts, Pierre ULg

in Proceedings of PKDD 2001, 5th European Conference on Principles of Data Mining and Knowledge Discovery (2001)

In this paper, we propose some new tools to allow machine learning classifiers to cope with time series data. We first argue that many time-series classification problems can be solved by detecting and combining local properties or patterns in time series. Then, a technique is proposed to find patterns which are useful for classification. These patterns are combined to build interpretable classification rules. Experiments, carried out on several artificial and real problems, highlight the interest of the approach both in terms of interpretability and accuracy of the induced classifiers.

Peer Reviewed
Dual Perturb and Combine Algorithm
Geurts, Pierre ULg

in Proceedings of AISTATS 2001, Eighth International Workshop on Artificial Intelligence and Statistics (2001)

In this paper, a dual perturb and combine algorithm is proposed which consists in producing the perturbed predictions at the prediction stage using only one model. To this end, the attribute vector of a test case is perturbed several times by an additive random noise, the model is applied to each of these perturbed vectors and the resulting predictions are aggregated. An analytical version of this algorithm is described in the context of decision tree induction. From experiments on several datasets, it appears that this simple algorithm yields significant improvements on several problems, sometimes comparable to those obtained with bagging. When combined with decision tree bagging, this algorithm also improves accuracy in many problems.
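
The procedure described in this abstract (perturb the test vector, apply the single model, aggregate) can be sketched in a few lines of Python. The Gaussian noise, the averaging of numeric predictions, and all names below are assumptions of this sketch; the abstract only specifies additive random noise and some form of aggregation.

```python
import random
from statistics import mean


def dual_perturb_and_combine(model, x, noise_std, repeats, rng):
    """Average the predictions of one model over noisy copies of x.

    `model` is any callable mapping an attribute vector to a number.
    The perturbation happens at prediction time only, so a single
    trained model is needed.
    """
    predictions = []
    for _ in range(repeats):
        # Additive zero-mean Gaussian noise on every attribute (assumed).
        noisy = [value + rng.gauss(0.0, noise_std) for value in x]
        predictions.append(model(noisy))
    return mean(predictions)
```

For a hypothetical one-test model such as `lambda v: 1.0 if v[0] >= 0.5 else 0.0`, a test case close to the threshold receives an intermediate score instead of a hard 0 or 1, which smooths the decision boundary.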

Peer Reviewed
Temporal machine learning for switching control
Geurts, Pierre ULg; Wehenkel, Louis ULg

in Proceedings of PKDD 2000, 4th European Conference on Principles of Data Mining and Knowledge Discovery (2000)

In this paper, a temporal machine learning method is presented which automatically constructs rules that detect an event as soon as possible using past and present measurements made on a complex system. This method can take as inputs dynamic scenarios directly described by temporal variables and provides easily readable results in the form of detection trees. The application of this method is discussed in the context of switching control. Switching (or discrete event) control of continuous systems consists in changing the structure of a system in such a way as to control its behavior. Given a particular discrete control switch, detection trees are applied to the induction of rules which decide, based on the available measurements, whether or not to operate the switch. Two practical applications are discussed in the context of electrical power systems emergency control.

Peer Reviewed
Some enhancements of decision tree bagging
Geurts, Pierre ULg

in Proceedings of PKDD 2000, 4th European Conference on Principles of Data Mining and Knowledge Discovery (2000)

This paper investigates enhancements of decision tree bagging which mainly aim at improving computation times, but also accuracy. The three questions which are reconsidered are: discretization of continuous attributes, tree pruning, and sampling schemes. A very simple discretization procedure is proposed, resulting in a dramatic speedup without significant decrease in accuracy. Then a new method is proposed to prune an ensemble of trees in a combined fashion, which is significantly more effective than individual pruning. Finally, different resampling schemes are considered, leading to different CPU time/accuracy tradeoffs. Combining all these enhancements makes it possible to apply tree bagging to very large datasets, with computational performance similar to single tree induction. Simulations are carried out on two synthetic databases and four real-life datasets.

Peer Reviewed
Investigation and reduction of discretization variance in decision tree induction
Geurts, Pierre ULg; Wehenkel, Louis ULg

in Proceedings of ECML 2000, European Conference on Machine Learning (2000)

This paper focuses on the variance introduced by the discretization techniques used to handle continuous attributes in decision tree induction. Different discretization procedures are first studied empirically, then means to reduce the discretization variance are proposed. The experiments show that discretization variance is large and that it is possible to reduce it significantly without notable computational costs. The resulting variance reduction mainly improves interpretability and stability of decision trees, and marginally their accuracy.

Peer Reviewed
Data mining tools and application in power system engineering
Olaru, Cristina; Geurts, Pierre ULg; Wehenkel, Louis ULg

in Proceedings of the 13th Power System Computation Conference, PSCC99 (1999)

The power system field is presently facing an explosive growth of data. The data mining (DM) approach provides tools for making explicit some implicit subtle structure in data. Applying data mining to power system engineering is an iterative and interactive process, requiring a user acquainted with the application specifics. The paper describes data mining tools such as statistical methods, visualization, machine learning and neural networks, illustrated by results obtained with DM software developed for dynamic security assessment studies. Power system engineering applications where data mining would be useful are reviewed in the second part of the paper.

Peer Reviewed
Visualizing dynamic power system scenarios for data mining
Geurts, Pierre ULg; Wehenkel, Louis ULg

in Proceedings of LESCOPE 98, Large Engineering Syst. Conf. on Power Engineering (1998)

Peer Reviewed
Early prediction of electric power system blackouts by temporal machine learning
Geurts, Pierre ULg; Wehenkel, Louis ULg

in Proceedings of ICML-AAAI 98 Workshop on "Predicting the future: AI approaches to time series analysis" (1998)

This paper discusses the application of machine learning to the design of power system blackout prediction criteria, using a large database of random power system scenarios generated by Monte-Carlo simulation. Each scenario is described by temporal variables and sequences of events describing the dynamics of the system as it might be observed from real-time measurements. The aim is to exploit the database in order to derive rules, as simple as possible, which would allow an incipient blackout to be detected early enough to prevent or mitigate it. We propose a novel "temporal tree induction" algorithm in order to exploit temporal attributes and reach a compromise between the degree of anticipation and the selectivity of detection rules. Tests are carried out on a database related to voltage collapse of an existing large-scale power system.
