References of "Geurts, Pierre"
Full Text
Peer Reviewed
Closed-form dual perturb and combine for tree-based models
Geurts, Pierre ULg; Wehenkel, Louis ULg

in Proceedings of the International Conference on Machine Learning (ICML 2005) (2005)

This paper studies the aggregation of predictions made by tree-based models for several perturbed versions of the attribute vector of a test case. A closed-form approximation of this scheme, combined with cross-validation to tune the level of perturbation, is proposed. This yields soft-tree models in a parameter-free way and preserves their interpretability. Empirical evaluations on classification and regression problems show that accuracy and the bias/variance tradeoff are improved significantly at the price of an acceptable computational overhead. The method is further compared and combined with tree bagging.

Full Text
Peer Reviewed
A Machine Learning Approach to Improve Congestion Control over Wireless Computer Networks
Geurts, Pierre ULg; El Khayat, Ibtissam; Leduc, Guy ULg

(2004, November)

In this paper, we present the application of machine learning techniques to the improvement of the congestion control of TCP in wired/wireless networks. TCP is suboptimal in hybrid wired/wireless networks because it reacts in the same way to losses due to congestion and losses due to link errors. We thus propose to use machine learning techniques to automatically build a loss classifier from a database obtained by simulations of random network topologies. Several machine learning algorithms are compared for this task, and the best method for this application turns out to be decision tree boosting. It outperforms ad hoc classifiers proposed in the networking literature.

Discovery of new rheumatoid arthritis biomarkers using SELDI-TOF-MS ProteinChip approach
de Seny, D. M.; Fillet, Marianne ULg; Meuwis, Marie-Alice ULg et al

in Arthritis and Rheumatism (2004, September), 50(9, Suppl. S), 124

Full Text
Peer Reviewed
A generic approach for image classification based on decision tree ensembles and local sub-windows
Marée, Raphaël ULg; Geurts, Pierre ULg; Piater, Justus ULg et al

in Proceedings of the 6th Asian Conference on Computer Vision (2004)

A novel and generic approach for image classification is presented. The method operates directly on pixel values and does not require feature extraction. It combines a simple local sub-window extraction technique with induction of ensembles of extremely randomized decision trees. We report results on four well-known and publicly available datasets corresponding to representative applications of image classification problems: handwritten digits (MNIST), faces (ORL), 3D objects (COIL-100), and textures (OUTEX). A comparison with studies from the computer vision literature shows that our method is competitive with the state of the art, an interesting result considering its generality and conceptual simplicity. Further experiments are carried out on the COIL-100 dataset to evaluate the robustness of the learned models to rotation, scaling, or occlusion of test images. These preliminary results are very encouraging.

Full Text
Peer Reviewed
Iteratively extending time horizon reinforcement learning
Ernst, Damien ULg; Geurts, Pierre ULg; Wehenkel, Louis ULg

in Machine Learning: ECML 2003, 14th European Conference on Machine Learning (2003)

Reinforcement learning aims to determine an (infinite time horizon) optimal control policy from interaction with a system. It can be solved by approximating the so-called Q-function from a sample of four-tuples (x(t), u(t), r(t), x(t+1)), where x(t) denotes the system state at time t, u(t) the control action taken, r(t) the instantaneous reward obtained and x(t+1) the successor state of the system, and by determining the optimal control from the Q-function. Classical reinforcement learning algorithms use an ad hoc version of stochastic approximation which iterates over the Q-function approximations on a four-tuple by four-tuple basis. In this paper, we reformulate this problem as a sequence of batch mode supervised learning problems which in the limit converges to (an approximation of) the Q-function. Each step of this algorithm uses the full sample of four-tuples gathered from interaction with the system and extends by one step the horizon of the optimality criterion. An advantage of this approach is that it allows the use of standard batch mode supervised learning algorithms instead of the incremental versions used up to now. In addition to a theoretical justification, the paper provides empirical tests in the context of the "Car on the Hill" control problem, based on the use of ensembles of regression trees. The resulting algorithm is in principle able to handle large scale reinforcement learning problems efficiently.
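The horizon-extension idea in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the discount factor `gamma`, the table-averaging regressor `fit_table` (a stand-in for the ensembles of regression trees mentioned above), and the function names are all assumptions made for illustration.

```python
from collections import defaultdict


def fit_table(inputs, targets):
    """Hypothetical batch regressor: averages the targets seen for each
    (state, action) pair. Any batch supervised learner could be used here."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, y in zip(inputs, targets):
        sums[key] += y
        counts[key] += 1
    table = {k: sums[k] / counts[k] for k in sums}
    return lambda x, u: table.get((x, u), 0.0)


def extend_horizon(samples, actions, n_steps, gamma=0.9):
    """Build Q_1, ..., Q_n from four-tuples (x, u, r, x_next).

    Each iteration is one batch supervised learning problem over the full
    sample and extends the optimization horizon by one step.
    gamma is an assumed discount factor, not taken from the abstract.
    """
    q = lambda x, u: 0.0  # Q_0: zero-step horizon
    for _ in range(n_steps):
        inputs = [(x, u) for x, u, r, x_next in samples]
        targets = [r + gamma * max(q(x_next, a) for a in actions)
                   for x, u, r, x_next in samples]
        q = fit_table(inputs, targets)
    return q


# Toy chain 0 -> 1 -> 2 with a single action; reaching state 2 pays 1.
samples = [(0, "go", 0.0, 1), (1, "go", 1.0, 2), (2, "go", 0.0, 2)]
q1 = extend_horizon(samples, ["go"], n_steps=1)
q2 = extend_horizon(samples, ["go"], n_steps=2)
```

With a one-step horizon, state 0 sees no reward; with a two-step horizon, the reward collected from state 1 is propagated back to state 0.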

Full Text
Peer Reviewed
An empirical comparison of machine learning algorithms for generic image classification
Marée, Raphaël ULg; Geurts, Pierre ULg; Visimberga, Giorgio et al

in Proceedings of the 23rd SGAI international conference on innovative techniques and applications of artificial intelligence, Research and development in intelligent systems XX, (2003)

Full Text
Peer Reviewed
Une méthode générique pour la classification automatique d'images à partir des pixels
Marée, Raphaël ULg; Geurts, Pierre ULg; Wehenkel, Louis ULg

in Revue des Nouvelles Technologies de l'Information (2003), 1

In this article, we evaluate a generic approach to automatic image classification. It relies on a recent learning method that builds ensembles of decision trees by randomly selecting tests directly on the raw pixel values. We propose a variant, also generic, that artificially increases the sample size by extracting and classifying sub-windows of the images. Both approaches are evaluated and compared on four public databases of common problems: recognition of handwritten digits, faces, 3D objects, and textures.

Full Text
Peer Reviewed
Traitement de données volumineuses par ensemble d'arbres aléatoires
Geurts, Pierre ULg

in Revue des nouvelles technologies de l'information, Numéro spécial entreposage et fouille de données (2003), 1

This article presents a new learning method based on an ensemble of decision trees. In contrast to the traditional induction method, the trees of the ensemble are built by choosing the tests completely at random during tree growing. The method is compared with decision trees and bagging on several classification problems. Thanks to the random choice of tests, the computation times of this algorithm are comparable to those of traditional trees. At the same time, the method proves much more accurate than single trees and often significantly better than bagging. These characteristics make the method particularly well suited to processing large databases.

Full Text
Contributions to decision tree induction: bias/variance tradeoff and time series classification
Geurts, Pierre ULg

Doctoral thesis (2002)

Because of the rapid progress of computer and information technology, large amounts of data are nowadays available in many domains. Automatic learning aims at developing algorithms able to produce synthetic high-level information, or models, from this data. Learning algorithms are generally evaluated according to three different criteria: interpretability (how well the model helps to understand the data), predictive accuracy (how well the model can predict unseen situations), and computational efficiency (how fast the algorithm is and how it scales to large databases). This thesis explores two issues in automatic learning: the improvement of the well-known decision tree induction method and the problem of learning classification models for time series data. Decision tree induction is an automatic learning algorithm which focuses on the modeling of input/output relationships. While this algorithm is among the fastest and most interpretable methods, its accuracy is not always competitive with other algorithms. It is commonly accepted that this suboptimality is due to the excessive variance of the method. We first carry out an empirical study which shows quantitatively how important this variance is, i.e. how strongly decision trees depend on the random nature of the database used to infer them. These experiments confirm that this variance is detrimental not only from the point of view of accuracy but also from the point of view of interpretability. With the goal of improving both interpretability and accuracy, we consider three variance reduction techniques for decision trees. First, with the goal of improving mainly interpretability, we propose several methods which try to stabilize the parameters chosen during tree induction. While these methods succeed in reducing the variability of the parameters, they produce only a slight improvement in accuracy. Then, we consider perturb and combine algorithms (e.g. bagging, boosting), which consist in combining the predictions of several models obtained by randomizing the learning process in some way. Inspired by the high variance of the parameters defining a decision tree, we propose an extremely randomized decision tree induction algorithm, called extra-tree, which chooses all parameters at random during induction. The aggregation of several of these extra-trees gives an important reduction of variance, and this algorithm compares favorably in terms of accuracy and computational efficiency with both bagging and boosting. Because of the randomization of the parameters, the resulting method is also competitive with classical decision tree induction in terms of computational efficiency. In addition to these two approaches, we propose a "dual" perturb and combine algorithm which delays the perturbation to the prediction stage and hence requires only one model. In combination with decision trees, this method actually bridges the gap between single decision trees and perturb and combine algorithms. From the former, it retains interpretability (by using only one model), and with perturb and combine algorithms it shares some of the accuracy (by reducing the variance). The second topic of the thesis is the problem of time series classification. The most direct way to solve this problem is to apply existing learning algorithms to low-level variables which correspond to the values of a time series at several time points. Experiments with the tree-based algorithms studied in the first part of the thesis show that this approach is limited. A variance reduction technique is then proposed specifically for this kind of data, which consists in aggregating the predictions given by a classification model for subsequences of time series. Since this method does not provide interpretable models, we propose a second method which extends decision tree tests by allowing them to detect local shift-invariant properties, or patterns, in time series. The study proposed in this part of the thesis is only a first step in the domain, but our conclusions give some directions for future work on handling complex types of data with automatic learning methods.

Full Text
Peer Reviewed
Improving the bias/variance tradeoff of decision trees - towards soft tree induction
Geurts, Pierre ULg; Olaru, Cristina; Wehenkel, Louis ULg

in Engineering intelligent systems (2001), 9

One of the main difficulties with standard top down induction of decision trees comes from the high variance of these methods. High variance means that, for a given problem and sample size, the resulting tree is strongly dependent on the random nature of the particular sample used for training. Consequently, these algorithms tend to be suboptimal in terms of accuracy and interpretability. This paper analyses this problem in depth and proposes a new method, relying on threshold softening, able to significantly improve the bias/variance tradeoff of decision trees. The algorithm is validated on a number of benchmark problems and its relationship with fuzzy decision tree induction is discussed. This sheds some light on the success of fuzzy decision tree induction and improves our understanding of machine learning, in general.

Full Text
Peer Reviewed
Pattern extraction for time-series classification
Geurts, Pierre ULg

in Proceedings of PKDD 2001, 5th European Conference on Principles of Data Mining and Knowledge Discovery (2001)

In this paper, we propose some new tools to allow machine learning classifiers to cope with time series data. We first argue that many time-series classification problems can be solved by detecting and combining local properties or patterns in time series. Then, a technique is proposed to find patterns which are useful for classification. These patterns are combined to build interpretable classification rules. Experiments, carried out on several artificial and real problems, highlight the interest of the approach both in terms of interpretability and accuracy of the induced classifiers.

Full Text
Peer Reviewed
Dual Perturb and Combine Algorithm
Geurts, Pierre ULg

in Proceedings of AISTATS 2001, Eighth International Workshop on Artificial Intelligence and Statistics (2001)

In this paper, a dual perturb and combine algorithm is proposed which consists in producing the perturbed predictions at the prediction stage using only one model. To this end, the attribute vector of a test case is perturbed several times by an additive random noise, the model is applied to each of these perturbed vectors and the resulting predictions are aggregated. An analytical version of this algorithm is described in the context of decision tree induction. From experiments on several datasets, it appears that this simple algorithm yields significant improvements on several problems, sometimes comparable to those obtained with bagging. When combined with decision tree bagging, this algorithm also improves accuracy in many problems.
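The sampling version of the procedure described in this abstract can be sketched as follows. This is an illustrative sketch only: the Gaussian noise distribution, the averaging aggregation, the one-split `stump` model, and all function and parameter names are assumptions, since the abstract specifies only "an additive random noise" and aggregation of the predictions.

```python
import random
from statistics import mean


def stump(x):
    """Hypothetical one-split regression tree: 1.0 if x[0] > 0.5, else 0.0."""
    return 1.0 if x[0] > 0.5 else 0.0


def dual_perturb_and_combine(model, x, noise_std, n_perturbations=200, seed=0):
    """Perturb the test vector x several times, apply the single model to
    each perturbed copy, and average the resulting predictions.

    Gaussian noise with standard deviation noise_std is an assumption.
    """
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_perturbations):
        perturbed = [v + rng.gauss(0.0, noise_std) for v in x]
        predictions.append(model(perturbed))
    return mean(predictions)


far = dual_perturb_and_combine(stump, [5.0], noise_std=0.1)
near = dual_perturb_and_combine(stump, [0.5], noise_std=0.1)
unperturbed = dual_perturb_and_combine(stump, [0.9], noise_std=0.0)
```

Far from the split threshold the averaged prediction matches the single tree, while near the threshold it takes an intermediate value, which is the softening effect of the perturbation. With zero noise the procedure reduces to the original model.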

Full Text
Peer Reviewed
Temporal machine learning for switching control
Geurts, Pierre ULg; Wehenkel, Louis ULg

in Proceedings of PKDD 2000, 4th European Conference on Principles of Data Mining and Knowledge Discovery (2000)

In this paper, a temporal machine learning method is presented which is able to automatically construct rules that allow an event to be detected as soon as possible using past and present measurements made on a complex system. This method can take as inputs dynamic scenarios directly described by temporal variables and provides easily readable results in the form of detection trees. The application of this method is discussed in the context of switching control. Switching (or discrete event) control of continuous systems consists in changing the structure of a system in such a way as to control its behavior. Given a particular discrete control switch, detection trees are applied to the induction of rules which decide, based on the available measurements, whether or not to operate a switch. Two practical applications are discussed in the context of electrical power systems emergency control.

Full Text
Peer Reviewed
Some enhancements of decision tree bagging
Geurts, Pierre ULg

in Proceedings of PKDD 2000, 4th European Conference on Principles of Data Mining and Knowledge Discovery (2000)

This paper investigates enhancements of decision tree bagging which mainly aim at improving computation times, but also accuracy. The three questions which are reconsidered are: discretization of continuous attributes, tree pruning, and sampling schemes. A very simple discretization procedure is proposed, resulting in a dramatic speedup without significant decrease in accuracy. Then a new method is proposed to prune an ensemble of trees in a combined fashion, which is significantly more effective than individual pruning. Finally, different resampling schemes are considered, leading to different CPU time/accuracy tradeoffs. Combining all these enhancements makes it possible to apply tree bagging to very large datasets, with computational performance similar to single tree induction. Simulations are carried out on two synthetic databases and four real-life datasets.

Full Text
Peer Reviewed
Investigation and reduction of discretization variance in decision tree induction
Geurts, Pierre ULg; Wehenkel, Louis ULg

in Proceedings of ECML 2000, European Conference on Machine Learning (2000)

This paper focuses on the variance introduced by the discretization techniques used to handle continuous attributes in decision tree induction. Different discretization procedures are first studied empirically, then means to reduce the discretization variance are proposed. The experiments show that discretization variance is large and that it is possible to reduce it significantly without notable computational costs. The resulting variance reduction mainly improves interpretability and stability of decision trees, and marginally their accuracy.

Full Text
Peer Reviewed
Data mining tools and application in power system engineering
Olaru, Cristina; Geurts, Pierre ULg; Wehenkel, Louis ULg

in Proceedings of the 13th Power System Computation Conference, PSCC99 (1999)

The power system field is presently facing an explosive growth of data. The data mining (DM) approach provides tools for making explicit some implicit subtle structure in data. Applying data mining to power system engineering is an iterative and interactive process, requiring a user acquainted with the application specifics. The paper describes data mining tools such as statistical methods, visualization, machine learning and neural networks, illustrated by results obtained with DM software developed for dynamic security assessment studies. Power system engineering applications where data mining would be useful are reviewed in the second part of the paper.

Full Text
Peer Reviewed
Visualizing dynamic power system scenarios for data mining
Geurts, Pierre ULg; Wehenkel, Louis ULg

in Proceedings of LESCOPE 98, Large Engineering Syst. Conf. on Power Engineering (1998)

Full Text
Peer Reviewed
Early prediction of electric power system blackouts by temporal machine learning
Geurts, Pierre ULg; Wehenkel, Louis ULg

in Proceedings of ICML-AAAI 98 Workshop on "Predicting the future: AI approaches to time series analysis" (1998)

This paper discusses the application of machine learning to the design of power system blackout prediction criteria, using a large database of random power system scenarios generated by Monte-Carlo simulation. Each scenario is described by temporal variables and sequences of events describing the dynamics of the system as it might be observed from real-time measurements. The aim is to exploit the database in order to derive rules, as simple as possible, that would allow an incipient blackout to be detected early enough to prevent or mitigate it. We propose a novel "temporal tree induction" algorithm in order to exploit temporal attributes and reach a compromise between the degree of anticipation and selectivity of detection rules. Tests are carried out on a database related to voltage collapse of an existing large scale power system.
