References of "Geurts, Pierre"
Peer Reviewed
Tree-based batch mode reinforcement learning
Ernst, Damien ULg; Geurts, Pierre ULg; Wehenkel, Louis ULg

in Journal of Machine Learning Research (2005), 6

Reinforcement learning aims to determine an optimal control policy from interaction with a system or from observations gathered from a system. In batch mode, it can be achieved by approximating the so-called Q-function based on a set of four-tuples (x(t), u(t), r(t), x(t+1)) where x(t) denotes the system state at time t, u(t) the control action taken, r(t) the instantaneous reward obtained and x(t+1) the successor state of the system, and by determining the control policy from this Q-function. The Q-function approximation may be obtained from the limit of a sequence of (batch mode) supervised learning problems. Within this framework we describe the use of several classical tree-based supervised learning methods (CART, Kd-tree, tree bagging) and two newly proposed ensemble algorithms, namely extremely and totally randomized trees. We study their performances on several examples and find that the ensemble methods based on regression trees perform well in extracting relevant information about the optimal control policy from sets of four-tuples. In particular, the totally randomized trees give good results while ensuring the convergence of the sequence, whereas by relaxing the convergence constraint even better accuracy results are provided by the extremely randomized trees.
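
As a rough illustration of the batch-mode scheme described in this abstract, the sketch below iterates a sequence of supervised regression problems over a fixed set of four-tuples. It is not the authors' implementation: the toy chain problem, the discount factor `gamma`, the iteration count, and the table-averaging regressor `fit_average` (standing in for a tree-based learner) are all assumptions made for illustration.

```python
from collections import defaultdict

def fit_average(inputs, targets):
    """Hypothetical stand-in regressor: average target per (state, action) pair."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, y in zip(inputs, targets):
        sums[key] += y
        counts[key] += 1
    table = {k: sums[k] / counts[k] for k in sums}
    return lambda x, u: table.get((x, u), 0.0)

def fitted_q_iteration(four_tuples, actions, gamma=0.9, n_iterations=50):
    """Approximate Q from four-tuples (x, u, r, x_next) by repeated regression."""
    q = lambda x, u: 0.0  # the first approximation is identically zero
    inputs = [(x, u) for x, u, _, _ in four_tuples]
    for _ in range(n_iterations):
        # Regression targets are built from the previous approximation.
        targets = [r + gamma * max(q(x_next, a) for a in actions)
                   for _, _, r, x_next in four_tuples]
        q = fit_average(inputs, targets)
    return q

def greedy_policy(q, actions):
    """Control policy that picks the action with the largest Q value."""
    return lambda x: max(actions, key=lambda u: q(x, u))

# Toy deterministic chain (an assumption): states 0..3, entering state 3 pays 1.
ACTIONS = (-1, +1)

def step(x, u):
    x_next = min(3, max(0, x + u))
    reward = 1.0 if x_next == 3 and x != 3 else 0.0
    return reward, x_next

BATCH = [(x, u, *step(x, u)) for x in range(4) for u in ACTIONS]
Q = fitted_q_iteration(BATCH, ACTIONS)
POLICY = greedy_policy(Q, ACTIONS)
```

Swapping `fit_average` for any regressor that exposes the same "fit, then return a callable" shape leaves the outer loop unchanged, which is the point of casting the problem as a sequence of batch regressions.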

Peer Reviewed
Approximate value iteration in the reinforcement learning context. Application to electrical power system control
Ernst, Damien ULg; Glavic, Mevludin; Geurts, Pierre ULg et al

in International Journal of Emerging Electrical Power Systems (2005), 3(1)

In this paper we explain how to design intelligent agents able to process the information acquired from interaction with a system to learn a good control policy and show how the methodology can be applied to control some devices aimed to damp electrical power oscillations. The control problem is formalized as a discrete-time optimal control problem and the information acquired from interaction with the system is a set of samples, where each sample is composed of four elements: a state, the action taken while being in this state, the instantaneous reward observed and the successor state of the system. To process this information we consider reinforcement learning algorithms that determine an approximation of the so-called Q-function by mimicking the behavior of the value iteration algorithm. Simulations are first carried out on a benchmark power system modeled with two state variables. Then we present a more complex case study on a four-machine power system where the reinforcement learning algorithm controls a Thyristor Controlled Series Capacitor (TCSC) aimed to damp power system oscillations.

Peer Reviewed
Random Subwindows for Robust Image Classification
Marée, Raphaël ULg; Geurts, Pierre ULg; Piater, Justus ULg et al

in Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR 2005) (2005)

We present a novel, generic image classification method based on a recent machine learning algorithm (ensembles of extremely randomized decision trees). Images are classified using randomly extracted subwindows that are suitably normalized to yield robustness to certain image transformations. Our method is evaluated on four very different, publicly available datasets (COIL-100, ZuBuD, ETH-80, WANG). Our results show that our automatic approach is generic and robust to illumination, scale, and viewpoint changes. An extension of the method is proposed to improve its robustness with respect to rotation changes.

Peer Reviewed
Decision Trees and Random Subwindows for Object Recognition
Marée, Raphaël ULg; Geurts, Pierre ULg; Piater, Justus ULg et al

in ICML workshop on Machine Learning Techniques for Processing Multimedia Content (MLMM2005) (2005)

In this paper, we compare five tree-based machine learning methods within a recent generic image classification framework based on random extraction and classification of subwindows. We evaluate them on three publicly available object recognition datasets (COIL-100, ETH-80, and ZuBuD). Our comparison shows that this general and conceptually simple framework yields good results when combined with ensembles of decision trees, especially when using Tree Boosting or Extra-Trees. The latter is also particularly attractive in terms of computational efficiency.

Peer Reviewed
Segment and combine approach for Biological Sequence Classification
Geurts, Pierre ULg; Blanco Cuesta, Antia; Wehenkel, Louis ULg

in Proc. IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB 2005) (2005)

This paper presents a new algorithm based on the segment and combine paradigm, for automatic classification of biological sequences. It classifies sequences by aggregating the information about their subsequences predicted by a classifier derived by machine learning from a random sample of training subsequences. This generic approach is combined with decision tree based ensemble methods, scalable both with respect to sample size and vocabulary size. The method is applied to three families of problems: DNA sequence recognition, splice junction detection, and gene regulon prediction. With respect to standard approaches based on n-grams, it appears competitive in terms of accuracy, flexibility, and scalability. The paper also highlights the possibility to exploit the resulting models to identify interpretable patterns specific to a given class of biological sequences.

Bias vs. variance decomposition for regression and classification
Geurts, Pierre ULg

in Maimon, O.; Rokach, L. (Eds.) Data Mining and Knowledge Discovery Handbook: A Complete Guide for Practitioners and Researchers (2005)

In this chapter, the important concepts of bias and variance are introduced. After an intuitive introduction to the bias/variance tradeoff, we discuss the bias/variance decompositions of the mean square error (in the context of regression problems) and of the mean misclassification error (in the context of classification problems). Then, we carry out a small empirical study providing some insight about how the parameters of a learning algorithm influence bias and variance.
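
To make the regression case concrete, the following sketch checks numerically that the mean square error of a set of predictions around a fixed target splits into a squared bias term plus a variance term. The prediction values and the target are made-up numbers, and the residual noise term of the full decomposition is left out by assuming a noise-free target.

```python
from statistics import fmean

def bias_variance(predictions, target):
    """Decompose the mean square error at a single test point.

    `predictions` holds the outputs of models trained on different
    learning samples; `target` is the (assumed noise-free) true value.
    Returns (mse, squared_bias, variance), with mse = squared_bias + variance.
    """
    mean_pred = fmean(predictions)
    mse = fmean([(p - target) ** 2 for p in predictions])
    squared_bias = (mean_pred - target) ** 2
    variance = fmean([(p - mean_pred) ** 2 for p in predictions])
    return mse, squared_bias, variance

# Illustrative values only: five models' predictions for a target of 3.0.
mse, bias2, var = bias_variance([2.0, 4.0, 3.5, 2.5, 4.5], target=3.0)
```

Here the average prediction is 3.3, so the squared bias is small compared with the variance, which is the regime in which averaging several models is expected to help most.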

Peer Reviewed
Proteomic mass spectra classification using decision tree based ensemble methods.
Geurts, Pierre ULg; Fillet, Marianne ULg; De Seny, Dominique ULg et al

in Bioinformatics (2005), 21(14), 3138-45

MOTIVATION: Modern mass spectrometry allows the determination of proteomic fingerprints of body fluids like serum, saliva or urine. These measurements can be used in many medical applications in order to diagnose the current state or predict the evolution of a disease. Recent developments in machine learning allow one to exploit such datasets, characterized by small numbers of very high-dimensional samples. RESULTS: We propose a systematic approach based on decision tree ensemble methods, which is used to automatically determine proteomic biomarkers and predictive models. The approach is validated on two datasets of surface-enhanced laser desorption/ionization time of flight measurements, for the diagnosis of rheumatoid arthritis and inflammatory bowel diseases. The results suggest that the methodology can handle a broad class of similar problems.

Peer Reviewed
Segment and combine approach for non-parametric time-series classification
Geurts, Pierre ULg; Wehenkel, Louis ULg

in Lecture Notes in Computer Science (2005), 3721

This paper presents a novel, generic, scalable, autonomous, and flexible supervised learning algorithm for the classification of multivariate and variable length time series. The essential ingredients of the algorithm are randomization, segmentation of time-series, decision tree ensemble based learning of subseries classifiers, combination of subseries classification by voting, and cross-validation based temporal resolution adaptation. Experiments are carried out with this method on 10 synthetic and real-world datasets. They highlight the good behavior of the algorithm on a large diversity of problems. Our results are also highly competitive with existing approaches from the literature.
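
The prediction step named in this abstract (segment the series, classify each subseries, combine by voting) can be sketched as follows. This is only a minimal illustration under assumptions: `classify_segment` is a hypothetical, already-trained subseries classifier, the toy `trend_classifier` replaces the decision tree ensemble, and the cross-validated choice of the subseries length is omitted.

```python
import random
from collections import Counter

def segment_and_combine_predict(series, classify_segment, length, n_segments, rng):
    """Classify a series by majority vote over randomly placed subseries.

    `classify_segment` maps a list of values to a class label and is
    assumed to have been trained beforehand on labelled subseries.
    """
    if len(series) < length:
        raise ValueError("series shorter than the subseries length")
    votes = Counter()
    for _ in range(n_segments):
        start = rng.randrange(len(series) - length + 1)
        votes[classify_segment(series[start:start + length])] += 1
    return votes.most_common(1)[0][0]

# Toy subseries classifier (an assumption): label by the sign of the local trend.
def trend_classifier(segment):
    return "up" if segment[-1] >= segment[0] else "down"

rising = [0.1 * t for t in range(50)]
falling = list(reversed(rising))
rng = random.Random(0)
label_rising = segment_and_combine_predict(rising, trend_classifier, 8, 25, rng)
label_falling = segment_and_combine_predict(falling, trend_classifier, 8, 25, rng)
```

Because the vote only needs subseries of a fixed length, the same function applies unchanged to series of different lengths, as long as each is at least `length` samples long.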

Peer Reviewed
Biomedical image classification with random subwindows and decision trees
Marée, Raphaël ULg; Geurts, Pierre ULg; Piater, Justus ULg et al

in Computer Vision for Biomedical Image Applications (2005)

In this paper, we address a problem of biomedical image classification that involves the automatic classification of x-ray images into 57 predefined classes with large intra-class variability. To achieve that goal, we apply and slightly adapt a recent generic method for image classification based on ensembles of decision trees and random subwindows. We obtain classification results close to the state of the art on a publicly available database of 10000 x-ray images. We also provide some clues to interpret the classification of each image in terms of subwindow relevance.

Peer Reviewed
Discovery of new rheumatoid arthritis biomarkers using the surface-enhanced laser desorption/ionization time-of-flight mass spectrometry ProteinChip approach.
De Seny, Dominique ULg; Fillet, Marianne ULg; Meuwis, Marie-Alice ULg et al

in Arthritis and Rheumatism (2005), 52(12), 3801-12

OBJECTIVE: To identify serum protein biomarkers specific for rheumatoid arthritis (RA), using surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS) technology. METHODS: A total of 103 serum samples from patients and healthy controls were analyzed. Thirty-four of the patients had a diagnosis of RA, based on the American College of Rheumatology criteria. The inflammation control group comprised 20 patients with psoriatic arthritis (PsA), 9 with asthma, and 10 with Crohn's disease. The noninflammation control group comprised 14 patients with knee osteoarthritis and 16 healthy control subjects. Serum protein profiles were obtained by SELDI-TOF-MS and compared in order to identify new biomarkers specific for RA. Data were analyzed by a machine learning algorithm called decision tree boosting, according to different preprocessing steps. RESULTS: The most discriminative mass/charge (m/z) values serving as potential biomarkers for RA were identified on arrays for both patients with RA versus controls and patients with RA versus patients with PsA. From among several candidates, the following peaks were highlighted: m/z values of 2,924 (RA versus controls on H4 arrays), 10,832 and 11,632 (RA versus controls on CM10 arrays), 4,824 (RA versus PsA on H4 arrays), and 4,666 (RA versus PsA on CM10 arrays). Positive results of proteomic analysis were associated with positive results of the anti-cyclic citrullinated peptide test. Our observations suggested that the 10,832 peak could represent myeloid-related protein 8. CONCLUSION: SELDI-TOF-MS technology allows rapid analysis of many serum samples, and use of decision tree boosting analysis as the main statistical method allowed us to propose a pattern of protein peaks specific for RA.

Peer Reviewed
Closed-form dual perturb and combine for tree-based models
Geurts, Pierre ULg; Wehenkel, Louis ULg

in Proceedings of the International Conference on Machine Learning (ICML 2005) (2005)

This paper studies the aggregation of predictions made by tree-based models for several perturbed versions of the attribute vector of a test case. A closed-form approximation of this scheme combined with cross-validation to tune the level of perturbation is proposed. This yields soft-tree models in a parameter free way, and preserves their interpretability. Empirical evaluations, on classification and regression problems, show that accuracy and bias/variance tradeoff are improved significantly at the price of an acceptable computational overhead. The method is further compared and combined with tree bagging.
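
The idea of averaging a tree's predictions over perturbed copies of a test case, and of replacing that average by a formula, can be illustrated on the simplest possible tree. The sketch below is an assumption-laden toy, not the paper's method: it uses a single split on one attribute and additive Gaussian noise of standard deviation `sigma`, in which case the expected prediction has an exact closed form. The approximation the paper proposes for full trees is not reproduced here.

```python
import math
import random

def stump_predict(x, threshold, left_value, right_value):
    """Hard prediction of a one-split regression tree."""
    return left_value if x < threshold else right_value

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def soft_stump_predict(x, threshold, left_value, right_value, sigma):
    """Expected stump prediction when x is perturbed by N(0, sigma^2) noise."""
    p_left = normal_cdf((threshold - x) / sigma)  # P(x + noise < threshold)
    return p_left * left_value + (1.0 - p_left) * right_value

def monte_carlo_predict(x, threshold, left_value, right_value, sigma, n, rng):
    """Average the hard predictions over n randomly perturbed copies of x."""
    total = sum(stump_predict(x + rng.gauss(0.0, sigma), threshold,
                              left_value, right_value) for _ in range(n))
    return total / n

closed = soft_stump_predict(0.4, 0.5, left_value=0.0, right_value=1.0, sigma=0.2)
sampled = monte_carlo_predict(0.4, 0.5, 0.0, 1.0, 0.2, 20000, random.Random(1))
```

The closed form turns the hard step at the threshold into a smooth transition whose width is set by `sigma`, which is one way to read the "soft-tree" wording of the abstract; a single model is still all that needs to be stored.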

Peer Reviewed
A Machine Learning Approach to Improve Congestion Control over Wireless Computer Networks
Geurts, Pierre ULg; El Khayat, Ibtissam; Leduc, Guy ULg

(2004, November)

In this paper, we present the application of machine learning techniques to the improvement of the congestion control of TCP in wired/wireless networks. TCP is suboptimal in hybrid wired/wireless networks because it reacts in the same way to losses due to congestion and losses due to link errors. We thus propose to use machine learning techniques to build automatically a loss classifier from a database obtained by simulations of random network topologies. Several machine learning algorithms are compared for this task and the best method for this application turns out to be decision tree boosting. It outperforms ad hoc classifiers proposed in the networking literature.

Discovery of new rheumatoid arthritis biomarkers using SELDI-TOF-MS ProteinChip approach
de Seny, D. M.; Fillet, Marianne ULg; Meuwis, Marie-Alice ULg et al

in Arthritis and Rheumatism (2004, September), 50(9, Suppl. S), 124

Peer Reviewed
A generic approach for image classification based on decision tree ensembles and local sub-windows
Marée, Raphaël ULg; Geurts, Pierre ULg; Piater, Justus ULg et al

in Proceedings of the 6th Asian Conference on Computer Vision (2004)

A novel and generic approach for image classification is presented. The method operates directly on pixel values and does not require feature extraction. It combines a simple local sub-window extraction technique with induction of ensembles of extremely randomized decision trees. We report results on four well known and publicly available datasets corresponding to representative applications of image classification problems: handwritten digits (MNIST), faces (ORL), 3D objects (COIL-100), and textures (OUTEX). A comparison with studies from the computer vision literature shows that our method is competitive with the state of the art, an interesting result considering its generality and conceptual simplicity. Further experiments are carried out on the COIL-100 dataset to evaluate the robustness of the learned models to rotation, scaling, or occlusion of test images. These preliminary results are very encouraging.

Peer Reviewed
Iteratively extending time horizon reinforcement learning
Ernst, Damien ULg; Geurts, Pierre ULg; Wehenkel, Louis ULg

in Machine Learning: ECML 2003, 14th European Conference on Machine Learning (2003)

Reinforcement learning aims to determine an (infinite time horizon) optimal control policy from interaction with a system. It can be solved by approximating the so-called Q-function from a sample of four-tuples (x(t), u(t), r(t), x(t+1)) where x(t) denotes the system state at time t, u(t) the control action taken, r(t) the instantaneous reward obtained and x(t+1) the successor state of the system, and by determining the optimal control from the Q-function. Classical reinforcement learning algorithms use an ad hoc version of stochastic approximation which iterates over the Q-function approximations on a four-tuple by four-tuple basis. In this paper, we reformulate this problem as a sequence of batch mode supervised learning problems which in the limit converges to (an approximation of) the Q-function. Each step of this algorithm uses the full sample of four-tuples gathered from interaction with the system and extends by one step the horizon of the optimality criterion. An advantage of this approach is to allow the use of standard batch mode supervised learning algorithms, instead of the incremental versions used up to now. In addition to a theoretical justification the paper provides empirical tests in the context of the "Car on the Hill" control problem based on the use of ensembles of regression trees. The resulting algorithm is in principle able to handle efficiently large scale reinforcement learning problems.

Peer Reviewed
An empirical comparison of machine learning algorithms for generic image classification
Marée, Raphaël ULg; Geurts, Pierre ULg; Visimberga, Giorgio et al

in Proceedings of the 23rd SGAI international conference on innovative techniques and applications of artificial intelligence, Research and development in intelligent systems XX (2003)

Peer Reviewed
Une méthode générique pour la classification automatique d'images à partir des pixels [A generic method for automatic image classification from pixels]
Marée, Raphaël ULg; Geurts, Pierre ULg; Wehenkel, Louis ULg

in Revue des Nouvelles Technologies de l'Information (2003), 1

In this article, we evaluate a generic approach to automatic image classification. It relies on a recent learning method that builds ensembles of decision trees by selecting tests at random directly on the raw pixel values. We propose a variant, also generic, that artificially increases the sample size by extracting and classifying subwindows of the images. Both approaches are evaluated and compared on four public databases of common problems: recognition of handwritten digits, faces, 3D objects, and textures.

Peer Reviewed
Traitement de données volumineuses par ensemble d'arbres aléatoires [Processing large datasets with ensembles of random trees]
Geurts, Pierre ULg

in Revue des nouvelles technologies de l'information, Numéro spécial entreposage et fouille de données (2003), 1

This article presents a new learning method based on an ensemble of decision trees. In contrast to the traditional induction method, the trees of the ensemble are built by choosing the tests completely at random while the tree is grown. The method is compared with decision trees and with bagging on several classification problems. Thanks to the random choice of tests, the computing times of this algorithm are comparable to those of traditional trees. At the same time, the method proves much more accurate than single trees and often significantly better than bagging. These characteristics make the method particularly well suited to processing large databases.

Contributions to decision tree induction: bias/variance tradeoff and time series classification
Geurts, Pierre ULg

Doctoral thesis (2002)

Because of the rapid progress of computer and information technology, large amounts of data are nowadays available in a lot of domains. Automatic learning aims at developing algorithms able to produce synthetic high-level information, or models, from this data. Learning algorithms are generally evaluated according to three different criteria: interpretability (how well the model helps to understand the data), predictive accuracy (how well the model can predict unseen situations), and computational efficiency (how fast the algorithm is and how well it scales to large databases). This thesis explores two issues in automatic learning: the improvement of the well-known decision tree induction method and the problem of learning classification models for time series data. Decision tree induction is an automatic learning algorithm which focuses on the modeling of input/output relationships. While this algorithm is among the fastest and most interpretable methods, its accuracy is not always competitive with respect to other algorithms. It is commonly admitted that this suboptimality is due to the excessive variance of this method. We first carry out an empirical study which shows quantitatively how important this variance is, i.e. how strongly decision trees depend on the random nature of the database used to infer them. These experiments confirm that this variance is detrimental not only from the point of view of accuracy but also from the point of view of interpretability. With the goal of improving both interpretability and accuracy, we consider three variance reduction techniques for decision trees. First, with the goal of improving mainly interpretability, we propose several methods which try to stabilize the parameters chosen during tree induction. While these methods succeed in reducing the variability of the parameters, they produce only a slight improvement of the accuracy. Then, we consider perturb and combine algorithms (e.g. bagging, boosting) which consist in combining the predictions of several models obtained by randomizing in some way the learning process. Inspired by the high variance of the parameters defining a decision tree, we propose an extremely randomized decision tree induction algorithm, called extra-tree, which chooses all parameters at random during induction. The aggregation of several of these extra-trees gives an important reduction of variance and this algorithm compares favorably in terms of accuracy and computational efficiency with both bagging and boosting. Because of the randomization of the parameters, the resulting method is also competitive with classical decision tree induction in terms of computational efficiency. In addition to these two approaches, we propose a "dual" perturb and combine algorithm which delays the perturbation to the prediction stage and hence requires only one model. In combination with decision trees, this method actually bridges the gap between single decision trees and perturb and combine algorithms. Of the first, it saves the interpretability (by using only one model), and with perturb and combine algorithms, it shares some of the accuracy (by reducing the variance). The second topic of the thesis is the problem of time series classification. The most direct way to solve this problem is to apply existing learning algorithms on low-level variables which correspond to the values of a time series at several time points. Experiments with the tree-based algorithms studied in the first part of the thesis show that this approach is limited. A variance reduction technique is then proposed specifically for this kind of data which consists in aggregating the predictions given by a classification model for subsequences of time series. Since this method does not provide interpretable models, we propose a second method which extends decision tree tests by allowing them to detect local shift invariant properties, or patterns, in time series. The study proposed in this part of the thesis is only a first step in the domain but our conclusions give some future work directions for handling complex types of data with automatic learning methods.

Peer Reviewed
Improving the bias/variance tradeoff of decision trees - towards soft tree induction
Geurts, Pierre ULg; Olaru, Cristina; Wehenkel, Louis ULg

in Engineering intelligent systems (2001), 9

One of the main difficulties with standard top down induction of decision trees comes from the high variance of these methods. High variance means that, for a given problem and sample size, the resulting tree is strongly dependent on the random nature of the particular sample used for training. Consequently, these algorithms tend to be suboptimal in terms of accuracy and interpretability. This paper analyses this problem in depth and proposes a new method, relying on threshold softening, able to significantly improve the bias/variance tradeoff of decision trees. The algorithm is validated on a number of benchmark problems and its relationship with fuzzy decision tree induction is discussed. This sheds some light on the success of fuzzy decision tree induction and improves our understanding of machine learning, in general.
