References of "Geurts, Pierre"
Peer Reviewed
Tree based ensemble models regularization by convex optimization
Cornélusse, Bertrand ULg; Geurts, Pierre ULg; Wehenkel, Louis ULg

Conference (2009, December 12)

Tree based ensemble methods can be seen as a way to learn a kernel from a sample of input-output pairs. This paper proposes a regularization framework to incorporate non-standard information not used in the kernel learning algorithm, so as to take advantage of incomplete information about output values and/or of some prior information about the problem at hand. To this end a generic convex optimization problem is formulated which is first customized into a manifold regularization approach for semi-supervised learning, then as a way to exploit censored output values, and finally as a generic way to exploit prior information about the problem.

Peer Reviewed
Supervised learning with decision tree-based methods in computational and systems biology
Geurts, Pierre ULg; Irrthum, Alexandre ULg; Wehenkel, Louis ULg

in Molecular Biosystems (2009), 5(12), 1593-1605

At the intersection between artificial intelligence and statistics, supervised learning provides algorithms to automatically build predictive models solely from observations of a system. During the last twenty years, supervised learning has been a tool of choice to analyze the ever-growing and increasingly complex data generated in the context of molecular biology, with successful applications in genome annotation, function prediction, or biomarker discovery. Among supervised learning methods, decision tree-based methods stand out as nonparametric methods that have the unique feature of combining interpretability, efficiency, and, when used in ensembles of trees, excellent accuracy. The goal of this paper is to provide an accessible and comprehensive introduction to this class of methods. The first part of the paper is devoted to an intuitive but complete description of decision tree-based methods and a discussion of their strengths and limitations with respect to other supervised learning methods. The second part of the paper provides a survey of their applications in the context of computational and systems biology. The supplementary material provides information about various non-standard extensions of the decision tree-based approach to modeling, some practical guidelines for the choice of parameters and algorithm variants depending on the practical objectives of their application, pointers to freely accessible software packages, and a brief primer going through the different manipulations needed to use the tree-induction packages available in the R statistical tool.

Peer Reviewed
Detecting Triangle Inequality Violations in Internet Coordinate Systems by Supervised Learning
Liao, Yongjun ULg; Kaafar, Mohamed Ali; Gueye, Bamba et al

in Lecture Notes in Computer Science (2009, May 12), 5550

Internet Coordinate Systems (ICS) are used to predict Internet distances with limited measurements. However, the precision of an ICS is degraded by the presence of Triangle Inequality Violations (TIVs). Simple methods have been proposed to detect TIVs, based e.g. on the empirical observation that a TIV is more likely when the distance is underestimated by the coordinates. In this paper, we apply supervised machine learning techniques to try and derive more powerful criteria to detect TIVs. We first show that (ensembles of) Decision Trees (DTs) learnt on our datasets are very good models for this problem. Moreover, our approach brings out a discriminative variable (called OREE), which combines the classical estimation error with the variance of the estimated distance. This variable alone is as good as an ensemble of DTs, and provides a much simpler criterion. If every node of the ICS sorts its neighbours according to OREE, we show that cutting these lists after a given number of neighbours, or when OREE crosses a given threshold value, achieves very good performance in detecting TIVs.

Peer Reviewed
Fast Multi-Class Image Annotation with Random Subwindows and Multiple Output Randomized Trees
Dumont, Marie; Marée, Raphaël ULg; Wehenkel, Louis ULg et al

in Proc. International Conference on Computer Vision Theory and Applications (VISAPP) (2009, February)

This paper addresses image annotation, i.e. labelling pixels of an image with a class among a finite set of predefined classes. We propose a new method which extracts a sample of subwindows from a set of annotated images in order to train a subwindow annotation model by using the extremely randomized trees ensemble method appropriately extended to handle high-dimensional output spaces. The annotation of a pixel of an unseen image is done by aggregating the annotations of the subwindows containing this pixel. The proposed method is compared to a more basic approach predicting the class of a pixel from a single window centered on that pixel and to other state-of-the-art image annotation methods. In terms of accuracy, the proposed method significantly outperforms the basic method and performs well relative to the state of the art, while being more generic, conceptually simpler, and computationally more efficient than those methods.

Peer Reviewed
Content-based Image Retrieval by Indexing Random Subwindows with Randomized Trees
Marée, Raphaël ULg; Geurts, Pierre ULg; Wehenkel, Louis ULg

in IPSJ Transactions on Computer Vision and Applications (2009), 1

We propose a new method for content-based image retrieval which exploits the similarity measure and indexing structure of totally randomized tree ensembles induced from a set of subwindows randomly extracted from a sample of images. We also present the possibility of updating the model as new images come in, and the capability of comparing new images using a model previously constructed from a different set of images. The approach is quantitatively evaluated on various types of images and achieves high recognition rates despite its conceptual simplicity and computational efficiency.

Peer Reviewed
Protéomique par SELDI-TOF-MS des maladies inflammatoires articulaires: identification des protéines S100 comme protéines d'intérêt [SELDI-TOF-MS proteomics of inflammatory joint diseases: identification of S100 proteins as proteins of interest]
De Seny, Dominique ULg; Ribbens, Clio ULg; Cobraiville, Gaël ULg et al

in Revue Médicale de Liège (2009), 64(Spec No), 29-35

Clinical proteomics is a technical approach studying the entire proteome expressed by cells, tissues or organs. It describes the dynamics of cell regulation by detecting molecular events related to disease development. Proteomic techniques focus mainly on identification of new biomarkers or new therapeutic targets. It is a multidisciplinary approach using medical, biological, bioanalytical and bioinformatics knowledge. A strong collaboration between these fields allowed SELDI-TOF-MS proteomics studies to be performed at the CHU and the University of Liege, in GIGA-Research facilities. These studies were organized along three main research axes: the identification of biomarkers specific to a studied pathology, to a common biological pathway and, finally, to a treatment response.

Peer Reviewed
A Machine Learning Approach for Material Detection in Hyperspectral Images
Marée, Raphaël ULg; Stevens, Benjamin ULg; Geurts, Pierre ULg et al

in Proc. 6th IEEE Workshop on Object Tracking and Classification Beyond and in the Visible Spectrum (OTCBVS-CVPR09) (2009)

In this paper we propose a machine learning approach for the detection of gaseous traces in thermal infrared hyperspectral images. It exploits both spectral and spatial information by extracting subcubes and by using extremely randomized trees with multiple outputs as a classifier. Promising results are shown on a dataset of more than 60 hypercubes.

Peer Reviewed
Raw genotypes vs haplotype blocks for genome wide association studies by random forests
Botta, Vincent ULg; Hansoul, Sarah ULg; Geurts, Pierre ULg et al

in Proc. of MLSB 2008, second workshop on Machine Learning in Systems Biology (2008, September)

We consider two different representations of the input data for genome-wide association studies using random forests, namely raw genotypes described by a few thousand to a few hundred thousand discrete variables, each describing a single nucleotide polymorphism, and haplotype block contents, represented by the combinations of about 10 to 100 adjacent and correlated genotypes. We adapt random forests to exploit haplotype blocks, and compare this with the use of raw genotypes, in terms of predictive power and localization of causal mutations, by using simulated datasets with one or two interacting effects.

Prediction of genetic risk of complex diseases by supervised learning
Botta, Vincent ULg; Geurts, Pierre ULg; Hansoul, Sarah et al

Scientific conference (2008, May)

Peer Reviewed
Proteomics for prediction and characterization of response to infliximab in Crohn's disease: a pilot study.
Meuwis, Marie-Alice ULg; Fillet, Marianne ULg; Lutteri, Laurence ULg et al

in Clinical Biochemistry (2008), 41(12), 960-7

OBJECTIVES: Infliximab is the first anti-TNFalpha accepted by the Food and Drug Administration for use in inflammatory bowel disease treatment. Few clinical, biological and genetic factors tend to predict response in Crohn's disease (CD) patient subcategories, none widely predicting response to infliximab. DESIGN AND METHODS: Twenty CD patients showing clinical response or non-response to infliximab were used for serum proteomic profiling on Surface Enhanced Laser Desorption Ionisation-Time of Flight-Mass Spectrometry (SELDI-TOF-MS), each before and after treatment. Univariate and multivariate data analyses were performed for prediction and characterization of response to infliximab. RESULTS: We obtained a model of classification predicting response to treatment and selected relevant potential biomarkers, among which platelet aggregation factor 4 (PF4). We quantified PF4, sCD40L and IL-6 by ELISA for correlation studies. CONCLUSIONS: This first proteomic pilot study on response to infliximab in CD suggests association between platelet metabolism and response to infliximab and requires validation studies on a larger cohort of patients.

Peer Reviewed
Exploiting tree-based variable importances to selectively identify relevant variables
Huynh-Thu, Vân Anh ULg; Wehenkel, Louis ULg; Geurts, Pierre ULg

in JMLR: Workshop and Conference Proceedings (2008), 4

Peer Reviewed
Exploiting tree-based variable importances to selectively identify relevant variables
Huynh-Thu, Vân Anh; Wehenkel, Louis ULg; Geurts, Pierre ULg

in Proc. of FSDM08, ECML/PKDD Workshop on New challenges for feature selection in data mining and knowledge discovery (2008)

Peer Reviewed
Compositional protein analysis of HDL by SELDI-TOF MS during experimental endotoxemia
Levels, Johannes HM; Marée, Raphaël ULg; Geurts, Pierre ULg et al

Poster (2008)

Peer Reviewed
Estimation of rotor angles of synchronous machines using artificial neural networks and local PMU-based quantities
Del Angel, A.; Geurts, Pierre ULg; Ernst, Damien ULg et al

in Neurocomputing (2007), 70(16-18), 2668-2678

This paper investigates a possibility for estimating rotor angles in the time frame of transient (angle) stability of electric power systems, for use in real-time. The proposed dynamic state estimation technique is based on the use of voltage and current phasors obtained from a phasor measurement unit supposed to be installed on the extra-high voltage side of the substation of a power plant, together with a multilayer perceptron trained off-line from simulations. We demonstrate that an intuitive approach that uses a neural network to map the phasor measurement inputs directly to the generator rotor angle does not offer satisfactory results. We found that a good way to approach the angle estimation problem is to use two neural networks in order to estimate the sin(delta) and cos(delta) of the angle and recover the latter from these values by simple post-processing. Simulation results on a part of the Mexican interconnected system show that the approach could yield satisfactory accuracy for real-time monitoring and control of transient instability. (c) 2007 Elsevier B.V. All rights reserved.

Detection of micro-RNA/gene interactions involved in angiogenesis using machine learning techniques
Huynh-Thu, Vân Anh ULg; Hiard, Samuel ULg; Geurts, Pierre ULg et al

Poster (2007, September)

Motivation: Angiogenesis is the process responsible for the growth of new blood vessels from existing ones. It is also associated with the development of cancer, as tumors need to be irrigated by blood vessels in order to grow. New cancer therapies are emerging that exploit angiogenesis inhibitors, also called angiostatic agents, to asphyxiate and starve the tumors. Better understanding the regulatory mechanisms that control angiogenesis is thus fundamental. Recently, short non-coding RNA molecules, called micro-RNAs, have been discovered that are involved in post-transcriptional regulation of gene expression. These molecules bind to messenger RNAs following the base pairing rules, preventing them from being translated into proteins and/or tagging them for degradation. The main goal of this work is to use computational approaches to identify micro-RNAs involved in angiogenesis. Method: In order to identify genes involved in angiogenesis, bovine endothelial cells were treated by a known angiogenesis inhibitor [1], prolactin 16K, and their gene expression profile was compared to the profile of untreated cells. The genes were then divided into three classes: up-regulated, down-regulated, and unaffected genes. The 3'UTR regions of these genes were then analysed by machine learning techniques. Different approaches were considered. First, we described each gene by a vector of motif counts in its 3'UTR region and used machine learning techniques to rank the motifs according to their relevance for separating the genes into the different classes. We considered successively motifs corresponding to the seeds of known micro-RNAs and also all possible motifs of a given length. To rank the motifs, we compared ensembles of decision trees and linear support vector machines. Second, we considered an approach called Segment and Combine that was proposed in [2]. Finally, we also carried out an exhaustive search of all motifs of a given length that satisfy some constraints on specificity and coverage with respect to a given gene category. Results: The ability of the different approaches to identify relevant motifs was first assessed on genes predicted to be the target of some known miRNAs. In this simple setting, most methods were able to identify the micro-RNA seed. The results obtained on the genes regulated by prolactin 16K are also very encouraging. We were able to identify one micro-RNA already known to play a role in angiogenesis, and several motifs are predicted by different approaches to be highly specific to up- or down-regulation by prolactin 16K. Their relationship with known micro-RNAs is certainly worth exploring. Conclusion: Machine learning approaches are promising techniques for the identification of micro-RNA/gene interactions. Future work will concern the application of the same kind of techniques on promoters for the identification of transcription factor binding sites.

Peer Reviewed
Random Subwindows and Randomized Trees for Image Retrieval, Classification, and Annotation
Marée, Raphaël ULg; Dumont, Marie; Geurts, Pierre ULg et al

Poster (2007, July 22)
