References of "Geurts, Pierre"
Peer Reviewed
A zealous parallel gradient descent algorithm
Louppe, Gilles ULg; Geurts, Pierre ULg

Poster (2010, December 11)

Parallel and distributed algorithms have become a necessity in modern machine learning tasks. In this work, we focus on parallel asynchronous gradient descent and propose a zealous variant that minimizes the idle time of processors to achieve a substantial speedup. We then experimentally study this algorithm in the context of training a restricted Boltzmann machine on a large collaborative filtering task.

Peer Reviewed
Inferring Regulatory Networks from Expression Data Using Tree-Based Methods
Huynh-Thu, Vân Anh ULg; Irrthum, Alexandre ULg; Wehenkel, Louis ULg et al

in PLoS ONE (2010), 5(9), 12776

One of the pressing open problems of computational systems biology is the elucidation of the topology of genetic regulatory networks (GRNs) using high-throughput genomic data, in particular microarray gene expression data. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) challenge aims to evaluate the success of GRN inference algorithms on benchmarks of simulated data. In this article, we present GENIE3, a new algorithm for the inference of GRNs that was the best performer in the DREAM4 In Silico Multifactorial challenge. GENIE3 decomposes the prediction of a regulatory network between p genes into p different regression problems. In each of the regression problems, the expression pattern of one of the genes (target gene) is predicted from the expression patterns of all the other genes (input genes), using the tree-based ensemble methods Random Forests or Extra-Trees. The importance of an input gene in the prediction of the target gene expression pattern is taken as an indication of a putative regulatory link. Putative regulatory links are then aggregated over all genes to provide a ranking of interactions from which the whole network is reconstructed. In addition to performing well on the DREAM4 In Silico Multifactorial challenge simulated data, we show that GENIE3 compares favorably with existing algorithms in deciphering the genetic regulatory network of Escherichia coli. It makes no assumptions about the nature of gene regulation, can deal with combinatorial and non-linear interactions, produces directed GRNs, and is fast and scalable. In conclusion, we propose a new algorithm for GRN inference that performs well on both synthetic and real gene expression data. The algorithm, based on feature selection with tree-based ensemble methods, is simple and generic, making it adaptable to other types of genomic data and interactions.
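The decomposition described above (one regression problem per target gene, importance scores read as putative directed links, then one global ranking) can be sketched in a few lines. This is a minimal illustration under a loud assumption: the function `abs_correlation_importance` is a hypothetical stand-in that scores each input gene by its absolute Pearson correlation with the target, not the Random Forests or Extra-Trees importance that GENIE3 actually uses. Any function with the same signature (for example one wrapping a tree ensemble) can be passed in as `importance`.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# An importance function maps (input rows, target values) -> one score per input column.
Importance = Callable[[Sequence[Sequence[float]], Sequence[float]], List[float]]


def abs_correlation_importance(inputs: Sequence[Sequence[float]],
                               target: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for tree-based importance: |Pearson r| of each input with the target."""
    my = mean(target)
    sy = sum((y - my) ** 2 for y in target) ** 0.5
    scores = []
    for j in range(len(inputs[0])):
        col = [row[j] for row in inputs]
        mx = mean(col)
        sx = sum((x - mx) ** 2 for x in col) ** 0.5
        if sx == 0 or sy == 0:
            scores.append(0.0)  # a constant gene carries no information
            continue
        cov = sum((x - mx) * (y - my) for x, y in zip(col, target))
        scores.append(abs(cov) / (sx * sy))
    return scores


def rank_links(expr: Sequence[Sequence[float]], genes: Sequence[str],
               importance: Importance = abs_correlation_importance
               ) -> List[Tuple[str, str, float]]:
    """expr[s][g] is the expression of gene g in sample s; returns (regulator, target, score)."""
    p = len(genes)
    links = []
    for t in range(p):  # one regression problem per target gene
        others = [j for j in range(p) if j != t]
        inputs = [[row[j] for j in others] for row in expr]
        target = [row[t] for row in expr]
        for j, score in zip(others, importance(inputs, target)):
            links.append((genes[j], genes[t], score))  # directed link: input -> target
    # Aggregate the putative links of all p problems into a single global ranking.
    return sorted(links, key=lambda link: link[2], reverse=True)
```

Note that a correlation score is symmetric, so this stand-in gives A -> B and B -> A the same weight; tree-based importances computed per target generally differ between the two directions, which is what lets the ranking express directed links.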

Peer Reviewed
Network Distance Prediction Based on Decentralized Matrix Factorization
Liao, Yongjun ULg; Geurts, Pierre ULg; Leduc, Guy ULg

in Lecture Notes in Computer Science (2010, May 11), 6091

Network Coordinate Systems (NCS) are promising techniques to predict unknown network distances from a limited number of measurements. Most NCS algorithms are based on metric space embedding and cannot represent distance asymmetries or Triangle Inequality Violations (TIVs). To overcome these drawbacks, we formulate the problem of network distance prediction as guessing the missing elements of a distance matrix and solve it by matrix factorization. A distinct feature of our approach, called Decentralized Matrix Factorization (DMF), is that it is fully decentralized. The factorization of the incomplete distance matrix is done collaboratively and iteratively at all nodes, with each node retrieving only a small number of distance measurements. There are neither special nodes such as landmarks nor a central node where the distance measurements are collected and stored. We compare DMF with two popular NCS algorithms: Vivaldi and IDES. The former is based on metric space embedding, while the latter is also based on matrix factorization but uses landmarks. Experimental results show that DMF achieves competitive accuracy with the double advantage of having no landmarks and of being able to represent distance asymmetries and TIVs.
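To make the idea concrete, here is a small single-process simulation of decentralized matrix factorization: every node i keeps an out-vector X[i] and an in-vector Y[i], the distance from i to j is predicted by their dot product (so asymmetric distances are representable), and nodes only ever use measurements to a few randomly chosen neighbours. This is a simplified sketch, not the paper's exact update rule: the assumption here is that both endpoints of a probe take one regularised stochastic-gradient step on their own vectors, and the names `dmf_sketch`, `measured_rmse` and all parameter values are hypothetical.

```python
import random
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def dmf_sketch(dist: List[List[float]], rank: int = 3, k: int = 6, lr: float = 0.02,
               lam: float = 0.01, rounds: int = 20000, seed: int = 0):
    """Nodes learn X[i], Y[i] so that dist[i][j] ~ dot(X[i], Y[j]) from k probed neighbours each."""
    rng = random.Random(seed)
    n = len(dist)
    X = [[rng.random() for _ in range(rank)] for _ in range(n)]
    Y = [[rng.random() for _ in range(rank)] for _ in range(n)]
    # Each node probes a small fixed random set of neighbours: no landmarks, no central node.
    neighbours = [rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    for _ in range(rounds):
        i = rng.randrange(n)                 # some node wakes up...
        j = rng.choice(neighbours[i])        # ...and probes one of its neighbours
        xi, yi, xj, yj = X[i], Y[i], X[j], Y[j]  # vectors exchanged during the probe
        e_ij = dist[i][j] - dot(xi, yj)      # prediction error on i -> j
        e_ji = dist[j][i] - dot(xj, yi)      # prediction error on j -> i
        # Each endpoint updates only its own vectors (regularised SGD step).
        X[i] = [x + lr * (e_ij * y - lam * x) for x, y in zip(xi, yj)]
        Y[j] = [y + lr * (e_ij * x - lam * y) for y, x in zip(yj, xi)]
        X[j] = [x + lr * (e_ji * y - lam * x) for x, y in zip(xj, yi)]
        Y[i] = [y + lr * (e_ji * x - lam * y) for y, x in zip(yi, xj)]
    return X, Y, neighbours


def measured_rmse(dist: List[List[float]], X, Y, neighbours) -> float:
    """Root-mean-square error over the distances the nodes actually measured (both directions)."""
    errs = [(dist[a][b] - dot(X[a], Y[b])) ** 2
            for i, nb in enumerate(neighbours) for j in nb
            for a, b in ((i, j), (j, i))]
    return (sum(errs) / len(errs)) ** 0.5
```

Calling `dmf_sketch(D, rounds=0)` returns the random initial state with the same neighbour sets, which gives a baseline to compare the trained vectors against; any unmeasured entry D[a][b] is then predicted as `dot(X[a], Y[b])`.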

Peer Reviewed
Incremental Indexing and Distributed Image Search using Shared Randomized Vocabularies
Marée, Raphaël ULg; Denis, Philippe; Wehenkel, Louis ULg et al

in ACM Proceedings MIR 2010 (2010, March)

We present a cooperative framework for content-based image retrieval for the realistic setting where images are distributed across multiple cooperating servers. The proposed method is in line with bag-of-features approaches but uses fully data-independent, randomized structures, shared by the cooperating servers, to map image features to common visual words. A coherent, global image similarity measure (which is a kernel) is computed in a distributed fashion over visual words, by only requiring a small amount of data transfers between nodes. Our experiments on various image types show that this framework is a very promising step towards large-scale, distributed content-based image retrieval.

Peer Reviewed
Enhancement of TCP over wired/wireless networks with packet loss classifiers inferred by supervised learning
El Khayat, Ibtissam; Geurts, Pierre ULg; Leduc, Guy ULg

in Wireless Networks (2010), 16(2), 273-290

TCP is suboptimal in heterogeneous wired/wireless networks because it reacts in the same way to losses due to congestion and losses due to link errors. In this paper, we propose to improve TCP performance in wired/wireless networks by endowing it with a classifier that can distinguish packet loss causes. In contrast to other proposals, we change neither TCP’s congestion control nor TCP’s error recovery. A packet loss whose cause is classified as link error will simply be ignored by TCP’s congestion control and recovered as usual, while a packet loss classified as congestion loss will trigger both mechanisms as usual. To build our classification algorithm, a database of pre-classified losses is gathered by simulating a large set of random network conditions, and classification models are automatically built from this database by using supervised learning methods. Several learning algorithms are compared for this task. Our simulations of different scenarios show that adding such a classifier to TCP can substantially improve its throughput in wired/wireless networks without compromising TCP-friendliness in either wired or wireless environments.

Peer Reviewed
A screening methodology based on Random Forests to improve the detection of gene-gene interactions
De Lobel, L.; Geurts, Pierre ULg; Baele, G. et al

in European Journal of Human Genetics (2010), 18, 1127-1132

The search for susceptibility loci in gene-gene interactions poses a methodological and computational challenge for statisticians because of the large dimensionality inherent to the modelling of gene-gene interactions or epistasis. In an era in which genome-wide scans have become relatively common, new powerful methods are required to handle the huge number of feasible gene-gene interactions and to weed out false positives and negatives from these results. One solution to the dimensionality problem is to reduce the data by a preliminary screening of markers to select the best candidates for further analysis. Ideally, this screening step is statistically independent of the testing phase. Initially developed for small numbers of markers, the Multifactor Dimensionality Reduction (MDR) method is a nonparametric, model-free data reduction technique that associates with disease the sets of markers having optimal predictive properties. In this study, we examine the power of MDR in larger data sets and compare it with other approaches that are able to identify gene-gene interactions. Under various interaction models (purely and not purely epistatic), we use a Random Forest (RF)-based prescreening method, before executing MDR, to improve its performance. We find that the power of MDR increases when noisy SNPs are first removed, by creating a collection of candidate markers with RFs. We validate our technique by extensive simulation studies and by application to asthma data from the European Committee of Respiratory Health Study II. European Journal of Human Genetics advance online publication, 12 May 2010; doi:10.1038/ejhg.2010.48.

Biomedical Imaging Modality Classification Using Bags of Visual and Textual Terms with Extremely Randomized Trees: Report of ImageCLEF 2010 Experiments
Marée, Raphaël ULg; Stern, Olivier ULg; Geurts, Pierre ULg

in CLEF Notebook Papers/LABs/Workshops (2010)

In this paper we describe our experiments related to the ImageCLEF 2010 medical modality classification task using extremely randomized trees. Our best run combines bags of textual and visual features. It yields a 90% recognition rate and ranks 6th among 45 runs (whose rates range from 94% down to 12%).

Peer Reviewed
Tree based ensemble models regularization by convex optimization
Cornélusse, Bertrand ULg; Geurts, Pierre ULg; Wehenkel, Louis ULg

Conference (2009, December 12)

Tree-based ensemble methods can be seen as a way to learn a kernel from a sample of input-output pairs. This paper proposes a regularization framework to incorporate non-standard information not used in the kernel learning algorithm, so as to take advantage of incomplete information about output values and/or of some prior information about the problem at hand. To this end, a generic convex optimization problem is formulated and customized first into a manifold regularization approach for semi-supervised learning, then into a way to exploit censored output values, and finally into a generic way to exploit prior information about the problem.

Peer Reviewed
Supervised learning with decision tree-based methods in computational and systems biology
Geurts, Pierre ULg; Irrthum, Alexandre ULg; Wehenkel, Louis ULg

in Molecular Biosystems (2009), 5(12), 1593-1605

At the intersection between artificial intelligence and statistics, supervised learning provides algorithms to automatically build predictive models solely from observations of a system. During the last twenty years, supervised learning has been a tool of choice to analyze the ever-growing and increasingly complex data generated in the context of molecular biology, with successful applications in genome annotation, function prediction, or biomarker discovery. Among supervised learning methods, decision tree-based methods stand out as nonparametric methods that have the unique feature of combining interpretability, efficiency, and, when used in ensembles of trees, excellent accuracy. The goal of this paper is to provide an accessible and comprehensive introduction to this class of methods. The first part of the paper is devoted to an intuitive but complete description of decision tree-based methods and a discussion of their strengths and limitations with respect to other supervised learning methods. The second part of the paper provides a survey of their applications in the context of computational and systems biology. The supplementary material provides information about various non-standard extensions of the decision tree-based approach to modeling, some practical guidelines for the choice of parameters and algorithm variants depending on the practical objectives of their application, pointers to freely accessible software packages, and a brief primer going through the different manipulations needed to use the tree-induction packages available in the R statistical tool.

Peer Reviewed
Detecting Triangle Inequality Violations in Internet Coordinate Systems by Supervised Learning
Liao, Yongjun ULg; Kaafar, Mohamed Ali; Gueye, Bamba et al

in Lecture Notes in Computer Science (2009, May 12), 5550

Internet Coordinate Systems (ICS) are used to predict Internet distances with limited measurements. However, the precision of an ICS is degraded by the presence of Triangle Inequality Violations (TIVs). Simple methods have been proposed to detect TIVs, based e.g. on the empirical observation that a TIV is more likely when the distance is underestimated by the coordinates. In this paper, we apply supervised machine learning techniques to try to derive more powerful criteria to detect TIVs. We first show that (ensembles of) Decision Trees (DTs) learnt on our datasets are very good models for this problem. Moreover, our approach brings out a discriminative variable (called OREE), which combines the classical estimation error with the variance of the estimated distance. This variable alone is as good as an ensemble of DTs, and provides a much simpler criterion. If every node of the ICS sorts its neighbours according to OREE, we show that cutting these lists after a given number of neighbours, or when OREE crosses a given threshold value, achieves very good performance in detecting TIVs.
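For readers unfamiliar with the term: a directed pair (a, c) is involved in a TIV when some intermediate node b gives a shorter path, i.e. d(a, b) + d(b, c) < d(a, c). The sketch below labels such pairs in a fully measured distance matrix; labels of this kind are the sort of ground truth a supervised TIV detector could be trained against. It is only an illustration of the definition, assuming a complete matrix; the function name `tiv_edges` is hypothetical, and the sketch does not reproduce the paper's features or the OREE criterion.

```python
from itertools import permutations
from typing import List, Set, Tuple


def tiv_edges(d: List[List[float]]) -> Set[Tuple[int, int]]:
    """Return the directed pairs (a, c) that violate the triangle inequality
    through at least one intermediate node b, i.e. d[a][b] + d[b][c] < d[a][c]."""
    n = len(d)
    violating = set()
    for a, c in permutations(range(n), 2):  # all ordered pairs with a != c
        if any(d[a][b] + d[b][c] < d[a][c] for b in range(n) if b not in (a, c)):
            violating.add((a, c))
    return violating
```

The brute-force check costs O(n^3) for n nodes, which is fine for labelling a measured dataset offline but is exactly why a node in a deployed ICS would need a cheaper local criterion.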

Peer Reviewed
Fast Multi-Class Image Annotation with Random Subwindows and Multiple Output Randomized Trees
Dumont, Marie; Marée, Raphaël ULg; Wehenkel, Louis ULg et al

in Proc. International Conference on Computer Vision Theory and Applications (VISAPP) (2009, February)

This paper addresses image annotation, i.e. labelling each pixel of an image with one class from a finite set of predefined classes. We propose a new method which extracts a sample of subwindows from a set of annotated images in order to train a subwindow annotation model by using the extremely randomized trees ensemble method, appropriately extended to handle high-dimensional output spaces. A pixel of an unseen image is annotated by aggregating the annotations of the subwindows that contain it. The proposed method is compared to a more basic approach predicting the class of a pixel from a single window centered on that pixel and to other state-of-the-art image annotation methods. In terms of accuracy, the proposed method significantly outperforms the basic method and performs well relative to the state of the art, while being more generic, conceptually simpler, and more computationally efficient than these other methods.
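The aggregation step (each subwindow predicts a label for every pixel it covers, and a pixel's final label combines the predictions of all subwindows containing it) can be sketched as follows. Two assumptions are made for illustration: the per-subwindow predictions are simply given as input (the multiple-output randomized trees that would produce them are not shown), and predictions are combined by a majority vote, which may differ from the exact aggregation used in the paper. The names `Window` and `annotate_pixels` are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

# (top row, left column, predicted label for each pixel of the subwindow)
Window = Tuple[int, int, List[List[str]]]


def annotate_pixels(height: int, width: int,
                    windows: Sequence[Window]) -> List[List[Optional[str]]]:
    """Label each pixel by majority vote over all subwindows that contain it."""
    votes = [[Counter() for _ in range(width)] for _ in range(height)]
    for top, left, labels in windows:
        for dy, row in enumerate(labels):
            for dx, label in enumerate(row):
                y, x = top + dy, left + dx
                if 0 <= y < height and 0 <= x < width:  # ignore parts outside the image
                    votes[y][x][label] += 1
    # Pixels covered by no subwindow stay unlabelled (None).
    return [[v.most_common(1)[0][0] if v else None for v in row] for row in votes]
```

Because many overlapping subwindows vote on each pixel, an isolated wrong prediction from one subwindow is outvoted by its neighbours, which is the intuition behind aggregating instead of predicting from a single centered window.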

Peer Reviewed
Content-based Image Retrieval by Indexing Random Subwindows with Randomized Trees
Marée, Raphaël ULg; Geurts, Pierre ULg; Wehenkel, Louis ULg

in IPSJ Transactions on Computer Vision and Applications (2009), 1

We propose a new method for content-based image retrieval which exploits the similarity measure and indexing structure of totally randomized tree ensembles induced from a set of subwindows randomly extracted from a sample of images. We also present the possibility of updating the model as new images come in, and the capability of comparing new images using a model previously constructed from a different set of images. The approach is quantitatively evaluated on various types of images and achieves high recognition rates despite its conceptual simplicity and computational efficiency.

Peer Reviewed
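SELDI-TOF-MS proteomics of inflammatory joint diseases: identification of S100 proteins as proteins of interest [original French title: Protéomique par SELDI-TOF-MS des maladies inflammatoires articulaires: identification des protéines S100 comme protéines d'intérêt]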
De Seny, Dominique ULg; Ribbens, Clio ULg; Cobraiville, Gaël ULg et al

in Revue Médicale de Liège (2009), 64(Spec No), 29-35

Clinical proteomics is a technical approach that studies the entire proteome expressed by cells, tissues or organs. It describes the dynamics of cell regulation by detecting molecular events related to disease development. Proteomic techniques focus mainly on the identification of new biomarkers or new therapeutic targets. It is a multidisciplinary approach drawing on medical, biological, bioanalytical and bioinformatics expertise. A strong collaboration between these fields made it possible to perform SELDI-TOF-MS proteomics studies at the CHU and the University of Liège, in the GIGA-Research facilities. These studies followed three main lines of research: the identification of biomarkers specific to a studied pathology, to a common biological pathway and, finally, to a treatment response.