References of "Geurts, Pierre"
Pruning randomized trees with L1-norm regularization
Joly, Arnaud ULg; Schnitzler, François ULg; Geurts, Pierre ULg et al

Poster (2011, November 29)

The growing amount of high-dimensional data requires robust analysis techniques. Tree-based ensemble methods provide accurate supervised learning models for such data. However, the model complexity can become very large, depending on the dimension of the dataset. Here we propose a method to compress such ensembles using the space induced by the random trees and L1-norm regularisation. This leads to drastic pruning while preserving or improving the model accuracy. Moreover, our approach increases robustness with respect to the selection of complexity parameters.

Peer Reviewed
Phenotype Classification of Zebrafish Embryos by Supervised Learning
Jeanray, Nathalie ULg; Marée, Raphaël ULg; Pruvot, Benoist ULg et al

Conference (2011, September 02)

Peer Reviewed
Efficiently approximating Markov tree bagging for high-dimensional density estimation
Schnitzler, François ULg; Ammar, Sourour; Leray, Philippe et al

in Gunopulos, Dimitrios; Hofmann, Thomas; Malerba, Donato (Eds.) et al Machine Learning and Knowledge Discovery in Databases, Part III (2011, September)

We consider algorithms for generating Mixtures of Bagged Markov Trees, for density estimation. In problems defined over many variables and when few observations are available, those mixtures generally outperform a single Markov tree maximizing the data likelihood, but are far more expensive to compute. In this paper, we describe new algorithms for approximating such models, with the aim of speeding up learning without sacrificing accuracy. More specifically, we propose to use a filtering step obtained as a by-product from computing a first Markov tree, so as to avoid considering poor candidate edges in the subsequently generated trees. We compare these algorithms (on synthetic data sets) to Mixtures of Bagged Markov Trees, as well as to a single Markov tree derived by the classical Chow-Liu algorithm and to a recently proposed randomized scheme used for building tree mixtures.
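
For orientation, the classical Chow-Liu baseline named in this abstract can be sketched as a maximum-weight spanning tree over pairwise mutual information. This is only a minimal illustration under stated assumptions: the function names (`mutual_information`, `chow_liu_edges`) are hypothetical, variables are discrete, probabilities are plain empirical frequencies, and neither the bagging nor the edge-filtering step proposed in the paper is reproduced.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(data, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    n = len(data)
    count_i = Counter(row[i] for row in data)
    count_j = Counter(row[j] for row in data)
    count_ij = Counter((row[i], row[j]) for row in data)
    # p(a,b) * log(p(a,b) / (p(a) * p(b))), written with raw counts
    return sum(
        (c / n) * math.log(c * n / (count_i[a] * count_j[b]))
        for (a, b), c in count_ij.items()
    )


def chow_liu_edges(data):
    """Return the edges of a maximum mutual-information spanning tree."""
    n_vars = len(data[0])
    scored = sorted(
        ((mutual_information(data, i, j), i, j)
         for i, j in combinations(range(n_vars), 2)),
        reverse=True,
    )
    parent = list(range(n_vars))  # union-find forest for Kruskal's algorithm

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    edges = []
    for _, i, j in scored:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # keep the edge only if it joins two components
            parent[root_i] = root_j
            edges.append((i, j))
    return edges
```

Computing all pairwise scores is the quadratic step that makes repeating this procedure over many bootstrap replicates expensive, which is the cost the abstract aims to reduce.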

Peer Reviewed
High-density lipoprotein proteome dynamics in human endotoxemia.
Levels, Johannes Hm; Geurts, Pierre ULg; Karlsson, Helen et al

in Proteome science (2011), 9(1), 34

BACKGROUND: A large variety of proteins involved in inflammation, coagulation, lipid-oxidation and lipid metabolism have been associated with high-density lipoprotein (HDL) and it is anticipated that changes in the HDL proteome have implications for the multiple functions of HDL. Here, SELDI-TOF mass spectrometry (MS) was used to study the dynamic changes of HDL protein composition in a human experimental low-dose endotoxemia model. Ten healthy men with low HDL cholesterol (0.7 ± 0.1 mmol/L) and 10 men with high HDL cholesterol levels (1.9 ± 0.4 mmol/L) were challenged with endotoxin (LPS) intravenously (1 ng/kg bodyweight). We previously showed that subjects with low HDL cholesterol are more susceptible to an inflammatory challenge. The current study tested the hypothesis that this discrepancy may be related to differences in the HDL proteome.
RESULTS: Plasma drawn at 7 time-points over a 24 hour time period after LPS challenge was used for direct capture of HDL using antibodies against apolipoprotein A-I followed by subsequent SELDI-TOF MS profiling. Upon LPS administration, profound changes in 21 markers (adjusted p-value < 0.05) were observed in the proteome in both study groups. These changes were observed 1 hour after LPS infusion and sustained up to 24 hours, but unexpectedly were not different between the 2 study groups. Hierarchical clustering of the protein spectra at all time points of all individuals revealed 3 distinct clusters, which were largely independent of baseline HDL cholesterol levels but correlated with paraoxonase 1 activity. The acute phase protein serum amyloid A-1/2 (SAA-1/2) was clearly upregulated after LPS infusion in both groups and comprised both native and N-terminal truncated variants that were identified by two-dimensional gel electrophoresis and mass spectrometry. Individuals of one of the clusters were distinguished by a lower SAA-1/2 response after LPS challenge and a delayed time-response of the truncated variants.
CONCLUSIONS: This study shows that the semi-quantitative differences in the HDL proteome as assessed by SELDI-TOF MS cannot explain why subjects with low HDL cholesterol are more susceptible to a challenge with LPS than those with high HDL cholesterol. Instead the results indicate that hierarchical clustering could be useful to predict HDL functionality in acute phase responses towards LPS.

Peer Reviewed
Zebrafish Skeleton Measurements using Image Analysis and Machine Learning Methods
Stern, Olivier ULg; Marée, Raphaël ULg; Aceto, Jessica ULg et al

Poster (2011, May 20)

The zebrafish is a model organism for biological studies on development and gene function. Our work aims at automating the detection of the cartilage skeleton and measuring several distances and angles to quantify its development following different experimental conditions.

Peer Reviewed
Learning from positive and unlabeled examples by enforcing statistical significance
Geurts, Pierre ULg

in JMLR: Workshop and Conference Proceedings (2011, April), 15

Given a finite but large set of objects described by a vector of features, only a small subset of which have been labeled as ‘positive’ with respect to a class of interest, we consider the problem of characterizing the positive class. We formalize this as the problem of learning a feature-based score function that minimizes the p-value of a nonparametric statistical hypothesis test. For linear score functions over the original feature space or over one of its kernelized versions, we provide a solution of this problem computed by a one-class SVM applied on a surrogate dataset obtained by sampling subsets of the overall set of objects and representing them by their average feature-vector shifted by the average feature-vector of the original sample of positive examples. We carry out experiments with this method on the prediction of targets of transcription factors in two different organisms, E. coli and S. cerevisiae. Our method extends enrichment analysis commonly carried out in bioinformatics, and its results outperform common solutions to this problem.

Looking for applications of mixtures of Markov trees in bioinformatics
Schnitzler, François ULg; Geurts, Pierre ULg; Wehenkel, Louis ULg

Scientific conference (2011, March 21)

Probabilistic graphical models (PGM) efficiently encode a probability distribution on a large set of variables. While they have already had several successful applications in biology, their poor scaling in terms of the number of variables may make them unfit to tackle problems of increasing size. Mixtures of trees, however, scale well by design. Experiments on synthetic data have shown the value of our new learning methods for this model, and we now wish to apply them to relevant problems in bioinformatics.

Peer Reviewed
Learning to rank with extremely randomized trees
Geurts, Pierre ULg; Louppe, Gilles ULg

in JMLR: Workshop and Conference Proceedings (2011, January), 14

In this paper, we report on our experiments on the Yahoo! Labs Learning to Rank challenge organized in the context of the 27th International Conference on Machine Learning (ICML 2010). We competed in both the learning to rank and the transfer learning tracks of the challenge with several tree-based ensemble methods, including Tree Bagging, Random Forests, and Extremely Randomized Trees. Our methods ranked 10th in the first track and 4th in the second track. Although not at the very top of the ranking, our results show that ensembles of randomized trees are quite competitive for the “learning to rank” problem. The paper also analyzes computing times of our algorithms and presents some post-challenge experiments with transfer learning methods.

Peer Reviewed
Automatic localization of interest points in zebrafish images with tree-based methods
Stern, Olivier ULg; Marée, Raphaël ULg; Aceto, Jessica ULg et al

in Proceedings of the 6th IAPR International Conference on Pattern Recognition in Bioinformatics (2011)

In many biological studies, scientists assess effects of experimental conditions by visual inspection of microscopy images. They are able to observe whether a protein is expressed or not, if cells are going through normal cell cycles, how organisms evolve in different experimental conditions, etc. But, with the large number of images acquired in high-throughput experiments, this manual inspection becomes lengthy, tedious and error-prone. In this paper, we propose to automatically detect specific interest points in microscopy images using machine learning methods with the aim of performing automatic morphometric measurements in the context of Zebrafish studies. We systematically evaluate variants of ensembles of classification and regression trees on four datasets corresponding to different imaging modalities and experimental conditions. Our results show that all variants are effective, with a slight advantage for multiple output methods, which are more robust to parameter choices.

Peer Reviewed
A zealous parallel gradient descent algorithm
Louppe, Gilles ULg; Geurts, Pierre ULg

Poster (2010, December 11)

Parallel and distributed algorithms have become a necessity in modern machine learning tasks. In this work, we focus on parallel asynchronous gradient descent and propose a zealous variant that minimizes the idle time of processors to achieve a substantial speedup. We then experimentally study this algorithm in the context of training a restricted Boltzmann machine on a large collaborative filtering task.

Peer Reviewed
Inferring Regulatory Networks from Expression Data Using Tree-Based Methods
Huynh-Thu, Vân Anh ULg; Irrthum, Alexandre ULg; Wehenkel, Louis ULg et al

in PLoS ONE (2010), 5(9), 12776

One of the pressing open problems of computational systems biology is the elucidation of the topology of genetic regulatory networks (GRNs) using high throughput genomic data, in particular microarray gene expression data. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) challenge aims to evaluate the success of GRN inference algorithms on benchmarks of simulated data. In this article, we present GENIE3, a new algorithm for the inference of GRNs that was the best performer in the DREAM4 In Silico Multifactorial challenge. GENIE3 decomposes the prediction of a regulatory network between p genes into p different regression problems. In each of the regression problems, the expression pattern of one of the genes (target gene) is predicted from the expression patterns of all the other genes (input genes), using the tree-based ensemble methods Random Forests or Extra-Trees. The importance of an input gene in the prediction of the target gene expression pattern is taken as an indication of a putative regulatory link. Putative regulatory links are then aggregated over all genes to provide a ranking of interactions from which the whole network is reconstructed. In addition to performing well on the DREAM4 In Silico Multifactorial challenge simulated data, we show that GENIE3 compares favorably with existing algorithms to decipher the genetic regulatory network of Escherichia coli. GENIE3 makes no assumption about the nature of gene regulation, can deal with combinatorial and non-linear interactions, produces directed GRNs, and is fast and scalable. In conclusion, we propose a new algorithm for GRN inference that performs well on both synthetic and real gene expression data. The algorithm, based on feature selection with tree-based ensemble methods, is simple and generic, making it adaptable to other types of genomic data and interactions.
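
The decomposition described in this abstract (one regression problem per target gene, then a global ranking of regulator-to-target links) can be sketched roughly as follows. This is not the GENIE3 implementation: to stay dependency-free, the importance of an input gene is approximated here by the variance reduction of its single best threshold split, a one-node stand-in for the Random Forests or Extra-Trees importances used in the paper. The function names and the normalization by target variance are assumptions of this sketch.

```python
def best_split_variance_reduction(xs, ys):
    """Fraction of the variance of ys removed by the best single split on xs."""
    pairs = sorted(zip(xs, ys))

    def sse(values):
        # Sum of squared deviations from the mean (0.0 for an empty list).
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    total = sse([y for _, y in pairs])
    if total == 0.0:
        return 0.0  # a constant target cannot be explained any further
    best = 0.0
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold separates equal input values
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        best = max(best, total - sse(left) - sse(right))
    return best / total


def rank_links(expr):
    """expr[s][g] is the expression of gene g in sample s.

    Returns (score, regulator, target) triples sorted by decreasing score.
    """
    n_genes = len(expr[0])
    links = []
    for target in range(n_genes):  # one regression problem per target gene
        y = [row[target] for row in expr]
        for regulator in range(n_genes):
            if regulator == target:
                continue
            x = [row[regulator] for row in expr]
            links.append((best_split_variance_reduction(x, y), regulator, target))
    links.sort(reverse=True)  # aggregate all problems into one ranking
    return links
```

Because each link is scored from the regulator's ability to predict the target, the ranking is directed: the score of (regulator, target) generally differs from that of the reversed pair.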

Peer Reviewed
Network Distance Prediction Based on Decentralized Matrix Factorization
Liao, Yongjun ULg; Geurts, Pierre ULg; Leduc, Guy ULg

in Lecture Notes in Computer Science (2010, May 11), 6091

Network Coordinate Systems (NCS) are promising techniques to predict unknown network distances from a limited number of measurements. Most NCS algorithms are based on metric space embedding and suffer from the inability to represent distance asymmetries and Triangle Inequality Violations (TIVs). To overcome these drawbacks, we formulate the problem of network distance prediction as guessing the missing elements of a distance matrix and solve it by matrix factorization. A distinct feature of our approach, called Decentralized Matrix Factorization (DMF), is that it is fully decentralized. The factorization of the incomplete distance matrix is collaboratively and iteratively done at all nodes with each node retrieving only a small number of distance measurements. There are no special nodes such as landmarks nor a central node where the distance measurements are collected and stored. We compare DMF with two popular NCS algorithms: Vivaldi and IDES. The former is based on metric space embedding, while the latter is also based on matrix factorization but uses landmarks. Experimental results show that DMF achieves competitive accuracy with the double advantage of having no landmarks and of being able to represent distance asymmetries and TIVs.
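
The idea of factorizing an incomplete, possibly asymmetric distance matrix at the nodes themselves can be illustrated with the sketch below. It is a single-process simulation under several assumptions that are not taken from the paper: each node `i` keeps an outgoing vector `x[i]` and an incoming vector `y[i]`, the distance from `i` to `j` is predicted as their dot product, each node updates only its own two vectors from a small fixed set of neighbors, and plain stochastic gradient steps are used as the local update rule. All names and default parameters are hypothetical.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def dmf_sketch(dist, rank=2, neighbors=3, rounds=300, lr=0.01, reg=0.01, seed=0):
    """Simulate decentralized factorization of dist[i][j] ~ dot(x[i], y[j])."""
    rng = random.Random(seed)
    n = len(dist)
    x = [[rng.random() for _ in range(rank)] for _ in range(n)]  # outgoing vectors
    y = [[rng.random() for _ in range(rank)] for _ in range(n)]  # incoming vectors
    # Each node probes only a small, fixed set of other nodes.
    nbrs = [rng.sample([j for j in range(n) if j != i], neighbors) for i in range(n)]
    for _ in range(rounds):
        for i in range(n):
            for j in nbrs[i]:
                # Node i adjusts its outgoing vector from the measurement i -> j.
                err_out = dist[i][j] - dot(x[i], y[j])
                x[i] = [xi + lr * (err_out * yj - reg * xi)
                        for xi, yj in zip(x[i], y[j])]
                # Node i adjusts its incoming vector from the measurement j -> i.
                err_in = dist[j][i] - dot(x[j], y[i])
                y[i] = [yi + lr * (err_in * xj - reg * yi)
                        for yi, xj in zip(y[i], x[j])]
    return x, y


def mean_sq_error(dist, x, y):
    """Mean squared prediction error over all ordered pairs of distinct nodes."""
    n = len(dist)
    errors = [(dist[i][j] - dot(x[i], y[j])) ** 2
              for i in range(n) for j in range(n) if i != j]
    return sum(errors) / len(errors)
```

Keeping separate outgoing and incoming vectors is what lets the prediction for `i` to `j` differ from that for `j` to `i`, which a metric space embedding cannot express.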
