References of "Huynh-Thu, Vân Anh"
Peer Reviewed
NIMEFI: Gene Regulatory Network Inference using Multiple Ensemble Feature Importance Algorithms
Ruyssinck, Joeri; Huynh-Thu, Vân Anh ULg; Geurts, Pierre ULg et al

in PLoS ONE (2014)

One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into separate regression problems for each gene in the network, in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene, and a high feature importance is considered putative evidence of a regulatory link between both genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. As a second contribution, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms) and show that this approach generally outperforms all individual methods, although on a specific network a single method can perform better. An implementation of NIMEFI has been made publicly available.
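The subsampling idea in the NIMEFI abstract above can be illustrated with a minimal Python sketch. It assumes that each run draws a random subset of samples and of candidate features, that a position in a ranking is converted to a linear score, and that a feature's ensemble importance is its mean score over the runs in which it was drawn. The base ranker (absolute Pearson correlation) and all function names are hypothetical stand-ins, not the published NIMEFI implementation. A second helper shows rankwise averaging of several methods' scores.

```python
import random
from statistics import mean


def abs_corr(xs, ys):
    """Absolute Pearson correlation (stand-in relevance measure, an assumption)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def correlation_ranking(X, y, features):
    """Hypothetical base method: rank the candidate features by |corr| with y, best first."""
    cols = {f: [row[f] for row in X] for f in features}
    return sorted(features, key=lambda f: abs_corr(cols[f], y), reverse=True)


def ensemble_importance(X, y, rank_fn, n_runs=50, sample_frac=0.7, feature_frac=0.7, seed=0):
    """Turn any ranking method into an ensemble importance score by repeated subsampling."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    totals, counts = [0.0] * p, [0] * p
    for _ in range(n_runs):
        rows = rng.sample(range(n), max(2, int(sample_frac * n)))
        feats = rng.sample(range(p), max(1, int(feature_frac * p)))
        ranking = rank_fn([X[i] for i in rows], [y[i] for i in rows], feats)
        for pos, f in enumerate(ranking):
            # Linear score: the top feature of this run gets 1, the last gets 1/len(ranking).
            totals[f] += (len(ranking) - pos) / len(ranking)
            counts[f] += 1
    # Average only over the runs in which each feature was actually drawn.
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


def rank_average(score_lists):
    """Average each feature's rank (1 = best) across several methods; lower is better."""
    p = len(score_lists[0])
    avg = [0.0] * p
    for scores in score_lists:
        order = sorted(range(p), key=lambda f: scores[f], reverse=True)
        for rank, f in enumerate(order, start=1):
            avg[f] += rank / len(score_lists)
    return avg
```

In a network setting, `y` would be one target gene and the features its candidate regulators; `rank_average` would then combine the scores produced by different ensemble methods for the same candidate links.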

Peer Reviewed
Bridging physiological and evolutionary time-scales in a gene regulatory network.
Marchand, Gwenaelle; Huynh-Thu, Vân Anh ULg; Kane, Nolan C. et al

in The New Phytologist (2014)

Gene regulatory networks (GRNs) govern phenotypic adaptations and reflect the trade-offs between physiological responses and evolutionary adaptation that act at different time-scales. To identify patterns of molecular function and genetic diversity in GRNs, we studied the drought response of the common sunflower, Helianthus annuus, and how the underlying GRN is related to its evolution. We examined the responses of 32 423 expressed sequences to drought and to abscisic acid (ABA) and selected 145 co-expressed transcripts. We characterized their regulatory relationships in nine kinetic studies based on different hormones. From this, we inferred a GRN by meta-analyses of a Gaussian graphical model and a random forest algorithm and studied the genetic differentiation among populations (FST) at nodes. We identified two main hubs in the network that transport nitrate in guard cells. This suggests that nitrate transport is a critical aspect of the sunflower physiological response to drought. We observed that differentiation of the network genes in elite sunflower cultivars is correlated with their position and connectivity. This systems biology approach combined molecular data at different time-scales and identified important physiological processes. At the evolutionary level, we propose that network topology could influence responses to human selection and possibly adaptation to dry environments.

Peer Reviewed
Identification of a microRNA landscape targeting the PI3K/Akt signaling pathway in inflammation-induced colorectal carcinogenesis
Josse, Claire ULg; Bouznad, Nassim ULg; Geurts, Pierre ULg et al

in American Journal of Physiology - Gastrointestinal and Liver Physiology (2014), 306

Inflammation can contribute to tumor formation; however, markers that predict progression are still lacking. In the present study, the well-established azoxymethane (AOM)/dextran sulfate sodium (DSS)-induced mouse model of colitis-associated cancer was used to analyze microRNA (miRNA) modulation accompanying inflammation-induced tumor development and to determine whether inflammation-triggered miRNA alterations affect the expression of genes or pathways involved in cancer. A miRNA microarray experiment was performed to establish miRNA expression profiles in mouse colon at early and late time points during inflammation and/or tumor growth. Chronic inflammation and carcinogenesis were associated with distinct changes in miRNA expression. Nevertheless, prediction algorithms of miRNA-mRNA interactions and computational analyses based on ranked miRNA lists consistently identified putative target genes that play essential roles in tumor growth or that belong to key carcinogenesis-related signaling pathways. We identified PI3K/Akt and insulin-like growth factor-1 (IGF-1) as major pathways affected in the AOM/DSS model. DSS-induced chronic inflammation downregulates miR-133a and miR-143/145, which is reportedly associated with human colorectal cancer and PI3K/Akt activation. Accordingly, conditioned medium from inflammatory cells decreases the expression of these miRNAs in colorectal adenocarcinoma Caco-2 cells. Overexpression of miR-223, one of the main miRNAs showing strong upregulation during AOM/DSS tumor growth, inhibited Akt phosphorylation and IGF-1R expression in these cells. Cell sorting from mouse colons delineated distinct miRNA expression patterns in epithelial and myeloid cells during the periods preceding and spanning tumor growth. Hence, cell-type-specific miRNA dysregulation and subsequent PI3K/Akt activation may be involved in the transition from intestinal inflammation to cancer.

Peer Reviewed
Gene regulatory network inference from systems genetics data using tree-based methods
Huynh-Thu, Vân Anh ULg; Wehenkel, Louis ULg; Geurts, Pierre ULg

in de la Fuente, Alberto (Ed.) Gene Network Inference - Verification of Methods for Systems Genetics Data (2013)

One of the pressing open problems of computational systems biology is the elucidation of the topology of gene regulatory networks (GRNs). In an attempt to solve this problem, the idea of systems genetics is to exploit the natural variations that exist between the DNA sequences of related individuals and that can represent the randomized and multifactorial perturbations necessary to recover GRNs. In this chapter, we present new methods, called GENIE3-SG-joint and GENIE3-SG-sep, for the inference of GRNs from systems genetics data. Experiments on the artificial data of the StatSeq benchmark and of the DREAM5 Systems Genetics challenge show that jointly exploiting expression and genetic data is very helpful for recovering GRNs, and one of our methods outperforms by a large margin the official best performing method of the DREAM5 challenge.

Peer Reviewed
Myelin-Derived Lipids Modulate Macrophage Activity by Liver X Receptor Activation
Bogie, Jeroen F. J.; Timmermans, Silke; Huynh-Thu, Vân Anh ULg et al

in PLoS ONE (2012), 7(9), 44998

Multiple sclerosis is a chronic, inflammatory, demyelinating disease of the central nervous system in which macrophages and microglia play a central role. Foamy macrophages and microglia, containing degenerated myelin, are abundantly found in active multiple sclerosis lesions. Recent studies have described an altered macrophage phenotype after myelin internalization. However, it is unclear by which mechanisms myelin affects the phenotype of macrophages and how this phenotype can influence lesion progression. Here we demonstrate, by using genome-wide gene expression analysis, that myelin-phagocytosing macrophages have an enhanced expression of genes involved in migration, phagocytosis and inflammation. Interestingly, myelin internalization also induced the expression of genes involved in liver-X-receptor signaling and cholesterol efflux. In vitro validation shows that myelin-phagocytosing macrophages indeed have an increased capacity to dispose of intracellular cholesterol. In addition, myelin suppresses the secretion of the pro-inflammatory mediator IL-6 by macrophages, which was mediated by activation of liver-X-receptor β. Our data show that myelin modulates the phenotype of macrophages by nuclear receptor activation, which may subsequently affect lesion progression in demyelinating diseases such as multiple sclerosis.

Peer Reviewed
Wisdom of crowds for robust gene network inference
Marbach, Daniel; Costello, James C.; Küffner, Robert et al

in Nature Methods (2012), 9

Reconstructing gene regulatory networks from high-throughput data is a long-standing challenge. Through the Dialogue on Reverse Engineering Assessment and Methods (DREAM) project, we performed a comprehensive blind assessment of over 30 network inference methods on Escherichia coli, Staphylococcus aureus, Saccharomyces cerevisiae and in silico microarray data. We characterize the performance, data requirements and inherent biases of different inference approaches, and we provide guidelines for algorithm application and development. We observed that no single inference method performs optimally across all data sets. In contrast, integration of predictions from multiple inference methods shows robust and high performance across diverse data sets. We thereby constructed high-confidence networks for E. coli and S. aureus, each comprising ~1,700 transcriptional interactions at a precision of ~50%. We experimentally tested 53 previously unobserved regulatory interactions in E. coli, of which 23 (43%) were supported. Our results establish community-based methods as a powerful and robust tool for the inference of transcriptional gene regulatory networks.
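Integrating predictions from several inference methods, as described in the abstract above, can be illustrated by averaging each interaction's rank across methods. A minimal sketch, assuming each method returns a ranked list of directed edges (best first) and that an edge missing from a method's list receives that list's length plus one as its rank; the function name and the tie-breaking rule are hypothetical.

```python
from collections import defaultdict


def community_network(ranked_lists):
    """Combine ranked edge lists (best first) into one list ordered by average rank."""
    all_edges = set()
    for lst in ranked_lists:
        all_edges.update(lst)
    totals = defaultdict(float)
    for lst in ranked_lists:
        position = {edge: i + 1 for i, edge in enumerate(lst)}
        worst = len(lst) + 1  # assumption: unpredicted edges rank just below the list
        for edge in all_edges:
            totals[edge] += position.get(edge, worst)
    k = len(ranked_lists)
    # Lower average rank first; ties broken by the edge tuple for a deterministic order.
    return sorted(all_edges, key=lambda e: (totals[e] / k, e))


m1 = [("A", "B"), ("A", "C"), ("B", "C")]
m2 = [("A", "C"), ("A", "B"), ("C", "B")]
m3 = [("A", "B"), ("B", "C"), ("A", "C")]
print(community_network([m1, m2, m3]))
```

An edge ranked poorly by one method can still end near the top if the other methods agree on it, which is the intuition behind the robustness of the combined prediction.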

Peer Reviewed
Statistical interpretation of machine learning-based feature importance scores for biomarker discovery
Huynh-Thu, Vân Anh ULg; Saeys, Yvan; Wehenkel, Louis ULg et al

in Bioinformatics (2012), 28(13), 1766-1774

Motivation: Univariate statistical tests are widely used for biomarker discovery in bioinformatics. These procedures are simple, fast and their output is easily interpretable by biologists, but they can only identify variables that provide a significant amount of information in isolation from the other variables. As biological processes are expected to involve complex interactions between variables, univariate methods thus potentially miss some informative biomarkers. Variable relevance scores provided by machine learning techniques, however, are potentially able to highlight multivariate interacting effects, but unlike the p-values returned by univariate tests, these relevance scores are usually not statistically interpretable. This lack of interpretability hampers the determination of a relevance threshold for extracting a feature subset from the rankings and also prevents the wide adoption of these methods by practitioners. Results: We evaluated several existing and novel procedures that extract relevant features from rankings derived from machine learning approaches. These procedures replace the relevance scores with measures that can be interpreted in a statistical way, such as p-values, false discovery rates or family-wise error rates, for which it is easier to determine a significance level. Experiments were performed on several artificial problems as well as on real microarray datasets. Although the methods differ in terms of computing times and the tradeoff they achieve in terms of false positives and false negatives, some of them greatly help in the extraction of truly relevant biomarkers and should thus be of great practical interest for biologists and physicians. As a side conclusion, our experiments also clearly highlight that using model performance as a criterion for feature selection is often counter-productive.
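One generic way to turn relevance scores into p-values, of the kind discussed in this abstract, is a permutation test on the output. The sketch below is an illustration under stated assumptions, not one of the specific procedures evaluated in the paper. The stand-in score is the absolute Pearson correlation, which is univariate and is used only to keep the example light; in practice a multivariate score such as a tree-ensemble importance would be passed as `score_fn`. The names `importance_scores` and `permutation_pvalues`, the value of `n_perm` and the +1 correction are illustrative choices.

```python
import numpy as np


def importance_scores(X, y):
    """Stand-in relevance score: |Pearson correlation| of each column of X with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.abs(Xc.T @ yc) / denom


def permutation_pvalues(X, y, score_fn=importance_scores, n_perm=200, seed=0):
    """Empirical p-value per feature under the null of no link between X and y."""
    rng = np.random.default_rng(seed)
    observed = score_fn(X, y)
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        # Permuting y breaks every feature-output association while keeping X intact.
        null = score_fn(X, rng.permutation(y))
        exceed += null >= observed
    # The +1 correction keeps every p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

With a learned model as `score_fn`, every permutation requires refitting the model, which illustrates why computing cost can differ so much between such procedures.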

Machine learning-based feature ranking: Statistical interpretation and gene network inference
Huynh-Thu, Vân Anh ULg

Doctoral thesis (2012)

Machine learning techniques, and in particular supervised learning methods, are nowadays widely used in bioinformatics. Two prominent applications that we target specifically in this thesis are biomarker discovery and regulatory network inference. These two problems are commonly addressed through the use of feature ranking methods that order the input features of a supervised learning problem from the most to the least relevant for predicting the output. This thesis presents, on the one hand, methodological contributions around machine learning-based feature ranking techniques and, on the other hand, more applied contributions on gene regulatory network inference. Our methodological contributions focus on the problem of selecting truly relevant features from machine learning-based feature rankings. Unlike the p-values returned by univariate tests, relevance scores derived from machine learning techniques to rank the features are usually not statistically interpretable. This lack of interpretability makes the identification of the truly relevant features among the top-ranked ones a very difficult task and hence prevents the wide adoption of these methods by practitioners. Our first contribution in this field concerns a procedure, based on permutation tests, that estimates for each subset of top-ranked features the probability for that subset to contain at least one irrelevant feature (called CER for "conditional error rate"). As a second contribution, we performed a large-scale evaluation of several existing or novel procedures, including our CER method, that all replace the original relevance scores with measures that can be interpreted in a statistical way. These procedures, which were assessed on several artificial and real datasets, differ greatly in terms of computing times and the tradeoff they achieve in terms of false positives and false negatives.
Our experiments also clearly highlight that using model performance as a criterion for feature selection is often counter-productive. The problem of gene regulatory network inference can be formulated as several feature selection problems, each one aiming at discovering the regulators of one target gene. Within this family of methods, we developed the GENIE3 algorithm that exploits feature rankings derived from tree-based ensemble methods to infer gene networks from steady-state gene expression data. In a second step, we derived two extensions of GENIE3 that aim to infer regulatory networks from other types of data. The first extension exploits expression data provided by time course experiments, while the second extension is related to genetical genomics datasets, which contain expression data together with information about genetic markers. GENIE3 was the best performer in the DREAM4 In Silico Multifactorial challenge in 2009 and in the DREAM5 Network Inference challenge in 2010, and its extensions perform very well compared to other methods on several artificial datasets.

Peer Reviewed
Inferring Regulatory Networks from Expression Data Using Tree-Based Methods
Huynh-Thu, Vân Anh ULg; Irrthum, Alexandre ULg; Wehenkel, Louis ULg et al

in PLoS ONE (2010), 5(9), 12776

One of the pressing open problems of computational systems biology is the elucidation of the topology of genetic regulatory networks (GRNs) using high-throughput genomic data, in particular microarray gene expression data. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) challenge aims to evaluate the success of GRN inference algorithms on benchmarks of simulated data. In this article, we present GENIE3, a new algorithm for the inference of GRNs that was the best performer in the DREAM4 In Silico Multifactorial challenge. GENIE3 decomposes the prediction of a regulatory network between p genes into p different regression problems. In each of the regression problems, the expression pattern of one of the genes (target gene) is predicted from the expression patterns of all the other genes (input genes), using the tree-based ensemble methods Random Forests or Extra-Trees. The importance of an input gene in the prediction of the target gene expression pattern is taken as an indication of a putative regulatory link. Putative regulatory links are then aggregated over all genes to provide a ranking of interactions from which the whole network is reconstructed. In addition to performing well on the DREAM4 In Silico Multifactorial challenge simulated data, we show that GENIE3 compares favorably with existing algorithms to decipher the genetic regulatory network of Escherichia coli. It makes no assumption about the nature of gene regulation, can deal with combinatorial and non-linear interactions, produces directed GRNs, and is fast and scalable. In conclusion, we propose a new algorithm for GRN inference that performs well on both synthetic and real gene expression data. The algorithm, based on feature selection with tree-based ensemble methods, is simple and generic, making it adaptable to other types of genomic data and interactions.
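The decomposition into p regression problems described in the GENIE3 abstract can be sketched with scikit-learn's `RandomForestRegressor`. This is a simplified illustration, not the reference GENIE3 code: it relies on scikit-learn's impurity-based `feature_importances_`, which are normalised to sum to one within each target's model, so the weighting across targets may differ from the published method. The function names `genie3_sketch` and `rank_edges` and the default parameters are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def genie3_sketch(expr, n_trees=100, max_features="sqrt", seed=0):
    """Return W where W[i, j] scores gene i as a putative regulator of target gene j.

    expr: array of shape (n_samples, n_genes) with steady-state expression values.
    """
    n_genes = expr.shape[1]
    # Scale each gene to zero mean and unit variance so the targets are comparable.
    std = expr.std(axis=0)
    std[std == 0] = 1.0
    data = (expr - expr.mean(axis=0)) / std
    W = np.zeros((n_genes, n_genes))
    for j in range(n_genes):
        # One regression problem per target gene: predict gene j from all other genes.
        inputs = [i for i in range(n_genes) if i != j]
        model = RandomForestRegressor(
            n_estimators=n_trees, max_features=max_features, random_state=seed
        )
        model.fit(data[:, inputs], data[:, j])
        W[inputs, j] = model.feature_importances_
    return W


def rank_edges(W, gene_names):
    """Aggregate all putative links (regulator, target, score) into one global ranking."""
    n_genes = W.shape[0]
    edges = [
        (gene_names[i], gene_names[j], float(W[i, j]))
        for i in range(n_genes)
        for j in range(n_genes)
        if i != j
    ]
    return sorted(edges, key=lambda e: e[2], reverse=True)
```

Rows of `W` index candidate regulators and columns index targets, so sorting all off-diagonal entries gives the global ranking of putative links from which the network is reconstructed. The Extra-Trees variant could be tried by swapping in scikit-learn's `ExtraTreesRegressor`.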

Peer Reviewed
Exploiting tree-based variable importances to selectively identify relevant variables
Huynh-Thu, Vân Anh ULg; Wehenkel, Louis ULg; Geurts, Pierre ULg

in JMLR: Workshop and Conference Proceedings (2008), 4
