A screening methodology based on Random Forests to improve the detection of gene-gene interactions

The search for susceptibility loci in gene-gene interactions poses a methodological and computational challenge for statisticians because of the large dimensionality inherent in modelling gene-gene interactions, or epistasis. Now that genome-wide scans have become relatively common, powerful new methods are required to handle the huge number of possible gene-gene interactions and to weed out false positives and false negatives from the results. One solution to the dimensionality problem is to reduce the data by preliminary screening of markers, selecting the best candidates for further analysis. Ideally, this screening step is statistically independent of the testing phase. Initially developed for small numbers of markers, the Multifactor Dimensionality Reduction (MDR) method is a nonparametric, model-free data reduction technique that associates sets of markers with optimal predictive properties with disease. In this study, we examine the power of MDR in larger data sets and compare it with other approaches that can identify gene-gene interactions. Under various interaction models (purely and not purely epistatic), we apply a Random Forest (RF)-based prescreening method before running MDR to improve its performance. We find that the power of MDR increases when noisy SNPs are first removed by building a collection of candidate markers with RFs. We validate our technique through extensive simulation studies and an application to asthma data from the European Committee of Respiratory Health Study II.

European Journal of Human Genetics advance online publication, 12 May 2010; doi:10.1038/ejhg.2010.48.
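The two-stage strategy described above (prescreen SNPs with a Random Forest, then run MDR on the surviving candidates) can be sketched as follows. This is a minimal, standard-library-only sketch under stated assumptions, not the authors' implementation: the RF importance scores are taken as a given input (hypothetical here, e.g. obtained elsewhere from a fitted forest), and the MDR step implements only the basic cell-pooling rule scored by balanced accuracy on the training data, omitting the cross-validation and permutation testing a real MDR analysis would use. The function names `screen_snps`, `mdr_balanced_accuracy` and `best_pair` are hypothetical.

```python
from collections import Counter
from itertools import combinations, product


def screen_snps(importances, k):
    """Return the indices of the k SNPs with the highest importance scores.

    Hypothetical input: `importances` is assumed to be a per-SNP score
    from a Random Forest fitted elsewhere; the screening itself is just
    a top-k cut on those scores.
    """
    ranked = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
    return sorted(ranked[:k])


def mdr_balanced_accuracy(genotypes, status, snps):
    """Score one SNP combination with the basic MDR pooling step.

    genotypes: one list of genotype codes per individual.
    status: 1 for a case, 0 for a control.
    A multilocus genotype cell is labelled high-risk when its case:control
    ratio meets or exceeds the overall case:control ratio; the resulting
    one-dimensional classifier is scored by balanced accuracy on the same
    data (no cross-validation in this sketch).
    """
    n_cases = sum(status)
    n_controls = len(status) - n_cases
    threshold = n_cases / n_controls
    cases, controls = Counter(), Counter()
    for row, y in zip(genotypes, status):
        cell = tuple(row[j] for j in snps)
        if y == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1
    # cases >= threshold * controls avoids dividing by a zero control count
    high_risk = {c for c in set(cases) | set(controls)
                 if cases[c] >= threshold * controls[c]}
    true_pos = sum(cases[c] for c in high_risk)
    true_neg = n_controls - sum(controls[c] for c in high_risk)
    return 0.5 * (true_pos / n_cases + true_neg / n_controls)


def best_pair(genotypes, status, candidates):
    """Exhaustively search all SNP pairs among the screened candidates."""
    return max(combinations(candidates, 2),
               key=lambda pair: mdr_balanced_accuracy(genotypes, status, pair))


if __name__ == "__main__":
    # Toy, made-up data: 4 biallelic SNPs coded 0/1, disease = SNP0 XOR SNP1,
    # i.e. a purely epistatic model with no marginal effects.
    genotypes = [list(g) for g in product((0, 1), repeat=4)]
    status = [g[0] ^ g[1] for g in genotypes]
    importances = [0.40, 0.35, 0.05, 0.20]  # illustrative scores, not RF output
    candidates = screen_snps(importances, k=3)
    print(candidates)  # [0, 1, 3]
    print(best_pair(genotypes, status, candidates))  # (0, 1)
```

Because the prescreen only shrinks the candidate set, the exhaustive pair search afterwards costs C(k, 2) evaluations instead of C(p, 2) for p SNPs, which is the dimensionality reduction the abstract motivates.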

Disciplines :

Genetics & genetic processes

Author, co-author :

De Lobel, L.

Geurts, Pierre ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation

Baele, G.

Castro-Giner, F.

Kogevinas, M.

Van Steen, Kristel ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Bioinformatique

Language :

English

Title :

A screening methodology based on Random Forests to improve the detection of gene-gene interactions

Publication date :

2010

Journal title :

European Journal of Human Genetics

ISSN :

1018-4813

eISSN :

1476-5438

Publisher :

Nature Publishing Group, United Kingdom

Volume :

Issue :

Pages :

1127-1132

Peer reviewed :

Peer Reviewed verified by ORBi

Commentary :

2010/05/13

Available on ORBi :

since 22 May 2010
