We performed risk assessment for Crohn's disease (CD) and ulcerative colitis (UC), the two common forms of inflammatory bowel disease (IBD), by using data from the International IBD Genetics Consortium's Immunochip project. This data set contains ~17,000 CD cases, ~13,000 UC cases, and ~22,000 controls from 15 European countries typed on the Immunochip. This custom chip provides a more comprehensive catalog of the most promising candidate variants by picking up the remaining common variants and certain rare variants that were missed in the first generation of GWAS. Given this unprecedentedly large sample size and wide variant spectrum, we employed the most recent machine-learning techniques to build optimal predictive models. Our final predictive models achieved areas under the curve (AUCs) of 0.86 and 0.83 for CD and UC, respectively, in an independent evaluation. To our knowledge, this is the best prediction performance reported for CD and UC to date.
Disciplines :
Genetics & genetic processes
Author, co-author :
Wei, Zhi
Wang, Wei
Bradfield, Jonathan
Li, Jin
Cardinale, Christopher
Frackelton, Edward
Kim, Cecilia
Mentch, Frank
Van Steen, Kristel ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Bioinformatique
Visscher, Peter M.
Baldassano, Robert N.
Hakonarson, Hakon
Language :
English
Title :
Large Sample Size, Wide Variant Spectrum, and Advanced Machine-Learning Technique Boost Risk Prediction for Inflammatory Bowel Disease.
Publication date :
2013
Journal title :
American Journal of Human Genetics
ISSN :
0002-9297
eISSN :
1537-6605
Publisher :
University of Chicago Press, Chicago, United States - Illinois
Peer reviewed :
Peer Reviewed verified by ORBi
Commentary :
Copyright (c) 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.