Keywords :
Network inference; Machine learning; Decision trees
Abstract :
Networks are ubiquitous in biology, and computational approaches for inferring them have been extensively investigated. In particular, supervised machine learning methods can be used to complete a partially known network by integrating various measurements. Two main supervised frameworks have been proposed: the local approach, which trains a separate model for each network node, and the global approach, which trains a single model over pairs of nodes. Here, we systematically investigate, theoretically and empirically, the use of tree-based ensemble methods within these two approaches for biological network inference. We first formalize network inference as a classification problem on pairs of nodes, unifying homogeneous and bipartite graphs in the process and discussing two main sampling schemes. We then present the global and local approaches, extending the latter to predict interactions between two network nodes that are both unseen, and discuss their specialization to tree-based ensemble methods, highlighting their interpretability and drawing links with clustering techniques. Extensive computational experiments on various biological networks clearly show that these methods are competitive with existing methods.
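To make the distinction between the two frameworks concrete, the sketch below shows how the learning samples could be built in each case for a small bipartite network. It is an illustration under stated assumptions, not the authors' implementation: the node names, feature values and the functions `global_samples` and `local_samples` are hypothetical, pairs absent from the known network are simply labeled 0, and the final step of fitting a classifier (for example a tree-based ensemble) on these samples is omitted to keep the example dependency-free.

```python
from itertools import product

# Hypothetical toy data: every node is described by a feature vector.
row_features = {"a": [0.1, 1.0], "b": [0.4, 0.2], "c": [0.9, 0.5]}
col_features = {"x": [1.0], "y": [0.0]}
# Known interactions of a bipartite network, as (row node, column node).
# Assumption: any pair not listed here is treated as a negative example.
known_edges = {("a", "x"), ("c", "y")}


def global_samples(rows, cols, edges):
    """Global approach: a single dataset over all pairs of nodes.

    The input for pair (r, c) is the concatenation of both nodes' feature
    vectors; the label is 1 if the pair is a known interaction, else 0.
    """
    X, y = [], []
    for r, c in product(rows, cols):
        X.append(rows[r] + cols[c])  # new list: features of r, then of c
        y.append(1 if (r, c) in edges else 0)
    return X, y


def local_samples(rows, cols, edges):
    """Local approach: one separate dataset per row node.

    For row node r, each sample is a column node c described only by its
    own features, labeled 1 if (r, c) is a known interaction, else 0.
    """
    datasets = {}
    for r in rows:
        X = [cols[c] for c in cols]
        y = [1 if (r, c) in edges else 0 for c in cols]
        datasets[r] = (X, y)
    return datasets


X, y = global_samples(row_features, col_features, known_edges)
print(len(X), sum(y))  # 6 2: one sample per pair, two known interactions

per_node = local_samples(row_features, col_features, known_edges)
print(sorted(per_node))  # ['a', 'b', 'c']: one training set per row node
```

The global approach thus yields one model whose inputs describe both ends of a pair, whereas the local approach yields as many models as there are nodes on one side, each seeing only the features of the other side.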
Disciplines :
Engineering, computing & technology: Multidisciplinary, general & others
Author, co-author :
Schrynemackers, Marie ; Université de Liège - ULiège > Department of Electricity, Electronics and Computer Science (Montefiore Institute)
Wehenkel, Louis ; Université de Liège - ULiège > Department of Electricity, Electronics and Computer Science (Montefiore Institute) > Systems and Modeling
Madan Babu, Mohan
Geurts, Pierre ; Université de Liège - ULiège > Department of Electricity, Electronics and Computer Science (Montefiore Institute) > Algorithms for Systems Interacting with the Physical World
Language :
English
Title :
Classifying pairs with trees for supervised biological network inference