Reference: Understanding variable importances in forests of randomized trees
Scientific congresses and symposiums: Paper published in a book
Engineering, computing & technology: Computer science
http://hdl.handle.net/2268/155642
English
Louppe, Gilles [Université de Liège - ULg > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation]
Wehenkel, Louis [Université de Liège - ULg > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation]
Sutera, Antonio [Université de Liège - ULg > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Algorith. des syst. en interaction avec le monde physique]
Geurts, Pierre [Université de Liège - ULg > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation]
Dec-2013
Advances in Neural Information Processing Systems 26
Yes
No
International
Neural Information Processing Systems Conference 2013
December 5-10 2013
Lake Tahoe
USA
[en] machine learning ; random forest ; variable importances
[en] Despite growing interest and practical use in various scientific areas, variable importances derived from tree-based ensemble methods are not well understood from a theoretical point of view. In this work we characterize the Mean Decrease Impurity (MDI) variable importances as measured by an ensemble of totally randomized trees in asymptotic sample and ensemble size conditions. We derive a three-level decomposition of the information jointly provided by all input variables about the output in terms of i) the MDI importance of each input variable, ii) the degree of interaction of a given input variable with the other input variables, iii) the different interaction terms of a given degree. We then show that this MDI importance of a variable is equal to zero if and only if the variable is irrelevant and that the MDI importance of a relevant variable is invariant with respect to the removal or the addition of irrelevant variables. We illustrate these properties on a simple example and discuss how they may change in the case of non-totally randomized trees such as Random Forests and Extra-Trees.
Researchers
Demo and source code available at https://github.com/glouppe/paper-variable-importances
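
The properties stated in the abstract can be checked on a toy problem. The following is a minimal sketch, not the authors' code (which is at the repository linked above). It assumes Shannon entropy as the impurity measure, categorical inputs with one child per value at each split, a split variable drawn uniformly among those not yet used on the path, and a fully known joint distribution standing in for the asymptotic sample size. The names `entropy`, `output_entropy`, `expected_mdi`, and `mdi_importances` are hypothetical. Instead of sampling trees, the recursion enumerates every possible sequence of split variables and weights each impurity decrease by the probability of that sequence, which is only practical for a handful of variables.

```python
from itertools import product
from math import log2


def entropy(weights):
    """Shannon entropy (bits) of a list of unnormalised weights."""
    total = sum(weights)
    return -sum(w / total * log2(w / total) for w in weights if w > 0)


def output_entropy(rows):
    """Entropy of the output over weighted rows (x, y, prob)."""
    mass = {}
    for _, y, p in rows:
        mass[y] = mass.get(y, 0.0) + p
    return entropy(list(mass.values()))


def expected_mdi(rows, free, importances, reach=1.0):
    """Accumulate the expected MDI of each variable at and below one node.

    rows: (x, y, prob) triples reaching the node (prob is the joint mass);
    free: indices of variables not yet used on the path to this node;
    reach: probability that the random variable choices lead to this node.
    """
    if not free:
        return
    node_mass = sum(p for _, _, p in rows)
    h_node = output_entropy(rows)
    for j in free:
        # Assumed multiway split on X_j: one child per value of X_j.
        children = {}
        for row in rows:
            children.setdefault(row[0][j], []).append(row)
        child_term = sum(
            sum(p for _, _, p in child) * output_entropy(child)
            for child in children.values()
        )
        # Node mass times impurity decrease, i.e. p(t) * delta_i(t).
        decrease = node_mass * h_node - child_term
        # X_j is drawn uniformly among the unused variables.
        pick = reach / len(free)
        importances[j] += pick * decrease
        rest = [k for k in free if k != j]
        for child in children.values():
            expected_mdi(child, rest, importances, pick)


def mdi_importances(rows, n_vars):
    """Expected MDI importance of each of n_vars input variables."""
    importances = [0.0] * n_vars
    expected_mdi(rows, list(range(n_vars)), importances)
    return importances


# Toy problem: Y = X1 xor X2, with X3 an irrelevant uniform binary input.
xor_rows = [(x, x[0] ^ x[1], 1 / 8) for x in product((0, 1), repeat=3)]
imp = mdi_importances(xor_rows, 3)
print(imp)
```

Under these assumptions the irrelevant input X3 should receive an importance of zero, X1 and X2 should each receive about 0.5 bit, and the three values should sum to the 1 bit of information that the inputs jointly carry about Y. Rerunning `mdi_importances` on the two-variable version of the problem (without X3) should leave the importances of X1 and X2 unchanged, in line with the invariance property.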

File(s) associated to this reference

Fulltext file(s):

File                Commentary               Version          Size       Access
louppe13.pdf        Main article             Author preprint  317.2 kB   Open access
louppe13-suppl.pdf  Supplementary materials  Author preprint  228.44 kB  Open access

Additional material(s):

File        Commentary  Size       Access
poster.pdf  Poster      344.27 kB  Open access
slides.pdf  Spotlight   113.2 kB   Open access


All documents in ORBi are protected by a user license.