Random forests have been widely used for their ability to provide so-called importance measures, which give insight, at a global (per-dataset) level, into the relevance of input variables for predicting a given output. More recently, methods based on Shapley values have been introduced to refine the analysis of feature relevance in tree-based models to a local (per-instance) level. In this context, we first show that the global Mean Decrease of Impurity (MDI) variable importance scores correspond to Shapley values under certain conditions. We then derive a local MDI measure of variable relevance, which has a natural connection with the global MDI measure and can be related to a new notion of local feature relevance. We further link local MDI importances with Shapley values and discuss them in the light of related measures from the literature. The measures are illustrated through experiments on several classification and regression problems.
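The following is a minimal sketch (not the paper's code) of the kind of correspondence the abstract describes, under the assumptions studied in earlier MDI work (Louppe et al., 2013): for totally randomized, fully developed trees on discrete inputs, the global MDI of a feature approximates the Shapley value of the cooperative game v(S) = I(X_S; Y). It uses scikit-learn's ExtraTreesClassifier with max_features=1 as a stand-in for totally randomized trees and compares its normalized MDI scores against brute-force Shapley values of the empirical mutual-information game; the toy dataset, the choice of game, and the estimator settings are illustrative assumptions, not the paper's exact construction.

```python
from itertools import combinations
from math import comb

import numpy as np
from sklearn.ensemble import ExtraTreesClassifier


def entropy(labels):
    """Plug-in Shannon entropy of a discrete sample."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())


def mutual_information(X, y, subset):
    """Empirical I(X_S; Y) = H(Y) - H(Y | X_S) for a feature subset S."""
    if not subset:
        return 0.0
    # Encode each joint configuration of X_S as one discrete symbol.
    codes = np.unique(X[:, subset], axis=0, return_inverse=True)[1].ravel()
    h_cond = sum((codes == c).mean() * entropy(y[codes == c])
                 for c in np.unique(codes))
    return entropy(y) - h_cond


def shapley_mi(X, y):
    """Brute-force Shapley values of the game v(S) = I(X_S; Y)."""
    p = X.shape[1]
    phi = np.zeros(p)
    for j in range(p):
        others = [i for i in range(p) if i != j]
        for k in range(p):
            for S in combinations(others, k):
                weight = 1.0 / (comb(p, k) * (p - k))  # = k!(p-k-1)!/p!
                phi[j] += weight * (mutual_information(X, y, list(S) + [j])
                                    - mutual_information(X, y, list(S)))
    return phi


rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(5000, 3))  # three binary inputs
y = (np.logical_xor(X[:, 0], X[:, 1]) | (X[:, 2] == 1)).astype(int)

# max_features=1 draws a single random split variable per node, and the
# trees are grown until leaves are pure: a stand-in for totally randomized,
# fully developed trees.
forest = ExtraTreesClassifier(n_estimators=500, max_features=1,
                              criterion="entropy", random_state=0).fit(X, y)

phi = shapley_mi(X, y)
# feature_importances_ is normalized to sum to one; the Shapley values sum
# to I(X; Y) (efficiency), so compare the normalized vectors.
print("global MDI        :", np.round(forest.feature_importances_, 3))
print("Shapley (rescaled):", np.round(phi / phi.sum(), 3))
```

Note that features 0 and 1 interact only through the XOR, so each has zero marginal mutual information with y; both the Shapley averaging over conditioning subsets and the MDI averaging over random trees nevertheless credit them, which is the kind of correspondence the abstract refers to.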
Disciplines :
Computer science
Mathematics
Author, co-author :
Sutera, Antonio ; Université de Liège - ULiège > Department of Electrical Engineering and Computer Science (Montefiore Institute) > Stochastic Methods
Louppe, Gilles ; Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science
Huynh-Thu, Vân Anh ; Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science
Wehenkel, Louis ; Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science
Geurts, Pierre ; Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science
Language :
English
Title :
From global to local MDI variable importances for random forests and when they are Shapley values