Investigation and reduction of discretization variance in decision tree induction
English
Geurts, Pierre[Université de Liège - ULg > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation >]
Wehenkel, Louis[Université de Liège - ULg > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation >]
2000
Proceedings of ECML 2000, European Conference on Machine Learning
Springer-Verlag
LNAI 1810
162-170
Yes
No
International
European Conference on Machine Learning
2000
Barcelona
Spain
[en] machine learning
[en] This paper focuses on the variance introduced by the discretization techniques used to handle continuous attributes in decision tree induction. Different discretization procedures are first studied empirically, then means to reduce the discretization variance are proposed. The experiments show that discretization variance is large and that it can be reduced significantly without notable computational cost. The resulting variance reduction mainly improves the interpretability and stability of decision trees, and only marginally their accuracy.