Dissertations and theses : Master's dissertation
Engineering, computing & technology : Computer science
Engineering, computing & technology : Electrical & electronics engineering
Improvement of randomized ensembles of trees for supervised learning in very high dimension
[fr] Amélioration des ensemble d'arbres aléatoire pour de l'apprentissage supervisé en très haute dimension
Joly, Arnaud [Université de Liège - ULg > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation]
Université de Liège, Liège, Belgium
Master en ingénieur civil électricien, à finalité approfondie
Wehenkel, Louis
Geurts, Pierre
Destiné, Jacques
Louveaux, Quentin
Van Steen, Kristel
[en] Machine learning ; Supervised learning ; Ensemble of randomized trees ; Pruning ; L1-norm Regularisation ; LASSO ; Sparse model ; Randomisation
[en] Tree-based ensemble methods, such as random forests and extremely randomized trees, are methods of choice for handling high-dimensional problems. One important drawback of these methods, however, is the complexity of the models they produce (i.e. the large number and size of trees) in order to achieve good performance.

In this work, several research directions are identified to address this problem. Among those, we have developed the following one. From a tree ensemble, one can extract a set of binary features, each associated with a leaf or a node of a tree and true for a given object only if that object reaches the corresponding leaf or node when propagated through the tree. Given this representation, the prediction of an ensemble can be retrieved simply by linearly combining these characteristic features with appropriate weights. We apply a linear feature selection method, namely the monotone LASSO, to these features in order to simplify the tree ensemble. A subtree is then pruned as soon as none of the characteristic features corresponding to its constituent nodes is selected in the linear model.
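The node-indicator representation described above can be illustrated with a small, self-contained sketch. It is not taken from the thesis: the `Node` class, the two hand-built toy trees, and all function names are hypothetical, and the monotone LASSO itself is not implemented. Instead, the set of selected node features is supplied by hand as a stand-in for its output. The weights used here (leaf value divided by the number of trees, zero on internal nodes) are one choice that reproduces the prediction of an averaging ensemble, and the pruning rule is one possible reading of the criterion stated in the abstract.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Node:
    node_id: int                    # index of this node's binary feature
    value: float                    # prediction stored at the node
    feature: Optional[int] = None   # input variable tested; None for a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def path(tree: Node, x: List[float]) -> List[Node]:
    """Nodes visited by x from the root down to a leaf."""
    visited = [tree]
    node = tree
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
        visited.append(node)
    return visited


def indicator_features(trees: List[Node], x: List[float], n_nodes: int) -> List[int]:
    """z[j] = 1 if x reaches node j when propagated in its tree, else 0."""
    z = [0] * n_nodes
    for tree in trees:
        for node in path(tree, x):
            z[node.node_id] = 1
    return z


def averaging_weights(trees: List[Node], n_nodes: int) -> List[float]:
    """Leaf value / number of trees on leaves, 0 on internal nodes."""
    w = [0.0] * n_nodes
    stack = list(trees)
    while stack:
        node = stack.pop()
        if node.feature is None:
            w[node.node_id] = node.value / len(trees)
        else:
            stack.extend([node.left, node.right])
    return w


def ensemble_predict(trees: List[Node], x: List[float]) -> float:
    """Usual ensemble prediction: average of the reached leaf values."""
    return sum(path(t, x)[-1].value for t in trees) / len(trees)


def any_selected(node: Node, selected: Set[int]) -> bool:
    """True if the node or any of its descendants has a selected feature."""
    if node.node_id in selected:
        return True
    if node.feature is None:
        return False
    return any_selected(node.left, selected) or any_selected(node.right, selected)


def prune(node: Node, selected: Set[int]) -> Node:
    """Assumed rule: a node becomes a leaf when nothing below it is selected."""
    if node.feature is None:
        return node
    if not (any_selected(node.left, selected) or any_selected(node.right, selected)):
        return Node(node.node_id, node.value)
    return Node(node.node_id, node.value, node.feature, node.threshold,
                prune(node.left, selected), prune(node.right, selected))


# Toy ensemble of two trees with 8 nodes in total (hypothetical values).
tree_a = Node(0, 2.0, feature=0, threshold=0.5,
              left=Node(1, 1.0), right=Node(2, 3.0))
tree_b = Node(3, 4.0, feature=1, threshold=0.5,
              left=Node(4, 2.0),
              right=Node(5, 5.0, feature=0, threshold=0.5,
                         left=Node(6, 4.0), right=Node(7, 6.0)))
trees = [tree_a, tree_b]
N_NODES = 8

x = [0.2, 0.9]
z = indicator_features(trees, x, N_NODES)
w = averaging_weights(trees, N_NODES)
linear_prediction = sum(zj * wj for zj, wj in zip(z, w))

# Hand-picked stand-in for the features a sparse linear model might keep.
pruned_b = prune(tree_b, selected={1, 2, 4})

print(z, linear_prediction, ensemble_predict(trees, x))
```

In this toy case the linear combination of indicator features and the direct ensemble average give the same prediction. With the hand-picked selection, the subtree below node 5 is collapsed into a leaf because neither of its children is selected.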

Empirical experiments show that combining the monotone LASSO with features extracted from tree ensembles drastically reduces the number of features and, at the same time, can improve accuracy with respect to unpruned ensembles of trees.
Systems and Modeling research unit

File(s) associated to this reference

Fulltext file(s):

JOLY_Arnaud-master_thesis.pdf (Publisher postprint, 1.58 MB, open access)

Additional material(s):

résumé AIM.doc (39.5 kB, private access)
slides.pdf (482.6 kB, private access)
texte_slide.txt (7.79 kB, private access)


All documents in ORBi are protected by a user license.