|Dissertations and theses : Master's dissertation|
|Engineering, computing & technology : Computer science|
|Engineering, computing & technology : Electrical & electronics engineering|
|Improvement of randomized ensembles of trees for supervised learning in very high dimension|
|[fr] Amélioration des ensemble d'arbres aléatoire pour de l'apprentissage supervisé en très haute dimension|
|Joly, Arnaud [Université de Liège - ULg > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation >]|
|Université de Liège, Liège, Belgium|
|Master en ingénieur civil électricien, à finalité approfondie|
|Van Steen, Kristel|
|[en] Machine learning ; Supervised learning ; Ensemble of randomized trees ; Pruning ; L1-norm Regularisation ; LASSO ; Sparse model ; Randomisation|
|[en] Tree-based ensemble methods, such as random forests and extremely randomized trees, are methods of choice for handling high-dimensional problems. One important drawback of these methods, however, is the complexity of the models they must produce to achieve good performance (i.e. the large number and size of the trees).
This work identifies several research directions to address this problem and develops one of them. From a tree ensemble, one can extract a set of binary features, each associated with a leaf or a node of a tree and true for a given object only if that object reaches the corresponding leaf or node when propagated through the tree. Given this representation, the prediction of the ensemble can be recovered by linearly combining these characteristic features with appropriate weights. We apply a linear feature selection method, namely the monotone LASSO, to these features in order to simplify the tree ensemble. A subtree is then pruned as soon as none of the characteristic features corresponding to its constituent nodes is selected in the linear model.
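The node-indicator representation described above can be illustrated with a minimal sketch. It assumes a toy regression task and a simple totally randomized tree grower; the helper names (`grow`, `reached`, `indicator_features`) and the data are hypothetical and are not taken from the thesis. The sketch only checks that the averaged ensemble prediction equals a linear combination of the binary node features, with each leaf weighted by its value divided by the number of trees and each internal node weighted by zero.

```python
import random

def grow(X, y, rng, nodes):
    """Grow one randomized regression tree; register every node in `nodes`."""
    node = {"id": len(nodes), "value": sum(y) / len(y), "leaf": True}
    nodes.append(node)
    if len(set(y)) > 1:
        f = rng.randrange(len(X[0]))             # random split feature
        col = [row[f] for row in X]
        t = rng.uniform(min(col), max(col))      # random cut point
        left = [i for i, v in enumerate(col) if v < t]
        right = [i for i, v in enumerate(col) if v >= t]
        if left and right:                       # otherwise the node stays a leaf
            node.update(leaf=False, feature=f, threshold=t)
            node["left"] = grow([X[i] for i in left], [y[i] for i in left], rng, nodes)
            node["right"] = grow([X[i] for i in right], [y[i] for i in right], rng, nodes)
    return node

def reached(root, x):
    """Ids of the nodes that x visits from the root down to its leaf."""
    path, node = [], root
    while True:
        path.append(node["id"])
        if node["leaf"]:
            return path
        go_left = x[node["feature"]] < node["threshold"]
        node = node["left"] if go_left else node["right"]

def indicator_features(roots, n_nodes, x):
    """Binary vector z with z[j] = 1 if x reaches node j of the ensemble."""
    z = [0] * n_nodes
    for root in roots:
        for j in reached(root, x):
            z[j] = 1
    return z

# Hypothetical toy data: 40 objects, 5 inputs, target depends on two of them.
rng = random.Random(0)
X = [[rng.random() for _ in range(5)] for _ in range(40)]
y = [row[0] + 2.0 * row[1] for row in X]

nodes, roots = [], []
for _ in range(10):                              # ensemble of 10 trees
    roots.append(grow(X, y, rng, nodes))

# Leaf features carry (leaf value / number of trees); internal nodes carry 0.
weights = [n["value"] / len(roots) if n["leaf"] else 0.0 for n in nodes]

def ensemble_predict(x):
    """Usual prediction: average of the leaf values reached in each tree."""
    return sum(nodes[reached(r, x)[-1]]["value"] for r in roots) / len(roots)

def linear_predict(x):
    """Same prediction written as a dot product with the indicator features."""
    z = indicator_features(roots, len(nodes), x)
    return sum(w * zj for w, zj in zip(weights, z))

x_new = [0.3, 0.7, 0.1, 0.9, 0.5]
assert abs(ensemble_predict(x_new) - linear_predict(x_new)) < 1e-9
```

In this form the ensemble is an ordinary linear model over the node features, which is what makes a linear feature selection method applicable to it.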
Empirical experiments show that combining the monotone LASSO with features extracted from tree ensembles drastically reduces the number of features and can, at the same time, improve accuracy with respect to unpruned ensembles of trees.
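The selection-then-pruning idea from the abstract can be sketched on a hand-written example. The code below uses incremental forward stagewise regression, which is commonly described as a small-step approximation of the monotone LASSO path; it is a stand-in chosen for brevity, and the exact procedure used in the thesis is not given in this record. The five-node tree, the data, the step size, and the function name `forward_stagewise` are all assumptions made for illustration.

```python
def forward_stagewise(Z, y, step=0.01, n_steps=2000):
    """Incremental forward stagewise regression on a feature matrix Z.

    At each iteration, the weight of the feature most correlated with the
    current residual is moved by a small fixed step in the direction of
    that correlation.
    """
    n, p = len(Z), len(Z[0])
    w = [0.0] * p
    resid = list(y)
    for _ in range(n_steps):
        corr = [sum(Z[i][j] * resid[i] for i in range(n)) for j in range(p)]
        j = max(range(p), key=lambda k: abs(corr[k]))
        if corr[j] == 0.0:                       # residual orthogonal to all features
            break
        delta = step if corr[j] > 0 else -step
        w[j] += delta
        for i in range(n):
            resid[i] -= delta * Z[i][j]
    return w

# Hypothetical tree: root -> (L, R), and L -> (LL, LR).
# Each row is one object; each column is the indicator of one node.
names = ["root", "L", "R", "LL", "LR"]
Z = [
    [1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 0, 1, 0, 0],
]
# The target only distinguishes the left branch from the right branch,
# so the split below L carries no information.
y = [1.0, 1.0, 1.0, 1.0, 3.0, 3.0]

w = forward_stagewise(Z, y)
selected = [name for name, wj in zip(names, w) if wj != 0.0]
fitted = [sum(wj * zj for wj, zj in zip(w, row)) for row in Z]
print("selected node features:", selected)
```

In this toy case the features of `LL` and `LR` are never selected, so under the pruning rule stated in the abstract the subtree below `L` could be removed and `L` turned into a leaf, while the fitted values stay close to the targets.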
|Systems and Modeling research unit|