L1-based compression of random forest models
Joly, Arnaud ; Schnitzler, François ; Geurts, Pierre et al
in Proceeding of the 21st Belgian-Dutch Conference on Machine Learning (2012, May 24)
Random forests are effective supervised learning methods applicable to large-scale datasets. However, the space complexity of tree ensembles, in terms of their total number of nodes, is often prohibitive, especially in the context of problems with very high-dimensional input spaces. We propose to study their compressibility by applying an L1-based regularization to the set of indicator functions defined by all their nodes. We show experimentally that preserving or even improving the model accuracy while significantly reducing its space complexity is indeed possible.

L1-based compression of random forest models
Joly, Arnaud ; Schnitzler, François ; Geurts, Pierre et al
in 20th European Symposium on Artificial Neural Networks (2012, April)
Random forests are effective supervised learning methods applicable to large-scale datasets. However, the space complexity of tree ensembles, in terms of their total number of nodes, is often prohibitive, especially in the context of problems with very high-dimensional input spaces. We propose to study their compressibility by applying an L1-based regularization to the set of indicator functions defined by all their nodes. We show experimentally that preserving or even improving the model accuracy while significantly reducing its space complexity is indeed possible.

Pruning randomized trees with L1-norm regularization
Joly, Arnaud ; Schnitzler, François ; Geurts, Pierre et al
Poster (2011, November 29)
The growing amount of high-dimensional data requires robust analysis techniques. Tree-based ensemble methods provide accurate supervised learning models for such data. However, the model complexity can become very large depending on the dimension of the dataset. Here we propose a method to compress such ensembles using the random tree induced space and L1-norm regularisation. This leads to drastic pruning while preserving or improving the model accuracy. Moreover, our approach increases robustness with respect to the selection of complexity parameters.

Improvement of randomized ensembles of trees for supervised learning in very high dimension
Joly, Arnaud
Master's dissertation (2011)
Tree-based ensemble methods, such as random forests and extremely randomized trees, are methods of choice for handling high-dimensional problems. One important drawback of these methods, however, is the complexity of the models (i.e. the large number and size of trees) they produce to achieve good performance. In this work, several research directions are identified to address this problem. Among those, we have developed the following one.
From a tree ensemble, one can extract a set of binary features, each associated with a leaf or a node of a tree and true for a given object only if that object reaches the corresponding leaf or node when propagated through the tree. Given this representation, the prediction of an ensemble can be retrieved simply by linearly combining these characteristic features with appropriate weights. We apply a linear feature selection method, namely the monotone LASSO, to these features in order to simplify the tree ensemble. A subtree is then pruned as soon as the characteristic features corresponding to its constituent nodes are not selected in the linear model. Empirical experiments show that combining the monotone LASSO with features extracted from tree ensembles drastically reduces the number of features and can improve accuracy with respect to unpruned ensembles of trees.
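
The idea in this abstract (one binary indicator feature per tree node, followed by a sparse linear fit over those features) can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the dissertation: the trees are toy randomized trees with a random split feature and threshold, the data are synthetic, and an ordinary Lasso solved by coordinate descent stands in for the monotone LASSO. All names (`grow`, `indicator`, `lasso`, `lam`) are hypothetical.

```python
import random

def grow(X, idx, path, depth, rng, nodes):
    """Grow one toy randomized tree, recording each node as its root-to-node path."""
    nodes.append(path)  # every node (internal or leaf) yields one indicator feature
    if depth == 0 or len(idx) < 2:
        return
    f = rng.randrange(len(X[0]))            # random split feature
    vals = [X[i][f] for i in idx]
    lo, hi = min(vals), max(vals)
    if lo == hi:
        return
    t = rng.uniform(lo, hi)                 # random split threshold
    left = [i for i in idx if X[i][f] <= t]
    right = [i for i in idx if X[i][f] > t]
    grow(X, left, path + [(f, t, True)], depth - 1, rng, nodes)
    grow(X, right, path + [(f, t, False)], depth - 1, rng, nodes)

def indicator(x, path):
    """Return 1.0 if sample x reaches the node described by `path`, else 0.0."""
    return 1.0 if all((x[f] <= t) == goes_left for f, t, goes_left in path) else 0.0

def soft(a, lam):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    return a - lam if a > lam else a + lam if a < -lam else 0.0

def lasso(Z, y, lam, sweeps=50):
    """Plain Lasso by coordinate descent (an assumption; not the monotone LASSO).

    Minimizes (1 / 2n) * ||y - Z w||^2 + lam * ||w||_1.
    """
    n, p = len(Z), len(Z[0])
    w = [0.0] * p
    r = list(y)                             # residual y - Z w
    col_sq = [sum(Z[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:            # node reached by no training sample
                continue
            rho = sum(Z[i][j] * r[i] for i in range(n)) / n + col_sq[j] * w[j]
            new = soft(rho, lam) / col_sq[j]
            d = new - w[j]
            if d != 0.0:
                for i in range(n):
                    r[i] -= d * Z[i][j]
                w[j] = new
    return w

# Synthetic regression data (an assumption, for illustration only).
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(60)]
y = [x[0] + (1.0 if x[1] > 0.5 else 0.0) for x in X]

# Small ensemble of randomized trees; `nodes` collects the nodes of all trees.
nodes = []
for _ in range(5):
    grow(X, list(range(len(X))), [], 3, rng, nodes)

# Binary node-indicator representation, then a sparse linear fit on top of it.
Z = [[indicator(x, path) for path in nodes] for x in X]
w = lasso(Z, y, lam=0.01)

kept = sum(1 for v in w if abs(v) > 1e-8)
pred = [sum(wj * zj for wj, zj in zip(w, row)) for row in Z]
mse = sum((a - b) ** 2 for a, b in zip(pred, y)) / len(y)
print(f"{kept} of {len(nodes)} node features kept, training MSE {mse:.4f}")
```

In this sketch, a node whose weight is zero contributes nothing to the prediction, so a subtree in which no node has a nonzero weight could be discarded. Larger values of `lam` select fewer node features, at some cost in training error.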