Reference : Contributions to decision tree induction: bias/variance tradeoff and time series clas... |

Dissertations and theses : Doctoral thesis | |||

Engineering, computing & technology : Computer science | |||

http://hdl.handle.net/2268/25737 | |||

Contributions to decision tree induction: bias/variance tradeoff and time series classification | |

English | |

Geurts, Pierre [Université de Liège - ULg > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation >] | |

May-2002 | |

University of Liège Belgium | |

Docteur en Sciences Appliquées | |

260 | |

Wehenkel, Louis | |

[en] Machine learning | |

[en] Because of the rapid progress of computer and information technology, large
amounts of data are nowadays available in a lot of domains. Automatic learning aims at developing algorithms able to produce synthetic high-level information, or models, from this data. Learning algorithms are generally evaluated according to three different criteria: interpretability (how well the model helps to understand the data), predictive accuracy (how well the model can predict unseen situations), and computational efficiency (how fast is the algorithm and how it scales to large databases). This thesis explores two issues in automatic learning: the improvement of the well-known decision tree induction method and the problem of learning classification models for time series data. Decision tree induction method is an automatic learning algorithm which focuses on the modeling of input/output relationships. While this algorithm is among the fastest and most interpretable methods, its accuracy is not always competitive with respect to other algorithms. It is commonly admitted that this suboptimality is due to the excessive variance of this method. We first carry out an empirical study which shows quantitatively how important this variance is, i.e. how strongly decision trees depend on the random nature of the database used to infer them. These experiments confirm that this variance is detrimental not only from the point of view of accuracy but also from the point of view of interpretability. With the goal of improving both interpretability and accuracy, we consider three variance reduction techniques for decision trees. First, in the goal of improving mainly interpretability, we propose several methods which try to stabilize the parameters chosen during tree induction. While these methods succeed in reducing the variability of the parameters, they produce only a slight improvement of the accuracy. Then, we consider perturb and combine algorithms (e.g. bagging, boosting) which consist in combining the predictions of several models obtained by randomizing in some way the learning process. Inspired by the high variance of the parameters defining a decision tree, we propose an extremely randomized decision tree induction algorithm, called extra-tree, which chooses all parameters at random during induction. The aggregation of several of these extra-trees gives an important reduction of variance and this algorithm compares favorably in terms of accuracy and computational efficiency with both bagging and boosting. Because of the randomization of the parameters, the resulting method is also competitive with classical decision tree induction in terms of computational efficiency. In addition to these two approaches, we propose a ``dual'' perturb and combine algorithm which delays the perturbation at the prediction stage and hence requires only one model. In combination with decision tree, this method actually bridges the gap between single decision trees and perturb and combine algorithms. Of the first, it saves the interpretability (by using only one model), and with perturb and combine algorithm, it shares some of the accuracy (by reducing the variance). The second topic of the thesis is the problem of time series classification. The most direct way to solve this problem is to apply existing learning algorithms on low-level variables which correspond to the values of a time series at several time points. Experiments with the tree-based algorithms studied in the first part of the thesis shows that this approach is limited. A variance reduction techniques is then proposed specifically for this kind of data which consists in aggregating the prediction given by a classification model for subsequences of time series. Since this method does not provide interpretable models, we propose a second method which extends decision tree tests by allowing them to detect local shift invariant properties, or patterns, in time series. The study proposed in this part of the thesis is only a first step in the domain but our conclusions give some future work directions for handling complex type of data with automatic learning methods. | |

http://hdl.handle.net/2268/25737 | |

http://www.montefiore.ulg.ac.be/services/stochastic/pubs/2002/Geu02 |

File(s) associated to this reference | ||||||||||||||

| ||||||||||||||

All documents in ORBi are protected by a user license.