Ensembles on Random Patches
Louppe, Gilles; Geurts, Pierre
in Machine Learning and Knowledge Discovery in Databases (2012)
In this paper, we consider supervised learning under the assumption that the available memory is small compared to the dataset size. This general framework is relevant in the context of big data, distributed databases and embedded systems. We investigate a very simple, yet effective, ensemble framework that builds each individual model of the ensemble from a random patch of data obtained by drawing random subsets of both instances and features from the whole dataset. We carry out an extensive and systematic evaluation of this method on 29 datasets, using decision tree-based estimators. With respect to popular ensemble methods, these experiments show that the proposed method provides on par performance in terms of accuracy while simultaneously lowering the memory needs, and attains significantly better performance when memory is severely constrained.

Learning to rank with extremely randomized trees
Geurts, Pierre; Louppe, Gilles
in JMLR: Workshop and Conference Proceedings (2011, January), 14
In this paper, we report on our experiments on the Yahoo! Labs Learning to Rank challenge organized in the context of the 23rd International Conference of Machine Learning (ICML 2010).
We competed in both the learning to rank and the transfer learning tracks of the challenge with several tree-based ensemble methods, including Tree Bagging, Random Forests, and Extremely Randomized Trees. Our methods ranked 10th in the first track and 4th in the second track. Although not at the very top of the ranking, our results show that ensembles of randomized trees are quite competitive for the “learning to rank” problem. The paper also analyzes computing times of our algorithms and presents some post-challenge experiments with transfer learning methods.

A zealous parallel gradient descent algorithm
Louppe, Gilles; Geurts, Pierre
Poster (2010, December 11)
Parallel and distributed algorithms have become a necessity in modern machine learning tasks. In this work, we focus on parallel asynchronous gradient descent and propose a zealous variant that minimizes the idle time of processors to achieve a substantial speedup. We then experimentally study this algorithm in the context of training a restricted Boltzmann machine on a large collaborative filtering task.

Collaborative filtering: Scalable approaches using restricted Boltzmann machines
Louppe, Gilles
Master's dissertation (2010)
Parallel to the growth of electronic commerce, recommender systems have become a very active area of research, both in the industry and in the academic world.
The goal of these systems is to make automatic but personal recommendations when customers are overwhelmed with thousands of possibilities and do not know what to look for. In that context, the object of this work is threefold. The first part consists of a survey of recommendation algorithms and emphasizes a class of algorithms known as collaborative filtering algorithms. The second part consists of studying in more depth a specific model of neural networks known as restricted Boltzmann machines. That model is then experimentally and extensively examined on a recommendation problem. The third part of this work focuses on how restricted Boltzmann machines can be made more scalable. Three different and original approaches are proposed and studied. In the first approach, we revisit the learning and test algorithms of restricted Boltzmann machines in the context of shared-memory architectures. In the second approach, we propose to reformulate these algorithms as MapReduce tasks. Finally, in the third approach, ensembles of RBMs are investigated. The best and most promising results are obtained with the MapReduce approach.