L1-based compression of random forest models
Joly, Arnaud ; Schnitzler, François ; Geurts, Pierre et al in 20th European Symposium on Artificial Neural Networks (2012, April)
Random forests are effective supervised learning methods applicable to large-scale datasets. However, the space complexity of tree ensembles, in terms of their total number of nodes, is often prohibitive, especially in the context of problems with very high-dimensional input spaces. We propose to study their compressibility by applying an L1-based regularization to the set of indicator functions defined by all their nodes. We show experimentally that preserving or even improving the model accuracy while significantly reducing its space complexity is indeed possible.

Learning to play K-armed bandit problems
Maes, Francis ; Wehenkel, Louis ; Ernst, Damien in Proceedings of the 4th International Conference on Agents and Artificial Intelligence (ICAART 2012) (2012, February)
We propose a learning approach to pre-compute K-armed bandit playing policies by exploiting prior information describing the class of problems targeted by the player. Our algorithm first samples a set of K-armed bandit problems from the given prior, and then chooses, in a space of candidate policies, one that gives the best average performance over these problems.
The candidate policies use an index for ranking the arms and pick at each play the arm with the highest index; the index for each arm is computed as a linear combination of features describing the history of plays (e.g., number of draws, average reward, variance of rewards and higher order moments), and an estimation-of-distribution algorithm is used to determine its optimal parameters in the form of feature weights. We carry out simulations in the case where the prior assumes a fixed number of Bernoulli arms, a fixed horizon, and uniformly distributed parameters of the Bernoulli arms. These simulations show that learned strategies perform very well with respect to several other strategies previously proposed in the literature (UCB1, UCB2, UCB-V, KL-UCB and $\epsilon_n$-GREEDY); they also highlight the robustness of these strategies with respect to wrong prior information.

Decoding Semi-Constrained Brain Activity from fMRI Using Support Vector Machines and Gaussian Processes
Schrouff, Jessica ; Kussé, Caroline ; Wehenkel, Louis et al in PLoS ONE (2012), 7(4)
Predicting a particular cognitive state from a specific pattern of fMRI voxel values is still a methodological challenge. Decoding brain activity is usually performed in highly controlled experimental paradigms characterized by a series of distinct states induced by a temporally constrained experimental design. In more realistic conditions, the number, sequence and duration of mental states are unpredictably generated by the individual, resulting in complex and imbalanced fMRI data sets.
This study tests the classification of brain activity, acquired on 16 volunteers using fMRI, during mental imagery, a condition in which the number and duration of mental events were not externally imposed but self-generated. To deal with these issues, two classification techniques were considered (Support Vector Machines, SVM, and Gaussian Processes, GP), as well as different feature extraction methods (General Linear Model, GLM and SVM). These techniques were combined in order to identify the procedures leading to the highest accuracy measures. Our results showed that 12 data sets out of 16 could be significantly modeled by either SVM or GP. Model accuracies tended to be related to the degree of imbalance between classes and to task performance of the volunteers. We also conclude that the GP technique tends to be more robust than SVM when modeling unbalanced data sets.

An Efficient Algorithm to Perform Multiple Testing in Epistasis Screening
Van Lishout, François ; Cattaert, Tom ; Mahachie John, Jestinah et al Conference (2011, December 13)
Background: Research in epistasis or gene-gene interaction detection for human complex traits has grown exponentially over the last few years. It has been marked by promising methodological developments, improved translation efforts of statistical epistasis to biological epistasis and attempts to integrate different omics information sources into the epistasis screening to enhance power. The quest for gene-gene interactions poses severe multiple-testing problems. In this context, the maxT algorithm is one technique to control the false-positive rate. However, the memory needed by this algorithm rises linearly with the number of hypothesis tests.
In main-effects detection, this is not a problem since the memory required is proportional to the number of SNPs. In contrast, gene-gene interaction studies require memory proportional to the square of the number of SNPs. A genome-wide epistasis screening would therefore require terabytes of memory. Hence, cache problems are likely to occur, increasing the computation time. Methods: In this work we present a new version of maxT, requiring an amount of memory independent of the number of genetic effects to be investigated. This algorithm was implemented in C++ in our epistasis screening software MB-MDR-2.6.2 and compared to MB-MDR's first implementation as an R-package (Calle et al., Bioinformatics 2010). We evaluate the new implementation in terms of memory efficiency and speed using simulated data. The software is illustrated on real-life data for Crohn's disease. Results: The sequential version of MBMDR-2.6.2 is approximately 5,500 times faster than its R counterpart. The parallel version (tested on a cluster composed of 14 blades, each containing 4 quad-core Intel Xeon E5520 CPUs at 2.27 GHz) is approximately 900,000 times faster than the latter, for results of the same quality on the simulated data. It analyses all gene-gene interactions of a dataset of 100,000 SNPs typed on 1000 individuals within 4 days. Our program found 14 SNP-SNP interactions with a p-value less than 0.05 on the real-life Crohn's disease data. Conclusions: Our software is able to solve large-scale SNP-SNP interaction problems within a few days, without using much memory. A new implementation to reach genome-wide epistasis screening is under construction. In the context of Crohn's disease, MBMDR-2.6.2 found signal in regions well known in the field and our results could be explained from a biological point of view. This demonstrates the power of our software to find relevant phenotype-genotype associations.
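
The memory argument in the maxT abstract above can be illustrated with a minimal sketch. It is not the MB-MDR implementation, whose details the abstract does not give; it is a generic single-step maxT under stated assumptions: the name `streaming_maxt`, the caller-supplied statistic `stat_fn`, and the `top_n` cut-off are all hypothetical. Tests are streamed one at a time, and only a running maximum per permutation plus the `top_n` strongest observed statistics are kept, so memory does not grow with the number of tests.

```python
import heapq
import random


def streaming_maxt(stat_fn, tests, labels, n_perm=999, top_n=10, seed=0):
    """Single-step maxT whose memory does not depend on the number of tests.

    stat_fn(test, labels) -> float is a hypothetical statistic (larger means
    stronger evidence); `tests` can be any iterable, including a generator.
    """
    rng = random.Random(seed)
    # One fixed set of permuted label vectors, reused for every test.
    perms = []
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        perms.append(shuffled)

    max_null = [float("-inf")] * n_perm  # running maximum per permutation
    best = []  # min-heap holding the top_n (observed statistic, test index)

    for idx, test in enumerate(tests):
        observed = stat_fn(test, labels)
        if len(best) < top_n:
            heapq.heappush(best, (observed, idx))
        elif observed > best[0][0]:
            heapq.heapreplace(best, (observed, idx))
        for b, perm in enumerate(perms):
            max_null[b] = max(max_null[b], stat_fn(test, perm))

    # Adjusted p-value: fraction of permutations whose maximum statistic
    # reaches the observed one, with the usual +1 correction.
    results = []
    for observed, idx in sorted(best, reverse=True):
        exceed = sum(1 for m in max_null if m >= observed)
        results.append((idx, observed, (1 + exceed) / (n_perm + 1)))
    return results
```

The function returns `(test index, observed statistic, adjusted p-value)` triples for the strongest tests, sorted from strongest to weakest. Reporting only the top results is the design choice that removes the dependence on the number of tests; a step-down variant would need additional bookkeeping.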

Phenotype Classification of Zebrafish Embryos by Supervised Learning
Jeanray, Nathalie ; Marée, Raphaël ; Pruvot, Benoist et al Poster (2011, December 08)

Pruning randomized trees with L1-norm regularization
Joly, Arnaud ; Schnitzler, François ; Geurts, Pierre et al Poster (2011, November 29)
The growing amount of high-dimensional data requires robust analysis techniques. Tree-based ensemble methods provide such accurate supervised learning models. However, the model complexity can become extremely large depending on the dimension of the dataset. Here we propose a method to compress such ensembles using the space induced by the random trees and L1-norm regularisation. This leads to a drastic pruning, preserving or improving the model accuracy. Moreover, our approach increases robustness with respect to the selection of complexity parameters.

Decoding semi-constrained brain activity from fMRI using SVM and GP
Schrouff, Jessica ; Kussé, Caroline ; Wehenkel, Louis et al Scientific conference (2011, November 22)
Predicting a particular cognitive state from a specific pattern of fMRI voxel values is still a methodological challenge. Decoding brain activity is usually performed in highly controlled experimental paradigms characterized by a series of distinct states induced by a temporally constrained experimental design.
In more realistic conditions, the number, sequence and duration of mental states are unpredictably generated by the individual, resulting in complex and imbalanced fMRI data sets. This study tests the classification of brain activity, acquired on 16 volunteers using fMRI, during mental imagery, a condition in which the number and duration of mental events were not externally imposed but self-generated. To deal with these issues, two classification techniques were considered (Support Vector Machines, SVM, and Gaussian Processes, GP), as well as different feature extraction methods (General Linear Model, GLM and SVM). These techniques were combined in order to identify the procedures leading to the highest accuracy measures. Our results showed that 12 data sets out of 16 could be significantly modeled by either SVM or GP. Model accuracies tended to be related to the degree of imbalance between classes and to task performance of the volunteers. We also conclude that the GP technique tends to be more robust than SVM when modeling unbalanced data sets.

A web-based framework for visualization, annotation, and automatic exploitation of high-resolution bioimages using tree-based machine learning methods
Stevens, Benjamin ; Rollus, Loïc ; Wehenkel, Louis et al Poster (2011, November 02)

Phenotype Classification of Zebrafish Embryos by Supervised Learning
Jeanray, Nathalie ; Marée, Raphaël ; Pruvot, Benoist et al Conference (2011, September 02)

Distributed MPC of wide-area electromechanical oscillations of large-scale power systems
Wang, Da ; ; Wehenkel, Louis in Proceedings of ISAP 2011 (2011, September)
We investigate distributed Model Predictive Control (MPC) to damp wide-area electromechanical oscillations. Our distributed MPC schemes are derived from and compared with a fully centralized MPC scheme proposed in a previous publication. Based on simulations carried out using a 16-generator, 70-bus, two-area test power system, we show that simple coordination schemes based on feedback from additional local measurements already yield a significant improvement over a scheme with only implicit coordination, improve significantly over purely local controls, and in this respect reach about 75% of the improvement obtained by an ideal centralized MPC scheme.

Efficiently approximating Markov tree bagging for high-dimensional density estimation
Schnitzler, François ; ; et al in Gunopulos, Dimitrios; Hofmann, Thomas; Malerba, Donato (Eds.) et al Machine Learning and Knowledge Discovery in Databases, Part III (2011, September)
We consider algorithms for generating Mixtures of Bagged Markov Trees, for density estimation. In problems defined over many variables and when few observations are available, those mixtures generally outperform a single Markov tree maximizing the data likelihood, but are far more expensive to compute. In this paper, we describe new algorithms for approximating such models, with the aim of speeding up learning without sacrificing accuracy. More specifically, we propose to use a filtering step obtained as a by-product from computing a first Markov tree, so as to avoid considering poor candidate edges in the subsequently generated trees.
We compare these algorithms (on synthetic data sets) to Mixtures of Bagged Markov Trees, as well as to a single Markov tree derived by the classical Chow-Liu algorithm and to a recently proposed randomized scheme used for building tree mixtures.

Day-ahead Security Assessment under Uncertainty Relying on the Combination of Preventive and Corrective Controls to Face Worst-Case Scenarios
Capitanescu, Florin ; ; et al in PSCC proceedings Stockholm (Sweden) 2011 (2011, August 22)
This paper deals with day-ahead static security assessment with respect to a postulated set of contingencies while taking into account uncertainties about the next day system conditions. We propose a heuristic approach to check whether, given some assumptions regarding these uncertainties, the worst case with respect to each contingency is still controllable by appropriate combinations of preventive and corrective actions. This approach relies on the solution of successive optimal power flow (OPF) and security-constrained optimal power flow (SCOPF) problems of a special type. The interest of the approach is shown by illustrative examples on the Nordic32 system.

State-of-the-art, challenges, and future trends in security constrained optimal power flow
Capitanescu, Florin ; ; et al in Electric Power Systems Research (2011), 81(8), 1731-1741
This paper addresses the main challenges to the security constrained optimal power flow (SCOPF) computations.
We first discuss the issues related to the SCOPF problem formulation such as the use of a limited number of corrective actions in the post-contingency states and the modeling of voltage and transient stability constraints. Then we deal with the challenges to the techniques for solving the SCOPF, focusing mainly on: approaches to reduce the size of the problem by either efficiently identifying the binding contingencies and including only these contingencies in the SCOPF or by using approximate models for the post-contingency states, and the handling of discrete variables. We finally address the current trend of extending the SCOPF formulation to take into account the increasing levels of uncertainty in the operation planning. For each such topic we provide a review of the state of the art, we identify the advances that are needed, and we indicate ways to bridge the gap between the current state of the art and these needs.

SOLVING VERY LARGE-SCALE SECURITY-CONSTRAINED OPTIMAL POWER FLOW PROBLEMS BY COMBINING ITERATIVE CONTINGENCY SELECTION AND NETWORK COMPRESSION
; ; Capitanescu, Florin et al in PSCC conference (2011, August)
This paper proposes a practical algorithm for solving very large-scale SCOPF problems, based on the combination of a contingency filtering scheme, used to identify the binding contingencies at the optimum, and a network compression method, used to reduce the complexity of the post-contingency models included in the SCOPF formulation. By combining these two complementary simplifications, it is possible to solve SCOPF problems addressing both preventive and corrective controls on continental-sized power system models and with a very large number of contingencies.
The proposed algorithms are implemented with state-of-the-art solvers and applied on a model of the European transmission system, of about 15000 buses, and with about 11000 contingencies.

Redispatching active and reactive powers using a limited number of control actions
Capitanescu, Florin ; Wehenkel, Louis in IEEE Transactions on Power Systems (2011), 26(3), 1221-1230
This paper deals with some essential open questions in the field of optimal power flow (OPF) computations, namely: the limitation of the number of controls allowed to move, the trade-off between the objective function and the number of controls allowed to move, the computation of the minimum number of control actions needed to satisfy constraints, and the determination of the sequence of control actions to be taken by the system operator in order to achieve its operation goal. To address these questions, we propose approaches which rely on the computation of sensitivities of the objective function and inequality constraints with respect to control actions. We thus determine a subset of controls allowed to move in the OPF, by solving a sensitivity-based mixed integer linear programming (MILP) problem. We study the performance of these approaches on three test systems (of 60, 118, and 618 buses) and by considering three different OPF problems important for a system operator in emergency and/or in normal states, namely the removal of thermal congestions, the removal of bus voltage limit violations, and the reduction of the active power losses.

Two-level Mixtures of Markov Trees
Schnitzler, François ; Wehenkel, Louis Poster (2011, June 29)
We study algorithms for learning Mixtures of Markov Trees for density estimation. There are two approaches to build such mixtures, which both exploit the interesting scaling properties of Markov Trees. We investigate whether the maximum likelihood and the variance reduction approaches can be combined by building a two-level Mixture of Markov Trees. Our experiments on synthetic data sets show that this two-level model outperforms the maximum likelihood one.

High-density lipoprotein proteome dynamics in human endotoxemia.
; Geurts, Pierre ; et al in Proteome science (2011), 9(1), 34
BACKGROUND: A large variety of proteins involved in inflammation, coagulation, lipid-oxidation and lipid metabolism have been associated with high-density lipoprotein (HDL) and it is anticipated that changes in the HDL proteome have implications for the multiple functions of HDL. Here, SELDI-TOF mass spectrometry (MS) was used to study the dynamic changes of HDL protein composition in a human experimental low-dose endotoxemia model. Ten healthy men with low HDL cholesterol (0.7 ± 0.1 mmol/L) and 10 men with high HDL cholesterol levels (1.9 ± 0.4 mmol/L) were challenged with endotoxin (LPS) intravenously (1 ng/kg bodyweight). We previously showed that subjects with low HDL cholesterol are more susceptible to an inflammatory challenge.
The current study tested the hypothesis that this discrepancy may be related to differences in the HDL proteome. RESULTS: Plasma drawn at 7 time-points over a 24 hour time period after LPS challenge was used for direct capture of HDL using antibodies against apolipoprotein A-I followed by subsequent SELDI-TOF MS profiling. Upon LPS administration, profound changes in 21 markers (adjusted p-value < 0.05) were observed in the proteome in both study groups. These changes were observed 1 hour after LPS infusion and were sustained for up to 24 hours, but unexpectedly were not different between the 2 study groups. Hierarchical clustering of the protein spectra at all time points of all individuals revealed 3 distinct clusters, which were largely independent of baseline HDL cholesterol levels but correlated with paraoxonase 1 activity. The acute phase protein serum amyloid A-1/2 (SAA-1/2) was clearly upregulated after LPS infusion in both groups and comprised both native and N-terminal truncated variants that were identified by two-dimensional gel electrophoresis and mass spectrometry. Individuals of one of the clusters were distinguished by a lower SAA-1/2 response after LPS challenge and a delayed time-response of the truncated variants. CONCLUSIONS: This study shows that the semi-quantitative differences in the HDL proteome as assessed by SELDI-TOF MS cannot explain why subjects with low HDL cholesterol are more susceptible to a challenge with LPS than those with high HDL cholesterol. Instead the results indicate that hierarchical clustering could be useful to predict HDL functionality in acute phase responses towards LPS.

Decoding Directed Brain Activity in fMRI using Support Vector Machines and Gaussian Processes
Schrouff, Jessica ; Kussé, Caroline ; Wehenkel, Louis et al Poster (2011, June 26)
Predicting a particular cognitive state from a specific pattern of fMRI voxel values is still a methodological challenge. Decoding brain activity is usually performed in highly controlled experimental paradigms characterized by a series of distinct states induced by a temporally constrained experimental design. In more realistic conditions, the number, sequence and duration of mental states are unpredictably generated by the individual, resulting in complex and imbalanced fMRI data sets. This study tests the classification of brain activity, acquired on 16 volunteers using fMRI, during mental imagery, a condition in which the number and duration of mental events were not externally imposed but self-generated. To deal with these issues, two classification techniques were considered (Support Vector Machines, SVM, and Gaussian Processes, GP), as well as different feature extraction methods (General Linear Model, GLM and SVM). These techniques were combined in order to identify the procedures leading to the highest accuracy measures. Our results showed that 12 data sets out of 16 could be significantly modeled by either SVM or GP. Model accuracies tended to be related to the degree of imbalance between classes and to task performance of the volunteers. We also conclude that the GP technique tends to be more robust than SVM when modeling unbalanced data sets.

Situation Adapted Display of Information for Operating Very Large Interconnected Grids
; ; Capitanescu, Florin et al in Power Tech Conference (2011, June)
This paper addresses the problem of security monitoring and situation awareness in very large interconnected transmission systems, with particular emphasis on the continental European grid. An innovative approach to situation-adapted display of the operational state of a large network is proposed, which is based on state-of-the-art cognitive methods, can be processed online and makes the displays available to all participating transmission system operators. The proposed approach, which aims at improved situation awareness of different security threats such as wide-area splits of the system and cascading overloads, uses data from a very large simulator model of the continental European transmission system of about 15,000 buses.

A new MPC scheme for damping wide-area electromechanical oscillations in power systems
Wang, Da ; ; Wehenkel, Louis in the 2011 IEEE PES PowerTech (2011, June)
This paper introduces a new Model Predictive Control (MPC) scheme to damp wide-area electromechanical oscillations. The proposed MPC controller, based on a linearized discrete-time state space model, calculates the optimal input sequence for local damping controllers over a chosen time horizon by solving a quadratic programming problem. Local controllers considered include: Power Systems Stabilizers (PSSs), Thyristor Controlled Series Compensators (TCSCs) and Static Var Compensators (SVCs). The MPC scheme is realized and tested first in ideal conditions (complete state observability and controllability, neglecting communication and computing delays).
Next, the effects of state-estimation errors, computation and communication delays, and of the number and type of available local damping controllers are studied in order to assess the versatility of this scheme. Realistic simulations are carried out using a 16-generator, 70-bus test system.
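
The idea in the last abstract above, computing an input sequence over a horizon from a linearized discrete-time state-space model by solving a quadratic program, can be sketched on a toy problem. Everything here is an assumption made for illustration: the two-state oscillation model, the cost weights, the input bound, and the use of projected gradient descent in place of a dedicated QP solver. It is not the controller from the paper.

```python
import math


def simulate(A, B, x0, inputs):
    """Roll the 2-state model x[k+1] = A x[k] + B u[k] forward in time."""
    states, x = [], list(x0)
    for u in inputs:
        x = [A[0][0] * x[0] + A[0][1] * x[1] + B[0] * u,
             A[1][0] * x[0] + A[1][1] * x[1] + B[1] * u]
        states.append(x)
    return states


def cost(A, B, x0, inputs, r):
    """Quadratic cost: squared predicted states plus r times squared inputs."""
    states = simulate(A, B, x0, inputs)
    return (sum(x[0] ** 2 + x[1] ** 2 for x in states)
            + r * sum(u ** 2 for u in inputs))


def mpc_inputs(A, B, x0, horizon=20, r=0.1, u_max=1.0, step=0.1, iters=200):
    """Minimise the cost subject to |u| <= u_max by projected gradient descent."""
    inputs = [0.0] * horizon
    for _ in range(iters):
        states = simulate(A, B, x0, inputs)
        lam = [0.0, 0.0]  # costate, propagated backwards in time
        grad = [0.0] * horizon
        for k in reversed(range(horizon)):
            x = states[k]  # state reached after applying inputs[k]
            lam = [2 * x[0] + A[0][0] * lam[0] + A[1][0] * lam[1],
                   2 * x[1] + A[0][1] * lam[0] + A[1][1] * lam[1]]
            grad[k] = 2 * r * inputs[k] + B[0] * lam[0] + B[1] * lam[1]
        # Gradient step followed by projection onto the input bounds.
        inputs = [min(u_max, max(-u_max, u - step * g))
                  for u, g in zip(inputs, grad)]
    return inputs


if __name__ == "__main__":
    theta = 0.3  # hypothetical lightly damped oscillation mode
    A = [[0.99 * math.cos(theta), 0.99 * math.sin(theta)],
         [-0.99 * math.sin(theta), 0.99 * math.cos(theta)]]
    B = [0.0, 0.1]
    x = [1.0, 0.0]
    for _ in range(30):  # receding horizon: apply only the first input
        u0 = mpc_inputs(A, B, x)[0]
        x = simulate(A, B, x, [u0])[0]
    print(math.hypot(*x))  # remaining oscillation amplitude
```

The step size of 0.1 is chosen for this particular toy model; a different model or horizon would need a different step, or a proper QP solver, which is what a realistic implementation would more likely use.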
