
Flexible estimation in cure survival models using Bayesian P-splines
Lambert, Philippe, in Computational Statistics & Data Analysis (2016), 93

In the analysis of survival data, it is usually assumed that any unit will experience the event of interest if it is observed for a sufficiently long time. However, it can be explicitly assumed that an unknown proportion of the population under study will never experience the monitored event. The promotion time model, which has a biological motivation, is one of the survival models taking this feature into account. It assumes that the failure time of each subject is generated by the minimum of N independent latent event times with a common distribution independent of N. An extension is presented which allows the covariates to influence simultaneously the probability of being cured and the latent distribution. The latent distribution is estimated using a flexible Cox proportional hazard model where the logarithm of the baseline hazard function is specified using Bayesian P-splines. Introducing covariates in the latent distribution implies that the population hazard function might not have a proportional hazard structure. However, the use of P-splines provides a smooth estimate of the population hazard ratio over time. The identification issues of the model are discussed, and a restricted use of the model is proposed when the follow-up of the study is not sufficiently long. The accuracy of the methodology is evaluated through a simulation study, and the model is illustrated on data from a melanoma clinical trial.
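The promotion time structure described above admits a simple closed form for the population survival function: in the standard formulation the number of latent event times is N ~ Poisson(theta), so S_pop(t) = exp(-theta * F(t)) and the cure fraction is exp(-theta). A minimal numerical sketch, with an assumed Weibull latent distribution standing in for the paper's flexible P-spline baseline:

```python
import numpy as np

# Promotion time cure model: each subject has N latent event times with
# common cdf F and fails at their minimum. In the standard formulation
# N ~ Poisson(theta), which gives the closed form
#     S_pop(t) = exp(-theta * F(t)),
# so the cure fraction (units that never fail) is
# lim_{t->oo} S_pop(t) = exp(-theta). The Weibull latent distribution is
# an illustrative stand-in for the paper's P-spline baseline hazard.

def latent_cdf(t, shape=1.5, scale=2.0):
    """Weibull cdf, standing in for the flexible latent distribution F."""
    return 1.0 - np.exp(-(np.asarray(t) / scale) ** shape)

def population_survival(t, theta):
    """Improper population survival function of the promotion time model."""
    return np.exp(-theta * latent_cdf(t))

theta = 1.2
cure_fraction = np.exp(-theta)            # plateau of the survival curve
t = np.array([0.0, 1.0, 5.0, 50.0])
S = population_survival(t, theta)         # decreases to cure_fraction, not 0
```

Because F(t) tends to 1 rather than the survival curve tending to 0, S_pop plateaus at the cure fraction, which is what distinguishes a cure model from an ordinary survival model.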
Smooth semiparametric and nonparametric Bayesian estimation of bivariate densities from bivariate histogram data
Lambert, Philippe, in Computational Statistics & Data Analysis (2011), 55

Penalized B-splines combined with the composite link model are used to estimate a bivariate density from a histogram with wide bins. The goals are multiple: they include the visualization of the dependence between the two variates, but also the estimation of derived quantities like Kendall's tau, conditional moments and quantiles. Two strategies are proposed: the first one is semiparametric, with flexible margins modelled using B-splines and a parametric copula for the dependence structure; the second one is nonparametric and based on Kronecker products of the marginal B-spline bases. Frequentist and Bayesian estimations are described. A large simulation study quantifies the performance of the two methods under different dependence structures and for varying strengths of dependence, sample sizes and amounts of grouping. It suggests that Schwarz's BIC is a good tool for classifying the competing models. The density estimates are used to evaluate conditional quantiles in two applications in the social and medical sciences.

RelaxMCD: smooth optimisation for the Minimum Covariance Determinant estimator
Schyns, Michael; Haesbroeck, Gentiane, in Computational Statistics & Data Analysis (2010), 54(4), 843-857
The Minimum Covariance Determinant (MCD) estimator is a highly robust procedure for estimating the center and shape of a high-dimensional data set. It consists of determining a subsample of h points out of n which minimizes the generalized variance. By definition, the computation of this estimator gives rise to a combinatorial optimization problem, for which several approximate algorithms have been developed. Some of these approximations are quite powerful, but they do not take advantage of any smoothness in the objective function. This paper focuses on the approach outlined in the general framework of Critchley et al. (2009), which transforms any discrete, high-dimensional combinatorial problem of this type into a continuous, low-dimensional one. The idea is to build on the general algorithm proposed by Critchley et al. (2009) in order to take into account the particular features of the MCD methodology. More specifically, the main goals of this paper are the adaptation of the algorithm to the specific MCD target function and the comparison of this "specialized" algorithm with the usual competitors for computing the MCD. The adaptation focuses on the design of "clever" starting points in order to systematically investigate the search domain. Accordingly, a new and surprisingly efficient procedure based on the well-known k-means algorithm is constructed. The adapted algorithm, called RelaxMCD, is then compared by means of simulations and examples with FASTMCD and the Feasible Subset Algorithm, both benchmark algorithms for computing the MCD. As a by-product, it is shown that RelaxMCD is a general technique encompassing the two others, yielding insight into their overall good performance.
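The combinatorial objective at the heart of the MCD is easy to state: among all h-subsets of the n points, minimize the determinant of the subset covariance. The concentration step (C-step) below is the workhorse of FASTMCD, one of the benchmark competitors named above; refitting on the h points closest in Mahalanobis distance to the current estimate never increases the determinant. This is an illustrative sketch, not the authors' RelaxMCD code:

```python
import numpy as np

# MCD objective: minimize det(cov) over all h-subsets ("generalized
# variance"). One concentration step (C-step) refits the subset to the h
# points with smallest Mahalanobis distance from the current center/shape
# estimate; a classical result says this can only lower the determinant.
# Illustrative sketch only, not the RelaxMCD algorithm of the paper.

rng = np.random.default_rng(0)

def c_step(X, subset, h):
    """One concentration step starting from the given h-subset."""
    mu = X[subset].mean(axis=0)
    S = np.cov(X[subset], rowvar=False)
    d2 = np.einsum('ij,jk,ik->i', X - mu, np.linalg.inv(S), X - mu)
    return np.argsort(d2)[:h]             # h points with smallest distances

n, p, h = 200, 3, 120
X = rng.normal(size=(n, p))
X[:20] += 8.0                             # shift 10% of the points: outliers

subset = rng.choice(n, size=h, replace=False)
dets = []
for _ in range(10):
    subset = c_step(X, subset, h)
    dets.append(np.linalg.det(np.cov(X[subset], rowvar=False)))
# dets is non-increasing: each C-step can only lower the objective
```

Because each C-step is a monotone local move, the quality of the final subset depends entirely on the starting points, which is exactly why the paper's "clever" k-means-based starts matter.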
Goodness-of-fit tests for the error distribution in nonparametric regression
Heuchenne, Cédric, in Computational Statistics & Data Analysis (2010), 54

Bayesian density estimation from grouped continuous data
Lambert, Philippe, in Computational Statistics & Data Analysis (2009), 53

Grouped data occur frequently in practice, either because of the limited resolution of instruments, or because the data have been summarized in relatively wide bins. A combination of the composite link model with roughness penalties is proposed to estimate smooth densities from such data in a Bayesian framework. A simulation study is used to evaluate the performance of the strategy in the estimation of a density, of its quantiles and of its first moments. Two illustrations are presented: the first one involves grouped data on lead concentrations in the blood, and the second one the number of deaths due to tuberculosis in the Netherlands in wide age classes.

Archimedean copula estimation using Bayesian splines smoothing techniques
Lambert, Philippe, in Computational Statistics & Data Analysis (2007), 51(12), 6307-6320

Copulas make it possible to specify multivariate distributions with given marginals. Various parametric proposals have been made in the literature for these quantities, mainly in the bivariate case. They can be systematically derived from multivariate distributions with known marginals, yielding e.g. the normal and the Student copulas.
Alternatively, one can restrict attention to the sub-family of copulas named Archimedean. They are characterized by a strictly decreasing convex function on (0, 1) which tends to +infinity at 0 (when strict) and which is 0 at 1. A ratio approximation of the generator and of its first derivative using B-splines is proposed, and the associated parameters are estimated using Markov chain Monte Carlo methods. The estimation is reasonably quick. The fitted generator is smooth and parametric. The generated chain(s) can be used to build "credible envelopes" for the above ratio function and for derived quantities such as Kendall's tau, posterior predictive probabilities, etc. Parameters associated with parametric models for the marginals can be estimated jointly with the copula parameters. This is an interesting alternative to the popular two-step procedure, which assumes that the regression parameters are fixed known quantities when it comes to estimating the copula parameter(s). A simulation study is performed to evaluate the approach. The practical utility of the method is illustrated by a basic analysis of the dependence structure underlying the diastolic and systolic blood pressures of male subjects.

Robust specification of the roughness penalty prior distribution in spatially adaptive Bayesian P-splines models
Lambert, Philippe, in Computational Statistics & Data Analysis (2007), 51

The potentially important role of the prior distribution of the roughness penalty parameter in the resulting smoothness of Bayesian P-splines models is considered. The recommended specification for that distribution yields models that can lack flexibility in specific circumstances.
In such instances, these models are shown to correspond to a frequentist P-splines model with a predefined and severe roughness penalty parameter, an obviously undesirable feature. It is shown that specifying a hyperprior distribution for one parameter of that prior distribution provides the desired flexibility. Alternatively, a mixture prior can also be used. An extension of these two models enabling adaptive penalties is provided. The posterior of all the proposed models can be quickly explored using the convenient Gibbs sampler.

Implementing the Bianco and Yohai estimator for logistic regression
Haesbroeck, Gentiane, in Computational Statistics & Data Analysis (2003), 44(1-2), 273-295

A fast and stable algorithm to compute a highly robust estimator for the logistic regression model is proposed. A criterion for the existence of this estimator at finite samples is derived, and the problem of the selection of an appropriate loss function is discussed. It is shown that the loss function can be chosen such that the robust estimator exists if and only if the maximum likelihood estimator exists. The advantages of using a weighted version of this estimator are also considered. Simulations and an example give further support for the good performance of the implemented estimators.

Wavelet denoising of Poisson-distributed data and applications
Charles, Catherine, in Computational Statistics & Data Analysis (2003), 43

An Easy Way to Increase the Finite-Sample Efficiency of the Resampled Minimum Volume Ellipsoid Estimator
Croux, Christophe; Haesbroeck, Gentiane, in Computational Statistics & Data Analysis (1997), 25

In a robust analysis, the minimum volume ellipsoid (MVE) estimator is very often used to estimate both multivariate location and scatter. The MVE estimator of the scatter matrix is defined by the smallest ellipsoid covering half of the observations, while the MVE location estimator is the midpoint of that ellipsoid. The MVE estimators can be computed by minimizing a certain criterion over a high-dimensional space. In practice, one mostly uses algorithms based on minimization of the objective function over a sequence of trial estimates. One of these uses a resampling scheme and yields the (p + 1)-subset estimator. In this note, we show how this estimator can easily be adapted, yielding a considerable increase in statistical efficiency at finite samples. This gain in precision is also observed when sampling from contaminated distributions, and it becomes larger as the dimension increases. Therefore, we need no additional computation time, nor do we lose robustness properties. Moreover, only a few lines have to be added to existing computer programs. The key idea is to average over several trials close to the optimum, instead of just picking out the trial with the lowest value for the objective function. The resulting estimator keeps the equivariance and robustness properties of the original MVE estimator. This idea can also be applied to several other robust estimators, including least-trimmed-squares regression.
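The averaging idea of the last entry can be sketched in a few lines: run many (p + 1)-subset trials, score each by the volume of the ellipsoid inflated to cover half the points (the MVE objective), then average the location and scatter estimates of the k best trials instead of keeping only the single best. The sample data, number of trials and k below are all illustrative:

```python
import numpy as np

# Resampled MVE with averaging: instead of keeping the single trial with the
# smallest ellipsoid volume, average the center/scatter estimates of the few
# best trials near the optimum (the "few extra lines" of the note above).
# Illustrative sketch; data, trial count and k are arbitrary choices.

rng = np.random.default_rng(1)

def trial_estimate(X, rng):
    """One resampling trial: fit mean/cov to a random (p+1)-subset, then
    inflate the ellipsoid so it covers half of the observations."""
    n, p = X.shape
    idx = rng.choice(n, size=p + 1, replace=False)
    mu = X[idx].mean(axis=0)
    S = np.cov(X[idx], rowvar=False)
    d2 = np.einsum('ij,jk,ik->i', X - mu, np.linalg.inv(S), X - mu)
    m2 = np.median(d2)                     # inflation factor: cover half
    vol = np.sqrt(np.linalg.det(m2 * S))   # proportional to ellipsoid volume
    return vol, mu, m2 * S

X = rng.normal(size=(100, 2))
trials = sorted((trial_estimate(X, rng) for _ in range(500)),
                key=lambda t: t[0])        # smallest volume first

k = 10                                     # average the k best trials
mu_avg = np.mean([t[1] for t in trials[:k]], axis=0)
S_avg = np.mean([t[2] for t in trials[:k]], axis=0)
```

Averaging near-optimal trials smooths out the noise of any single (p + 1)-subset while, as the note argues, preserving the equivariance and robustness of the original estimator.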
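Several of the entries above share one computational engine: a roughness penalty built from differences of adjacent spline coefficients (Bayesian P-splines). Its simplest special case, with an identity basis in place of a B-spline design matrix (the Whittaker smoother), already shows how the penalty parameter trades fidelity for smoothness; the data and the value of lam below are illustrative:

```python
import numpy as np

# Difference-penalty smoothing, the engine behind the P-splines entries:
#     min_f ||y - f||^2 + lam * ||D f||^2,   D = 2nd-order differences.
# The B-spline and Bayesian versions replace the identity basis with a
# B-spline design matrix and put a (hyper)prior on lam; this sketch keeps
# only the penalty to show its effect. Data and lam are illustrative.

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

D = np.diff(np.eye(x.size), n=2, axis=0)   # (n-2) x n difference matrix
lam = 50.0                                 # roughness penalty parameter
f = np.linalg.solve(np.eye(x.size) + lam * D.T @ D, y)

# summed squared 2nd differences: the roughness the penalty shrinks
rough = lambda v: np.sum(np.diff(v, 2) ** 2)
```

Larger lam pulls f toward a straight line (the null space of D), smaller lam toward interpolation of y; the Bayesian papers above replace this single tuning choice with a prior on lam.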