Centre for Environment, Social & Economic Research (CESER)
Yes (verified by ORBi)
[en] Regression ; Data structure ; Prediction ; Simulation
[en] Monte Carlo simulation methods were used to study the effects of the data structure on the quality of the predictions in multiple linear regression. Five hundred forty (540) data files were generated, in which the number of variables, the R-square, the collinearity between the explanatory variables, and the index of coefficient, which measures the importance of the explanatory variables in the model, were controlled. Predictions were influenced by the theoretical value of R-square, the method used to establish the model and, to a lesser extent, the collinearity between the explanatory variables. The minimal sample size that leads to predicted values better than those obtained with the mean of the dependent variable was found to depend on the number of explanatory variables, the theoretical value of R-square and the method used to establish the model. The minimal sample size is largest for models without variable selection and gradually decreases as the intensity of the selection increases.
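
The comparison between regression predictions and the mean of the dependent variable can be illustrated with a minimal Monte Carlo sketch. It is not the study's design: it assumes independent standard normal explanatory variables (no collinearity), equal coefficients, a full model without variable selection, and a unit-variance response so that the theoretical R-square equals `r2`. The function name `mspe_ratio` and all of its parameters are hypothetical.

```python
import numpy as np


def mspe_ratio(n, p, r2, n_rep=200, n_test=500, seed=0):
    """Ratio of mean squared prediction errors: OLS model / training mean.

    A value below 1 means the regression predicts new observations
    better than the mean of the dependent variable.
    Assumptions (illustrative only): independent N(0, 1) predictors,
    equal coefficients, no variable selection.
    """
    rng = np.random.default_rng(seed)
    # Equal coefficients chosen so that the signal variance equals r2
    beta = np.full(p, np.sqrt(r2 / p))
    # Noise standard deviation chosen so that Var(y) = 1
    sigma = np.sqrt(1.0 - r2)

    def draw(m):
        X = rng.standard_normal((m, p))
        y = X @ beta + sigma * rng.standard_normal(m)
        return X, y

    err_ols = 0.0
    err_mean = 0.0
    for _ in range(n_rep):
        X_tr, y_tr = draw(n)        # sample used to establish the model
        X_te, y_te = draw(n_test)   # independent sample used for prediction
        A_tr = np.column_stack([np.ones(n), X_tr])
        coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
        A_te = np.column_stack([np.ones(n_test), X_te])
        err_ols += np.mean((y_te - A_te @ coef) ** 2)
        err_mean += np.mean((y_te - y_tr.mean()) ** 2)
    return err_ols / err_mean


if __name__ == "__main__":
    # Scan sample sizes for 5 predictors and a theoretical R-square of 0.3
    for n in (8, 10, 15, 20, 30, 50, 100):
        print(n, round(mspe_ratio(n, p=5, r2=0.3), 3))
```

Under these assumptions, the smallest `n` for which the ratio falls below 1 gives a rough analogue of the minimal sample size discussed in the abstract; repeating the scan for other values of `p` and `r2` shows how that size shifts.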