Quels modèles de mesure pour l’évaluation des compétences ?
; ; et al
Conference (2012, January 12)
A growing number of assessment schemes (formative assessment, monitoring, school guidance tests, etc.) rely on Item Response Models (IRMs; in French, Modèles de Réponse à l’Item, MRI) to estimate students' academic competencies from paper-and-pencil tests or computer-administered tests. These measurement models, whose properties promised considerable progress and advantages, first became the reference standard in international surveys and were then transposed to local contexts, notably for the external assessment of students' competencies. However, the technical nature of these models, their stochastic character, and their relatively complex statistical components have not always allowed a methodical and cautious approach to their use. It was long believed that transposing these models to competency assessment was natural and faced no fundamental obstacle. But the objectives, characteristics, and requirements of competency assessment are sometimes far removed from the concerns of international surveys. Drawing on the empirical results and/or theoretical reflections of the round-table participants, we therefore propose to determine under which conditions applying IRMs to competency assessment is appropriate.
To this end, several criteria of analysis will be considered: the theoretical nature of the models (are Item Response Models suited to the objectives of competency assessment schemes?), the psychometric characteristics of the assessments (are the conditions for applying these models always met in competency assessments?), and the model validation methods (are the methods for assessing model fit suited to competency assessment?). Rather than drawing a final conclusion, the participants will consider whether new measurement models should be developed for competency assessment. Three main questions will structure the discussion:
Question 1: What are the psychometric characteristics of competency assessment schemes?
Question 2: Are Item Response Models suited to assessing competencies?
Question 3: Is it appropriate to develop new models for competency assessment? If so, what should their characteristics be?

Taking atypical response patterns into account: a multidimensional measurement model from item response theory
; Magis, David ; et al
in Simon, Marielle; Ercikan, Kadriye; Rousseau, Michel (Eds.) Improving large scale assessment in education: Theory, issues, and practice (2012)

A short introduction into Bayesian evaluation of informative hypotheses as an alternative to exploratory comparisons of multiple group means
; ; et al
in Tutorials in Quantitative Methods for Psychology (2012), 8
This paper presents an introduction to Bayesian evaluation of informative hypotheses, that is, hypotheses representing explicit expectations about multiple group means (Hoijtink, 2011; Hoijtink, Klugkist & Boelen, 2008). The authors begin by discussing some limits of exploratory methods before presenting a non-technical overview of the Bayesian approach. References are provided for the technical details. A particular effort is made to illustrate the method with an example from psychology. References to software, more elaborate textbooks and tutorials enable researchers to apply this novel method to their own data.

Angoff’s Delta method revisited: improving the DIF detection under small samples
Magis, David ;
in British Journal of Mathematical & Statistical Psychology (2012), 65
Most of the methods for detecting differential item functioning (DIF) are suitable when the sample sizes are sufficiently large to validate the null statistical distributions. There is no guarantee, however, that they still perform adequately when there are few respondents in the focal group or in both the reference and the focal group. Angoff’s Delta plot is a potentially useful alternative for small-sample DIF investigation, but it suffers from an improper DIF flagging criterion. The purpose of this paper is to improve this classification rule under mild statistical assumptions. This improvement yields a modified Delta plot with an adjusted DIF flagging criterion for small samples. A simulation study was conducted to compare the modified Delta plot to both the classical Delta plot approach and the Mantel-Haenszel method.
It is concluded that the modified Delta plot is consistently less conservative and more powerful than the usual Delta plot, and is also less conservative and more powerful than the Mantel-Haenszel method as long as at least one group of respondents is small.

Random generation of response patterns under computerized adaptive testing with the R package catR
Magis, David ;
in Journal of Statistical Software (2012), 48
This paper outlines a computerized adaptive testing (CAT) framework and presents an R package for the simulation of response patterns under CAT procedures. This package, called catR, requires a bank of items, previously calibrated according to the four-parameter logistic (4PL) model or any simpler logistic model. The package proposes several methods to select the early test items, several methods for next item selection, different estimators of ability (maximum likelihood, Bayes modal, expected a posteriori, weighted likelihood), and three stopping rules (based on the test length, the precision of ability estimates or the classification of the examinee). After a short description of the different steps of a CAT process, the commands and options of the catR package are presented and practically illustrated.

A robust outlier approach to prevent Type I error inflation in DIF
Magis, David ;
in Educational & Psychological Measurement (2012), 72
The identification of differential item functioning (DIF) is often performed by means of statistical approaches that consider the raw scores as proxies for the ability trait level. One of the most popular approaches, the Mantel-Haenszel (MH) method, belongs to this category. However, replacing the ability level by the simple raw score is a source of potential Type I error inflation, especially in the presence of DIF but also when DIF is absent and impact is present. The purpose of this paper is to present an alternative statistical inference approach based on the same measure of DIF but such that the Type I error inflation is prevented. The key notion is that for DIF items, the measure has an outlying value which can be identified as such with inference tools from robust statistics. Although we use the MH log-odds ratio as a statistic, the inference is different. A simulation study is performed to compare the robust statistical inference with the classical inference method, both based on the MH statistic. As expected, the Type I error rate inflation is avoided with the robust approach, while the power of the two methods is similar.

A didactic presentation of Snijders’ lz* index of person fit with emphasis on response model selection and ability estimation
Magis, David ; ;
in Journal of Educational & Behavioral Statistics (2012), 37
This paper focuses on two likelihood-based indices of person fit, the index lz (Drasgow, Levine & Williams, 1985) and Snijders’ modified index lz* (Snijders, 2001). The first one is commonly used in practical assessment of person fit, although its asymptotic standard normal distribution is not valid when true abilities are replaced by sample ability estimates.
The lz* index is a generalization of lz which corrects for this sampling variability. Surprisingly, it is not yet popular in the psychometric and educational assessment community. Moreover, there is some ambiguity about which type of item response model and ability estimation method can be used to compute the lz* index. The purpose of this paper is to present the index lz* in a simple and didactic approach. Starting from the relationship between lz and lz*, we develop the framework according to the type of logistic IRT model and the likelihood-based estimators of ability. The practical calculation of lz* is illustrated by analyzing a real data set about language skill assessment.

On the relationships between Jeffreys modal and weighted likelihood estimation of ability under logistic IRT models
Magis, David ;
in Psychometrika (2012), 77
This paper focuses on two estimators of ability with logistic item response theory models: the Bayesian modal (BM) estimator and the weighted likelihood (WL) estimator. For the BM estimator, Jeffreys’ prior distribution is considered, and the corresponding estimator is referred to as the Jeffreys modal (JM) estimator. It is established that under the three-parameter logistic model, the JM estimator returns larger estimates than the WL estimator. Several implications of this result are outlined.

On the difficulty of relational concepts among participants with Down syndrome
; Magis, David ;
in Research in Developmental Disabilities (2012), 33
The aim of the study was to compare the difficulty of relational concepts among participants with and without intellectual disability.
The French versions of the Boehm Tests of Basic Concepts Third Edition (Preschool and Kindergarten to 2nd grade) were administered to three groups of 47 participants individually matched on their total raw score on the tests. The first group comprised participants with intellectual disability of undifferentiated etiology, the second, participants with Down syndrome and the third, typical children. Item analyses using the transformed item difficulties method to detect differential item functioning across groups showed that the groups' rank-orders of item difficulty were highly similar. It is concluded that, all things being equal, relational concepts are of comparable difficulty and follow a similar sequence of development whatever the cognitive and etiological status of participants. Methodological and theoretical implications of these findings are discussed.

A modified item response model to detect and correct for cheating
Magis, David ;
Conference (2011, October 13)
This talk focuses on the identification of respondents who cheat on educational or psychometric tests. Cheating behaviour is often observed when an examinee with a low ability level tries to obtain correct responses (in some way or another) to get higher grades. This usually results in a response pattern that does not fit the underlying item response theory (IRT) model and severely affects the estimation of ability level.
Until now, several indices of misfit have been proposed, and they seem accurate for detecting cheating (as well as other misfitting behaviours). Unfortunately, none of these indices makes it possible (a) to determine which type of behaviour is encountered, or (b) to propose a corrected estimate of the examinee’s ability level. The purpose of this talk is to present Raîche’s multidimensional model as an extension of the usual IRT models that incorporates additional person parameters to characterize misfitting behaviour. Emphasis is put on a simple model with personal pseudo-guessing variation to detect trends in cheating. Results from a simulation study indicate that, in the absence of cheating, the multidimensional model returns estimates similar to those of the traditional IRT models, while in the presence of cheating, the person pseudo-guessing parameter is an accurate index of misfit and the person ability estimates are less biased than their traditional counterparts.

Robust DIF analysis
Magis, David
Conference (2011, October 11)
This talk focuses on the issue of differential item functioning (DIF) in psychometrics. An item is said to function differently if examinees from different groups, but with the same ability levels, nevertheless have different probabilities of endorsing this item. Many methods have been proposed to detect DIF, based either on statistical methods (such as logistic regression) or on IRT models. The talk is divided into three parts. In the first part, a brief overview of the DIF framework and methods is given. In the second part, a recent, conceptually different approach to DIF is introduced. It basically consists in flagging as DIF the items that are outlying with respect to the other items.
This approach is based on robust statistical tools for outlier identification. It avoids the issue of Type I error inflation and removes the need for purification of the anchor set. In the third part, it is briefly outlined how this approach easily extends to the simultaneous comparison of more than two groups of examinees; multivariate robust estimators of location and scale are required, but their use might outperform the standard methods for simultaneous pairwise comparisons.

Basic use and programming in the R language
Magis, David
Scientific conference (2011, August 22)
R is open-source statistical software of increasing interest to the scientific community. It can be installed on all usual platforms (Windows, Linux, Mac) and is suitable for many statistical analyses, graphical representation, intensive computation and programming, among others. New functionalities and packages are developed regularly, and the R web community ensures the constant development and increasing usefulness of R, not only for pure statisticians but also for practitioners from various fields (psychology, education, biometrics, econometrics, etc.). Its main drawback, however, is that it requires preliminary training and practice to handle the various aspects of the software and to become efficient in using R. The purpose of this workshop is to present the basic aspects of R, from installation to basic statistical analysis. The following themes will be developed: installation, importing data, R objects and structures, basic computations, graphics, packages and libraries, programming in R, basic statistical modeling and testing.
The workshop will be practice-oriented, with sessions mixing both “theoretical” presentations and practical applications. Live demonstrations of R will also be performed. Participants are invited to bring their laptop and to take part in the practical exercises. Slides, R scripts and illustrative data sets will be provided.

A Bayesian person fit evaluation for polytomous response data
; ; et al
Conference (2011, July 19)
Studies of person fit are generally conducted under a frequentist approach. For example, Meijer & Sijtsma (2001) discussed many parametric and non-parametric indices in their review of this topic. However, there are also a few papers investigating person fit in a Bayesian context (e.g. Glas & Meijer, 2003; Van Der Linden & Guo, 2008). In this talk, we present a new method based on the evaluation of informative hypotheses using the Bayes factor. This approach is non-parametric in nature and can be applied to a large variety of situations and many types of data. Here, we focus on Bayesian person-fit methods that can be used with polytomous response data. This presentation is divided into two sections. First, we present the technical aspects of this approach by discussing some hypotheses of interest, the nature of the prior and the nature of the posterior. Second, we present results from a real data matrix. The first analysis shows that Bayesian person-fit evaluation is efficient and can be easily applied to small data matrices.

Application de l’indice Lz pour l’élimination de données de recherche en langues
; ; et al
Conference (2011, May 10)
The Lz index for detecting inappropriate response patterns (Drasgow, Levine & Williams, 1985) was applied to a 64-item second-language reading ability test (Pichette, 2008) administered to 171 university students. The main objective of this research is to exclude participants with aberrant response profiles. Validity and reliability coefficients are compared between the data eliminated on the basis of the researchers' intuition and the eliminations suggested by the Lz index. This approach makes it possible to detect additional participants whose response pattern is not representative of the ability being measured.

Etude comparative de l'indice Lz dans le cadre de données dichotomique et de données polytomique
; ; Magis, David et al
Conference (2011, May 10)
The Lz index for detecting inappropriate response patterns (Drasgow, Levine & Williams, 1985) is most probably the best-known index and the one most widely used by practitioners and researchers in measurement. Its behaviour has mostly been studied with dichotomous data, but it can also be analysed with polytomous data. In this research, we propose to compare the results of the dichotomous and polytomous Lz indices.
To do so, (i) we will use Cohen's Kappa index to check whether the response patterns detected by the two versions of the index are the same, and (ii) we will study their respective distributions.

Recent R packages in psychometrics
; ; Magis, David
Scientific conference (2011, April 06)

The difR package, a toolbox for the identification dichotomous differential item functioning
Magis, David
Conference (2011, February 25)
The purpose of this talk is to briefly introduce the R package difR to identify differential item functioning (DIF) among dichotomously scored items. The presentation is organized in three points. First, the general framework of DIF is outlined and the best-known methods are presented succinctly. Second, the main functionalities of the difR package are described. Third, a practical application of difR is carried out through a “live” analysis of a real example with several DIF methods. Future developments and objectives are discussed to conclude the talk. The difR package was jointly developed by Sébastien Béland (Université du Québec à Montréal, Canada), Francis Tuerlinckx (K. U. Leuven, Belgium) and Paul De Boeck (University of Amsterdam, The Netherlands).

Une solution numérique au test de Cattell pour déterminer le nombre de composantes principales à retenir
; Magis, David ; et al
in Raîche, Gilles; Paquette-Côté, Karine; Magis, David (Eds.) Des mécanismes pour assurer la validité de l’interprétation de la mesure en éducation. Tome 1 : la mesure.
(2011)

Identification of differential item functioning in multiple-group settings: A multivariate outlier detection approach
Magis, David ;
in Multivariate Behavioral Research (2011), 46
This paper focuses on the identification of differential item functioning (DIF) when more than two groups of examinees are considered. We propose to consider items as elements of a multivariate space, where DIF items are outlying elements. Following this approach, the situation of multiple groups is a quite natural case. A robust statistics technique is proposed to identify DIF items as outliers in the multivariate space. For low dimensionalities, up to two or three groups, a simple graphical tool is also derived. We illustrate our approach with a re-analysis of data from Kim, Cohen, and Park (1995) on using calculators for a mathematics test.

A generalized logistic regression procedure to detect differential item functioning among multiple groups
Magis, David ; ; et al
in International Journal of Testing (2011), 11
We present an extension of the logistic regression procedure to identify dichotomous differential item functioning (DIF) in the presence of more than two groups of respondents. Starting from the usual framework of a single focal group, we propose a general approach to estimate the item response functions in each group and to test for the presence of uniform DIF, non-uniform DIF, or both.
This generalized procedure is compared to other existing DIF methods for multiple groups with a real data set on language skill assessment. Emphasis is put on the flexibility, completeness and computational ease of the generalized method.
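
As a reading aid for the last abstract, the multiple-group logistic regression approach to DIF can be sketched as follows. This is a minimal sketch in a common parametrization, not a formula taken from the abstract itself: the symbols are assumptions introduced here, with \pi_g the probability of a correct response to the item in group g, S the matching variable (for example the total test score), g = 0 the reference group and g = 1, ..., G the focal groups.

```latex
\[
\begin{aligned}
\operatorname{logit}(\pi_g) &= \alpha + \beta S + \alpha_g + \beta_g S,
  \qquad g = 0, 1, \dots, G, \qquad \alpha_0 = \beta_0 = 0, \\
H_0^{\text{uniform}} &: \alpha_1 = \dots = \alpha_G = 0
  \quad (\text{assuming } \beta_1 = \dots = \beta_G = 0), \\
H_0^{\text{non-uniform}} &: \beta_1 = \dots = \beta_G = 0, \\
H_0^{\text{both}} &: \alpha_1 = \dots = \alpha_G = \beta_1 = \dots = \beta_G = 0.
\end{aligned}
\]
```

Under this sketch, group-specific intercepts \alpha_g capture uniform DIF and group-specific slopes \beta_g capture non-uniform DIF; each null hypothesis would typically be assessed with a likelihood-ratio or Wald test comparing nested models.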
