References of "Magis, David"
Full Text
Peer Reviewed
Some formulas for the standard error of the weighted likelihood estimator of ability with small psychometric tests
Magis, David ULg

Conference (2012, October 26)

The weighted likelihood estimator of ability (WLE, [3]) was introduced as an asymptotically unbiased estimator of ability in item response theory (IRT) models. Moreover, its standard error was shown to be asymptotically equal to that of the maximum likelihood (ML) estimator [2]. Although this asymptotic framework is most often encountered in psychometric and educational studies, there are several practical applications for which an "exact" formula for the standard error would be useful. For instance, such a formula would certainly be convenient at the early steps of a computerized adaptive test (CAT), when only a few items have been administered. The purpose of this paper is to derive two possible formulas for the standard error of the WLE, by starting from the objective function to be optimized and deriving the standard error in an approach similar to that of the ML framework (see, e.g., [1]). The two candidate formulas are then compared through both a small simulation study and a practical analysis with realistic yet artificial data. It is concluded that one of the formulas must be preferred to the other, both for mathematical consistency and on the basis of the simulation results.

References:
[1] Baker, F. B., & Kim, S.-H. (2004). Item response theory: Parameter estimation techniques (2nd ed.). New York: Marcel Dekker.
[2] Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.
[3] Warm, T. A. (1989). Weighted likelihood estimation of ability in item response models. Psychometrika, 54, 427-450.
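For background, a sketch of the quantities involved (standard results from [1]-[3]; the paper's two small-test formulas themselves are not reproduced here). The ML standard error is driven by the test information I, and the WLE solves Warm's weighted likelihood equation:

```latex
% Asymptotic SE of the ML estimator, with I the test information:
\mathrm{SE}\big(\hat\theta_{\mathrm{ML}}\big) \approx \frac{1}{\sqrt{I(\hat\theta_{\mathrm{ML}})}},
\qquad
I(\theta) = \sum_{i=1}^{n} \frac{[P_i'(\theta)]^{2}}{P_i(\theta)\,Q_i(\theta)}.

% Warm's WLE solves the weighted likelihood equation, with x_i the binary
% response to item i and Q_i = 1 - P_i:
\sum_{i=1}^{n} \frac{P_i'(\theta)\,[x_i - P_i(\theta)]}{P_i(\theta)\,Q_i(\theta)}
  + \frac{J(\theta)}{2\,I(\theta)} = 0,
\qquad
J(\theta) = \sum_{i=1}^{n} \frac{P_i'(\theta)\,P_i''(\theta)}{P_i(\theta)\,Q_i(\theta)}.
```

Asymptotically [2], SE of the WLE is also approximately 1/sqrt(I); the paper's contribution concerns the non-asymptotic, short-test case.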

Peer Reviewed
A framework and approaches to develop an in-house CAT with freeware and open sources
Kimura, Tetsuo; Han, Kyung; Kosinski, Michal et al

Conference (2012, August 13)

Full Text
Peer Reviewed
Two issues in differential item functioning and two recently suggested solutions
Magis, David ULg; Facon, Bruno; De Boeck, Paul

Conference (2012, July 03)

Two issues of current interest in the framework of differential item functioning (DIF) are considered. First, in the presence of small samples of respondents, the asymptotic validity of most traditional DIF detection methods is not guaranteed. Second, even with large samples of respondents, test-score-based methods (such as Mantel-Haenszel) are affected by Type I error inflation when the true underlying model is not the Rasch model and in the presence of impact. To deal with small samples of respondents, Angoff's Delta plot may be considered as a simple and straightforward DIF method. An improvement is proposed, based on acceptable assumptions, to select an optimal classification threshold rather than fixing it arbitrarily (as with the standard Delta plot). This modified Delta plot was compared to its standard version and to the Mantel-Haenszel method by means of simulations. Both the Mantel-Haenszel method and the modified Delta plot outperform Angoff's original proposal, but the modified Delta plot is much more accurate for small samples than Mantel-Haenszel. For the second issue, a robust outlier approach to DIF was developed, by considering DIF items as outliers in the set of all test items and flagging the outliers with robust statistical inference tools. This approach was compared with the Mantel-Haenszel method using simulations. Stable and correct Type I error rates are observed for the robust outlier approach, independently of the underlying model, while Type I error inflation is observed for the Mantel-Haenszel method. The robust outlier method may therefore be considered a valuable alternative.
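For reference, the Delta plot rests on Angoff's Delta transformation of each item's proportion of correct responses p in a given group of respondents:

```latex
\Delta = 4\,\Phi^{-1}(1 - p) + 13,
```

where \Phi^{-1} is the standard normal quantile function. Items are plotted as pairs of reference- and focal-group Delta values, and DIF candidates are the items lying far from the major axis of this scatter plot.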

Full Text
Peer Reviewed
The catR library: an application to support the development of computerized adaptive tests as a learning assessment modality
Magis, David ULg; Raîche, Gilles

Conference (2012, June 05)

Computerized adaptive testing (CAT) is a method for administering learning assessment tests that has significant advantages over fixed (paper-and-pencil) administration of the same tests: shorter tests, individualized assessment, immediate estimation of the competencies being assessed, and so on. Although it has been under development for many years, CAT is used only marginally in practice, partly because of the computational difficulties associated with using item response theory (IRT) as the underlying model for CAT. Fortunately, the recent development of free, programmable software such as R now makes it possible to support these models, and thus CAT, very efficiently. This talk has three objectives. First, a brief, schematic presentation of CAT is given, with emphasis on its specific aspects. Second, the catR package for the R software is briefly described, along with its functionalities. Finally, the usefulness of catR and its interaction with CAT development platforms, such as Concerto, are presented. The technical elements of CAT will not be covered, since the goal of the talk is the practical illustration of CAT and its usefulness in learning assessment.
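To make the adaptive logic concrete, here is a minimal hand-rolled sketch of a CAT loop under the 2PL model (an illustration of the general principle only, not catR's actual API; the item bank and the true ability are simulated):

```r
set.seed(1)

# Simulated 2PL item bank: discriminations a, difficulties b
bank <- data.frame(a = rlnorm(100, 0, 0.3), b = rnorm(100))

p2 <- function(theta, a, b) 1 / (1 + exp(-a * (theta - b)))  # 2PL probability
info <- function(theta, a, b) {                               # Fisher information
  p <- p2(theta, a, b)
  a^2 * p * (1 - p)
}

true.theta <- 0.5; theta <- 0               # true and current ability
administered <- integer(0); responses <- integer(0)

for (step in 1:20) {
  # Select the unused item with maximum information at the current theta
  available <- setdiff(seq_len(nrow(bank)), administered)
  it <- available[which.max(info(theta, bank$a[available], bank$b[available]))]
  # Simulate the examinee's response, then update the ML ability estimate
  x <- rbinom(1, 1, p2(true.theta, bank$a[it], bank$b[it]))
  administered <- c(administered, it); responses <- c(responses, x)
  loglik <- function(t) sum(dbinom(responses, 1,
    p2(t, bank$a[administered], bank$b[administered]), log = TRUE))
  # ML on a bounded interval, so early all-correct/all-wrong patterns stay finite
  theta <- optimize(loglik, c(-4, 4), maximum = TRUE)$maximum
}
theta  # ability estimate after 20 adaptively chosen items
```

Maximum-information selection and ML estimation are only one combination; the talk's point is that packages such as catR bundle several selection rules and estimators behind a single interface.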

Full Text
Peer Reviewed
Analysis of differential item functioning between the paper-and-pencil and computerized versions of an English-as-a-second-language placement test in the presence of inappropriate response patterns
Béland, Sébastien; Raîche, Gilles; Magis, David ULg et al

Conference (2012, June 05)

Paper-and-pencil tests have traditionally been used for assessment in education. Their simple and inexpensive administration format has favoured their spread across institutions at all levels of study. Test administrators have also begun delivering these tests with computerized tools. However, the equivalence of the results obtained from paper-and-pencil and computerized administrations of a test is not always guaranteed, and the function of the assessment may have an impact on this equivalence. In the context of certification tests in mathematics, for example, it has been noted that item difficulty can be higher in the computerized versions. An additional element complicates the situation, namely the presence of individuals who try to manipulate their test result and thus produce inappropriate response patterns. Here we examine this equivalence for a placement test in English as a second language at the college level in Quebec. For this study, we designed a three-step procedure. First, item parameters will be estimated separately, under the Rasch model, for the paper-and-pencil version (N = 1709) administered in 2009 and the computerized version (N = 13278) administered in 2011. Second, inappropriate response patterns will be detected with the lz* index (Snijders, 2001). Third, once the inappropriate response patterns have been removed, differential item functioning will be analyzed, using the paper-and-pencil version of the test as the data source for the reference group and the computerized version as the data source for the focal group. This last step will allow us to verify whether the items are equivalent regardless of the version administered.

Full Text
Peer Reviewed
On the accurate selection of asymptotic detection thresholds for Infit and Outfit indexes of person fit
Magis, David ULg; Raîche, Gilles; Béland, Sébastien

Conference (2012, April 11)

Many person-fit indexes exist, but lz (Drasgow, Levine, & Williams, 1985), the Infit mean square W (Wright & Masters, 1982), and the Outfit mean square U (Wright & Stone, 1979) are certainly the most popular. However, they have the undesirable property that their limiting distribution depends on the true ability level, which is generally unknown. In addition, the asymptotic distribution of the U and W indexes has never been clearly stated. Snijders (2001) proposed a generalization of the lz index that incorporates estimated ability levels in its computation, and derived the asymptotic normality of this modified lz* index. The purpose of this talk is threefold. First, the generalization of lz to lz* is briefly sketched. Second, it is shown how this generalization can be successfully applied to both the U and W indexes, yielding generalized U* and W* indexes, respectively. Third, the accuracy of the generalized indexes in detecting person (mis)fit is assessed through a simulation study. Three situations were investigated: (a) absence of misfit; (b) presence of cheating (yielding spuriously high scores); (c) presence of inattention (yielding spuriously low scores). Several conditions were varied, such as test length and aberrance rates when misfit was introduced. Response patterns were generated under the Rasch model, and maximum likelihood estimation was performed to obtain the ability estimates. Several significance levels were selected. It is observed that the generalized indexes lz*, U* and W* recover the significance level better than their standard counterparts lz, U and W, respectively, while being more powerful in identifying the two types of person misfit. In particular, the modified index W* shows the largest improvement in performance with respect to its original version W. It is concluded that Snijders' generalization of the lz index to lz* is also accurate for the U and W indexes under Rasch modelling. Finally, possible extensions to other person-fit indexes, such as the ECI indexes (Tatsuoka, 1984), other ability estimators, and other IRT models are briefly discussed.
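For reference, the standard (non-generalized) U and W mean squares can be computed as follows — a minimal sketch under the Rasch model with known item difficulties; the generalized U* and W* of the talk involve Snijders-type corrections not reproduced here:

```r
# Outfit (U) and Infit (W) mean squares for one response pattern (Rasch model)
fit.UW <- function(x, theta, b) {
  p <- plogis(theta - b)                      # Rasch success probabilities
  z2 <- (x - p)^2 / (p * (1 - p))             # squared standardized residuals
  U <- mean(z2)                               # Outfit: unweighted mean square
  W <- sum((x - p)^2) / sum(p * (1 - p))      # Infit: information-weighted mean square
  c(outfit = U, infit = W)
}

b <- rnorm(40)                                # 40 item difficulties
x <- rbinom(40, 1, plogis(0.5 - b))           # pattern generated at theta = 0.5
fit.UW(x, theta = 0.5, b)                     # values near 1 indicate good fit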

Full Text
Everybody knows what IRT is... (really?)
Magis, David ULg

Conference (2012, March 27)

Item response theory (IRT) is not only a famous acronym; it also covers a broad range of statistical and psychometric models, methods, and applications in psychology and educational science. However, IRT often appears misunderstood, or even esoteric, to non-statisticians and non-psychometricians. The main goal of this talk is to present briefly and clearly the main aspects of traditional IRT: objectives, assumptions, basic models, and basic estimation methods. As a secondary purpose, extensions of IRT to more complex situations (polytomous data, multidimensional and hierarchical models, computerized adaptive testing, differential item functioning, nonparametric IRT) are briefly outlined. The talk will be as non-technical as possible, focusing on the main concepts and applications of the theory.
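For the record, the "basic models" in question are the one-, two-, and three-parameter logistic (1PL/2PL/3PL) models; the 3PL, which nests the other two, writes the probability of a correct response to item i as

```latex
P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + \exp[-a_i(\theta - b_i)]},
```

with difficulty b_i, discrimination a_i, and pseudo-guessing c_i. Fixing c_i = 0 yields the 2PL, and additionally fixing a common discrimination yields the 1PL (Rasch) model.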

Full Text
An overview of the CAT: framework, R package, and applications
Magis, David ULg

Conference (2012, March 12)

Computerized adaptive testing (CAT) is an efficient method to administer psychometric or educational tests and questionnaires. Unlike standard fixed ("paper-and-pencil") tests, the items of a CAT are iteratively and optimally selected from a bank of available items, on the basis of the previously administered items and the current ability estimate of the examinee. This general approach has several advantages over fixed tests: it reduces the risk of fraud, it allows for questionnaires individualized to the examinee's ability level, and fewer items must be administered to reach the same level of precision in the ability estimates. The purpose of this talk is threefold. First, a general overview of CAT is proposed and its main principles are quickly outlined. Second, a recently developed R package, called catR, is briefly presented and its functionalities are described. Finally, two applications are discussed. The first is a live demonstration of catR, using its default item bank for English aptitude assessment and several CAT options. The second focuses on the online testing platform Concerto, a web interface for the development and testing of CAT sessions that uses catR as its underlying computational package. The R package catR was jointly developed with Gilles Raîche (Université du Québec à Montréal, Canada). The platform Concerto is under development by The Psychometrics Centre (Cambridge University, UK) under the supervision of Michal Kosinski and John Rust.

Which measurement models for competency assessment?
Burton, Réginald; Flieller, André; Frenette, Eric et al

Conference (2012, January 12)

More and more assessment systems (formative assessment, monitoring, school guidance tests, etc.) rely on item response models (IRMs) to estimate students' academic competencies from paper-and-pencil tests or computer-delivered assessments. These measurement models, whose properties promised considerable progress and advantages, first established themselves as reference standards in international surveys and were then transposed to local contexts, notably for the external assessment of students' competencies. However, the technicality of IRMs, their stochastic nature, and their relatively complex statistical components have not always allowed a rational and cautious approach to their use. It was long believed that transposing these models to competency assessment was natural and faced no fundamental obstacle. But the objectives, characteristics, and requirements of competency assessment are sometimes far removed from international concerns. Starting from the empirical results and/or theoretical reflections of the round-table participants, we therefore propose to determine under which conditions the application of IRMs to competency assessment is relevant. In this perspective, several criteria of analysis will be considered: the theoretical nature of the models (are IRMs suited to the objectives of competency assessment systems?), the psychometric characteristics of the assessments (are the conditions for applying IRMs always met in competency assessments?), and the model validation methods (are the methods used to assess model fit suited to competency assessment?). By way of an open conclusion, the participants will discuss whether it is worth developing new measurement models for competency assessment. Three main questions will structure the exchanges:
Question 1: What are the psychometric characteristics of competency assessment systems?
Question 2: Are item response models suited to assessing competencies?
Question 3: Is it appropriate to develop new models for competency assessment? If so, what should their characteristics be?

Full Text
Peer Reviewed
A short introduction into Bayesian evaluation of informative hypotheses as an alternative to exploratory comparisons of multiple group means
Béland, Sébastien; Klugkist, Irène; Raîche, Gilles et al

in Tutorials in Quantitative Methods for Psychology (2012), 8

This paper presents an introduction to Bayesian evaluation of informative hypotheses, that is, hypotheses representing explicit expectations about multiple group means (Hoijtink, 2011; Hoijtink, Klugkist, & Boelen, 2008). The authors begin by discussing some limits of exploratory methods before presenting a non-technical overview of the Bayesian approach. References are provided for the technical details. A particular effort is made to illustrate the method with an example from psychology. References to software, more elaborate textbooks, and tutorials enable researchers to apply this novel method to their own data.
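A toy numerical illustration of the encompassing-prior idea behind this approach (a simplified sketch, not the authors' software): the Bayes factor of an inequality-constrained hypothesis such as H1: mu1 < mu2 < mu3 against the unconstrained alternative can be estimated as the proportion of posterior draws satisfying the constraint ("fit") divided by the corresponding prior proportion ("complexity").

```r
set.seed(42)

# Toy data: three groups with increasing true means
y <- list(rnorm(20, 0), rnorm(20, 0.4), rnorm(20, 0.8))

# Posterior draws for each group mean (flat prior, known unit variance):
# mu_g | y ~ N(mean(y_g), 1/n_g)
post <- sapply(y, function(g) rnorm(1e5, mean(g), 1 / sqrt(length(g))))

# Fit: posterior probability that the order constraint mu1 < mu2 < mu3 holds
fit <- mean(post[, 1] < post[, 2] & post[, 2] < post[, 3])

# Complexity: prior probability of the constraint; under an exchangeable
# prior all 3! = 6 orderings are equally likely
complexity <- 1 / 6

fit / complexity  # estimated Bayes factor of H1 against the unconstrained model
```

A Bayes factor well above 1 supports the ordered hypothesis over the unconstrained model; the cited references give the general theory and software for more realistic settings.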

Full Text
Peer Reviewed
Taking atypical response patterns into account: a multidimensional measurement model from item response theory
Raîche, Gilles; Magis, David ULg; Blais, Jean-Guy et al

in Simon, Marielle; Ercikan, Kadriye; Rousseau, Michel (Eds.) Improving large scale assessment in education: Theory, issues, and practice (2012)

Full Text
Peer Reviewed
Angoff’s Delta method revisited: improving the DIF detection under small samples
Magis, David ULg; Facon, Bruno

in British Journal of Mathematical & Statistical Psychology (2012), 65

Most of the methods for detecting differential item functioning (DIF) are suitable when the sample sizes are sufficiently large to validate the null statistical distributions. There is no guarantee, however, that they still perform adequately when there are few respondents in the focal group or in both the reference and the focal group. Angoff's Delta plot is a potentially useful alternative for small-sample DIF investigation, but it suffers from an improper DIF flagging criterion. The purpose of this paper is to improve this classification rule under mild statistical assumptions. This improvement yields a modified Delta plot with an adjusted DIF flagging criterion for small samples. A simulation study was conducted to compare the modified Delta plot to both the classical Delta plot approach and the Mantel-Haenszel method. It is concluded that the modified Delta plot is consistently less conservative and more powerful than the usual Delta plot, and is also less conservative and more powerful than the Mantel-Haenszel method as long as at least one group of respondents is small.
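A minimal sketch of the Delta plot mechanics with toy data (illustrative only; the paper derives its adjusted flagging criterion under bivariate normality of the Delta pairs, which the simple dispersion-based rule below only approximates):

```r
# Delta scores in each group from the proportions of correct responses
delta <- function(p) 4 * qnorm(1 - p) + 13

p.ref <- runif(40, 0.3, 0.9)                       # reference-group p-values (toy data)
p.foc <- pmin(pmax(p.ref + rnorm(40, 0, 0.03), 0.05), 0.95)
d <- cbind(delta(p.ref), delta(p.foc))

# Major (principal) axis of the Delta pair scatter plot
v <- eigen(cov(d))$vectors[, 1]
ctr <- colMeans(d)

# Perpendicular distance of each item to the major axis
dist <- as.numeric(sweep(d, 2, ctr) %*% c(-v[2], v[1]))

# Flag items with extreme distances; the classical rule uses a fixed
# threshold of 1.5, while an adjusted rule adapts to the dispersion
flagged <- which(abs(dist) > qnorm(0.975) * sd(dist))
flagged
```

The key contrast is in the last step: the classical Delta plot fixes the threshold arbitrarily, whereas the modified Delta plot derives it from the distributional behaviour of the distances.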

Full Text
Peer Reviewed
Random generation of response patterns under computerized adaptive testing with the R package catR
Magis, David ULg; Raîche, Gilles

in Journal of Statistical Software (2012), 48

This paper outlines a computerized adaptive testing (CAT) framework and presents an R package for the simulation of response patterns under CAT procedures. This package, called catR, requires a bank of items, previously calibrated according to the four-parameter logistic (4PL) model or any simpler logistic model. The package proposes several methods to select the early test items, several methods for next item selection, different estimators of ability (maximum likelihood, Bayes modal, expected a posteriori, weighted likelihood), and three stopping rules (based on the test length, the precision of ability estimates, or the classification of the examinee). After a short description of the different steps of a CAT process, the commands and options of the catR package are presented and practically illustrated.
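A usage sketch in the spirit of the paper (function and argument names follow the 2012 JSS article; the catR API has evolved across versions, so they should be checked against the installed package):

```r
library(catR)  # assumes catR as described in the 2012 JSS article

# Simulated bank of 300 items calibrated under the 2PL model
bank <- createItemBank(items = 300, model = "2PL")

# One simulated CAT: true ability 1, weighted likelihood (WL) estimation,
# stopping after 20 administered items
res <- randomCAT(trueTheta = 1, itemBank = bank,
                 test = list(method = "WL"),
                 stop = list(rule = "length", thr = 20))
res$thFinal  # final ability estimate
```

The start, test, stop, and final lists are where the early-item, next-item, estimator, and stopping-rule options enumerated in the abstract are configured.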

Full Text
Peer Reviewed
A robust outlier approach to prevent Type I error inflation in DIF
Magis, David ULg; De Boeck, Paul

in Educational & Psychological Measurement (2012), 72

The identification of differential item functioning (DIF) is often performed by means of statistical approaches that consider the raw scores as proxies for the ability trait level. One of the most popular approaches, the Mantel-Haenszel (MH) method, belongs to this category. However, replacing the ability level by the simple raw score is a source of potential Type I error inflation, especially in the presence of DIF, but also when DIF is absent and in the presence of impact. The purpose of this paper is to present an alternative statistical inference approach based on the same measure of DIF but such that the Type I error inflation is prevented. The key notion is that for DIF items the measure has an outlying value, which can be identified as such with inference tools from robust statistics. Although we use the MH log-odds ratio as a statistic, the inference is different. A simulation study is performed to compare the robust statistical inference with the classical inference method, both based on the MH statistic. As expected, the Type I error rate inflation is avoided with the robust approach, while the power of the two methods is similar.
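A schematic of the robust inference idea with toy DIF-free data (a simplified sketch; the paper's robust tools are more elaborate than the median/MAD rule used here): compute each item's MH log-odds ratio with the raw score as stratifying variable, then flag as DIF the items whose value is outlying relative to robust estimates of the center and spread of all item values.

```r
set.seed(7)

# Toy data: 1000 examinees, 20 Rasch items, group = 0 (reference) / 1 (focal)
n <- 1000; J <- 20
group <- rbinom(n, 1, 0.5)
theta <- rnorm(n)
b <- rnorm(J)
X <- sapply(b, function(bj) rbinom(n, 1, plogis(theta - bj)))
score <- rowSums(X)

# MH common log-odds ratio of one item, stratified by raw score
mh.lor <- function(x, group, score) {
  num <- den <- 0
  for (s in unique(score)) {
    i <- score == s
    t <- table(factor(group[i], 0:1), factor(x[i], 0:1))
    nT <- sum(t)
    if (nT > 1) {
      num <- num + t["0", "1"] * t["1", "0"] / nT  # reference correct, focal wrong
      den <- den + t["0", "0"] * t["1", "1"] / nT  # reference wrong, focal correct
    }
  }
  log(num / den)
}

lor <- sapply(seq_len(J), function(j) mh.lor(X[, j], group, score))

# Robust outlier rule: standardize with median and MAD instead of mean and SD
z <- (lor - median(lor)) / mad(lor)
which(abs(z) > qnorm(0.975))  # items flagged as DIF outliers (none expected here)
```

Because the center and spread come from robust estimators, a handful of genuine DIF items cannot contaminate the reference distribution, which is what keeps the Type I error rate under control.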

Full Text
Peer Reviewed
A didactic presentation of Snijders’ lz* index of person fit with emphasis on response model selection and ability estimation
Magis, David ULg; Raîche, Gilles; Béland, Sébastien

in Journal of Educational & Behavioral Statistics (2012), 37

This paper focuses on two likelihood-based indices of person fit: the index lz (Drasgow, Levine, & Williams, 1985) and Snijders' modified index lz* (Snijders, 2001). The first is commonly used in practical assessments of person fit, although its asymptotic standard normal distribution is not valid when true abilities are replaced by sample ability estimates. The lz* index is a generalization of lz that corrects for this sampling variability. Surprisingly, it is not yet popular in the psychometric and educational assessment community. Moreover, there is some ambiguity about which type of item response model and ability estimation method can be used to compute the lz* index. The purpose of this paper is to present the index lz* in a simple and didactic way. Starting from the relationship between lz and lz*, we develop the framework according to the type of logistic IRT model and the likelihood-based estimators of ability. The practical calculation of lz* is illustrated by analyzing a real data set on language skill assessment.
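For reference, the classical index standardizes the log-likelihood l0 of the response pattern, evaluated at the ability level:

```latex
l_z = \frac{l_0 - \mathrm{E}(l_0)}{\sqrt{\mathrm{Var}(l_0)}},
\qquad
l_0 = \sum_{i=1}^{n} \big[ x_i \log P_i(\theta) + (1 - x_i) \log Q_i(\theta) \big],

\mathrm{E}(l_0) = \sum_{i=1}^{n} \big[ P_i \log P_i + Q_i \log Q_i \big],
\qquad
\mathrm{Var}(l_0) = \sum_{i=1}^{n} P_i\,Q_i \left[ \log \frac{P_i}{Q_i} \right]^{2}.
```

Snijders' lz* keeps the same form but corrects these moments for the fact that the true ability is replaced by an estimate, which is what restores the asymptotic standard normal distribution.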

Full Text
Peer Reviewed
On the relationships between Jeffreys modal and weighted likelihood estimation of ability under logistic IRT models
Magis, David ULg; Raîche, Gilles

in Psychometrika (2012), 77

This paper focuses on two estimators of ability with logistic item response theory models: the Bayesian modal (BM) estimator and the weighted likelihood (WL) estimator. For the BM estimator, Jeffreys’ prior distribution is considered, and the corresponding estimator is referred to as the Jeffreys modal (JM) estimator. It is established that under the three-parameter logistic model, the JM estimator returns larger estimates than the WL estimator. Several implications of this result are outlined.
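For context, the two estimators optimize closely related criteria: with Jeffreys' prior proportional to the square root of the test information I(θ), the JM estimator maximizes a penalized log-likelihood, while the WL estimator solves Warm's weighted likelihood equation:

```latex
\hat\theta_{\mathrm{JM}} = \arg\max_{\theta}\ \Big[ \ell(\theta) + \tfrac{1}{2} \log I(\theta) \Big]
\quad\Longleftrightarrow\quad
\ell'(\theta) + \frac{I'(\theta)}{2\,I(\theta)} = 0;
\qquad
\hat\theta_{\mathrm{WL}}:\ \ \ell'(\theta) + \frac{J(\theta)}{2\,I(\theta)} = 0.
```

Under the 1PL and 2PL one has I'(θ) = J(θ), so the two estimators coincide (Warm, 1989); under the 3PL they differ, and the paper establishes that the JM estimate is the larger of the two.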

Full Text
Peer Reviewed
On the difficulty of relational concepts among participants with Down syndrome
Facon, Bruno; Magis, David ULg; Courbois, Yannick

in Research in Developmental Disabilities (2012), 33

The aim of the study was to compare the difficulty of relational concepts among participants with and without intellectual disability. The French versions of the Boehm Tests of Basic Concepts Third Edition (Preschool, and Kindergarten to 2nd grade) were administered to three groups of 47 participants individually matched on their total raw score on the tests. The first group comprised participants with intellectual disability of undifferentiated etiology; the second, participants with Down syndrome; and the third, typical children. Item analyses using the transformed item difficulties method to detect differential item functioning across groups showed that the groups' rank orders of item difficulty were highly similar. It is concluded that, all things being equal, relational concepts are of comparable difficulty and follow a similar sequence of development whatever the cognitive and etiological status of participants. Methodological and theoretical implications of these findings are discussed.

Full Text
A modified item response model to detect and correct for cheating
Magis, David ULg; Raîche, Gilles

Conference (2011, October 13)

This talk focuses on the identification of respondents with cheating behaviour on educational or psychometric tests. Cheating behaviour is often observed when an examinee with a low ability level tries (in one way or another) to obtain correct responses and thereby higher grades. This usually results in a response pattern that does not fit the underlying item response theory (IRT) model and severely affects the estimation of the ability level. Several indices of misfit have been proposed to date, and they seem accurate for detecting cheating (as well as other misfitting behaviours). Unfortunately, none of these indexes makes it possible (a) to determine which type of behaviour is encountered, or (b) to propose a corrected estimate of the examinee's ability level. The purpose of this talk is to present Raîche's multidimensional model as an extension of the usual IRT models that incorporates additional person parameters to characterize misfitting behaviour. Emphasis is put on a simple model with personal pseudo-guessing variation to detect trends in cheating. Results from a simulation study indicate that, in the absence of cheating, the multidimensional model returns estimates similar to those of the traditional IRT models, while in the presence of cheating, the person pseudo-guessing parameter is an accurate index of misfit and the person ability estimates are less biased than their traditional counterparts.
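One plausible form of such a model (an illustrative sketch only, not necessarily Raîche's exact parameterization): augment the 3PL item pseudo-guessing parameter c_i with a person-level pseudo-guessing parameter c_p, so that a cheating-like pattern shows up as an elevated estimate of c_p:

```latex
% Illustrative only: one way to combine item- and person-level
% pseudo-guessing; c_p = 0 recovers the standard 3PL model.
P(X_{pi} = 1 \mid \theta_p, c_p)
  = c_{pi} + (1 - c_{pi})\,\frac{1}{1 + \exp[-a_i(\theta_p - b_i)]},
\qquad
c_{pi} = c_i + (1 - c_i)\,c_p.
```

Under this kind of parameterization, a low-ability examinee who nevertheless produces many correct responses is accommodated by a large c_p rather than by an inflated ability estimate, which is the intuition behind the corrected estimation reported in the talk.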

Full Text
Robust DIF analysis
Magis, David ULg

Conference (2011, October 11)

This talk focuses on the issue of differential item functioning (DIF) in psychometrics. An item is said to function differently if examinees from different groups, but with the same ability level, nevertheless have different probabilities of endorsing the item. Many methods have been proposed to detect DIF, based either on statistical methods (such as logistic regression) or on IRT models. The talk is divided into three parts. In the first part, a brief overview of the DIF framework and methods is proposed. In the second part, a recent, conceptually different approach to DIF is introduced. It basically consists in flagging as DIF those items that are outlying with respect to the other items. This approach is based on robust statistical tools for outlier identification. It removes the issue of Type I error inflation and the need to purify the anchor set. In the third part, it is briefly outlined how this approach easily extends to the simultaneous comparison of more than two groups of examinees; multivariate robust estimators of location and scale are then required, but their use might outperform the standard methods based on simultaneous pairwise comparisons.
