An item analysis of the French version of the Test for Reception of Grammar among children and adolescents with Down syndrome or intellectual disability of undifferentiated etiology
Magis, David. In Journal of Speech, Language, and Hearing Research (in press).
Purpose: An item analysis of Bishop's (1983) Test for Reception of Grammar (TROG) in its French version (F-TROG; Lecocq, 1996) was conducted to determine whether the difficulty of items is similar for participants with or without intellectual disability (ID). Method: In Study 1, responses to the 92 F-TROG items by 55 participants with Down syndrome (DS), 55 with ID of undifferentiated etiology (UND), and 55 typical children (TYP) matched on their F-TROG total score were compared using the transformed item difficulties method, a statistical approach designed to detect differential item functioning (DIF) between groups. In Study 2, an additional comparison involving 526 TYP participants and 526 participants with UND was conducted to increase the statistical power of the analysis. Results: The difficulty of items was highly similar whatever the sample size or clinical status of participants: fewer than 3.5% of the items were flagged as showing DIF. Conclusions: Tests such as the TROG can be used with confidence in clinical practice as well as in research studies comparing participants with or without ID. Methods designed to investigate potential internal test bias, such as the one used here, should be employed more regularly in the developmental disability field to affirm the absence of DIF.
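
The abstract above relies on the transformed item difficulties (Delta plot) method to flag DIF items. As a purely illustrative sketch of that classical technique (not the authors' implementation; the proportions correct and the 1.5 threshold below are hypothetical), one can map each group's proportion correct to Angoff's delta scale, fit the major axis of the resulting scatter plot, and flag items lying too far from that axis:

```python
from statistics import NormalDist

def delta_scores(props):
    """Map proportions correct to Angoff's delta scale (mean 13, sd 4)."""
    inv = NormalDist().inv_cdf
    return [13.0 - 4.0 * inv(p) for p in props]

def delta_plot(p_ref, p_foc, threshold=1.5):
    """Flag items whose (delta_ref, delta_foc) point lies farther than
    `threshold` from the major (principal) axis of the delta scatter plot."""
    dr, df = delta_scores(p_ref), delta_scores(p_foc)
    n = len(dr)
    mr, mf = sum(dr) / n, sum(df) / n
    s_rr = sum((x - mr) ** 2 for x in dr)
    s_ff = sum((y - mf) ** 2 for y in df)
    s_rf = sum((x - mr) * (y - mf) for x, y in zip(dr, df))
    # slope and intercept of the major axis of the point cloud
    b = (s_ff - s_rr + ((s_ff - s_rr) ** 2 + 4.0 * s_rf ** 2) ** 0.5) / (2.0 * s_rf)
    a = mf - b * mr
    # signed perpendicular distance of each item to that axis
    dist = [(b * x + a - y) / (1.0 + b ** 2) ** 0.5 for x, y in zip(dr, df)]
    return [i for i, d in enumerate(dist) if abs(d) > threshold], dist

# hypothetical proportions correct in a reference and a focal group;
# item 3 (index 2) is made noticeably harder for the focal group
p_ref = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40]
p_foc = [0.88, 0.78, 0.35, 0.58, 0.48, 0.38]
flagged, dist = delta_plot(p_ref, p_foc)  # only index 2 exceeds the threshold
```

With these toy data, only the item whose focal-group difficulty was shifted stands out from the major axis; all other items sit close to it, mirroring the "fewer than 3.5% flagged" situation described in the abstract.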

On the finiteness of the weighted likelihood estimator of ability
Magis, David. In Psychometrika (in press).
The purpose of this note is to focus on the finiteness of the weighted likelihood estimator (WLE) of ability in the context of dichotomous and polytomous item response theory (IRT) models. It is established that the WLE always returns finite ability estimates. This general result is valid for dichotomous (one-, two-, three-, and four-parameter logistic) IRT models and for the classes of polytomous difference models and divide-by-total models, independently of the number of items, the item parameters, and the response patterns. Further implications of this result are outlined.

Étude de nouveaux indices de détection de la réponse au hasard et de l'inattention selon différentes valeurs de l'habileté dans le contexte de la modélisation de Rasch
Magis, David, et al. In Mesure et Evaluation en Education (in press).
Some examinees may respond at random or inattentively in a testing situation. Several approaches have been developed to detect this type of response. Among them, the use of person-fit indices to flag inappropriate response patterns is the most studied and seemingly most promising approach. In this study, we focus on three popular person-fit indices whose properties make them easy to interpret: lz, ZU, and ZW. Previous studies have shown that these three indices are strongly affected by the fact that an examinee's ability is estimated rather than known. Snijders (2001) proposed a corrected version of the lz index (named lz*) to account for this difficulty. Magis, Béland, and Raîche (2014) have already corrected the two other indices following Snijders's approach: U* and W*. It remains, however, to analyze in more detail the behavior of the corrected indices lz*, U*, and W* and of the standardized indices lz, ZU, and ZW. To this end, we carry out two studies across different ability values: an analysis of the Type I error rates of the indices (the probability of wrongly flagging a response pattern as inappropriate) and an analysis of their detection power. These analyses show that the corrected indices lz* and W* are generally the most attractive to use, since their scores approximately follow the normal distribution and they detect both random responding and inattention well.

Computerized adaptive testing with R: Recent updates of the package catR
Magis, David. In Journal of Statistical Software (in press).
The purpose of this paper is to list the recent updates of the R package catR. This package allows for generating response patterns under a computerized adaptive testing (CAT) framework with underlying item response theory (IRT) models. Among the most important updates, well-known polytomous IRT models are now supported by catR, several item selection rules have been added, and it is now possible to perform post-hoc simulations. Some functions were also rewritten or withdrawn to improve the usefulness and performance of the package.

A cross-sectional analysis of developmental trajectories of vocabulary comprehension among children and adolescents with Down syndrome or intellectual disability of undifferentiated aetiology
Magis, David. In Journal of Intellectual & Developmental Disability (in press).
Background: This work seeks to expand our knowledge of developmental trajectories of subcomponents of the language systems of individuals with intellectual disability (ID). It aims to explore how general and relational vocabularies evolve as a function of cognitive level. Method: Developmental trajectories of general and relational vocabulary comprehension were compared among typically developing children (TYP) and children and adolescents with ID of undifferentiated aetiology (UND) or Down syndrome (DS). Results: Comparisons between TYP and UND participants showed no interaction between cognitive level and diagnostic status for general vocabulary, and only a very weak interaction for relational vocabulary. Comparisons between TYP and DS participants failed to reveal group-specific trajectories. Performance in general vocabulary was higher than in relational vocabulary for both UND and DS participants. Conclusion: The developmental trajectories of vocabulary appear to be globally comparable for participants with or without ID.

Passage de l'administration fixe d'un test à une administration adaptative : application au TCALS-II
Magis, David. In Raîche, Gilles; Ndinga, Pascal; Meunier, Hélène (Eds.), L'interdisciplinarité de la mesure et de l'évaluation (in press).
This chapter studies the problem of moving from a fixed (paper-and-pencil) administration of a test to an adaptive administration. A two-step method is presented. First, response patterns are generated under a fixed administration in order to determine admissible values of the standard error of the ability estimate. These values are then used as stopping criteria in an adaptive administration of the same test. Test length is then considered to assess the quality of the adaptive test relative to its fixed version. The college-level English-as-a-second-language placement test (TCALS-II) serves as an illustration. It is established that an adaptive administration of the TCALS-II would substantially shorten the test without any loss of quality in the ability estimates. This improvement is, however, limited to examinees whose ability level is neither too low nor too high.

Examine the effects of two adjustments to the lz statistic
Magis, David. Conference (2016, July 12).
Conformity to a known distribution and sensitivity to response aberrance are desirable properties of person-fit statistics. This simulation study examined the joint and independent effects of two adjustments to the standardized log-likelihood statistic (lz): (1) correcting the negatively skewed distribution of lz (Snijders, 2001), and (2) improving the sensitivity of the statistic by employing more accurate estimates of item response probability based on symmetric functions (Dimitrov and Smith, 2006).
Data were simulated using three test lengths (10, 20, and 30 items). Data containing misfitting response patterns were simulated using three aberrant response types (cheating, guessing, and inattentiveness) and three levels of aberrance (the proportion of item responses affected by misfit: 10%, 30%, and 50%). Data containing no simulated misfitting response patterns were also generated for each test length. Non-misfitting responses were generated using the dichotomous Rasch measurement model. For each combination of independent variables, a dataset of 5,000 simulees was generated. Four fit statistics were compared: lz, lz* (Snijders adjustment), lzSYM (Dimitrov and Smith adjustment), and lzSYM* (both adjustments). Mean Type I error rates were at or below 0.1 across all conditions. The lz* statistic produced the best control of Type I error, which was often below the nominal rate, whereas the empirical Type I error rate of the unadjusted lz statistic most closely approximated the nominal rate. In contrast, lzSYM and lzSYM* yielded empirical Type I error rates larger than the nominal rate, with the discrepancy becoming particularly pronounced as test length decreased. As might be expected, power to detect misfitting response patterns increased with test length and with the percentage of misfitting response patterns in the sample. Both lzSYM and lzSYM* showed improved power to detect misfitting response patterns compared with lz and lz*, particularly for guessing response patterns and/or on shorter (i.e., 10-item) tests.

Open source programming: a new hope for psychometric research
Magis, David. Conference (2016, July 12).
Current psychometric research is most often supported by computer software. New research perspectives often imply intensive simulation studies to validate the tested theories or hypotheses, and therefore
require accurate, fast, and stable implementation. In this regard, open source programming (such as in the R language) is a promising approach that allows for flexible implementation, data generation, replication of studies, and worldwide dissemination. The purpose of this talk is to illustrate, by means of selected examples, how psychometrics and open source programming (with special emphasis on the R language) can interact and contribute to each other. Several topics will be illustrated, among others: why open source programming is (in my opinion) as important as psychometric research; why we need stable and complete implementations of psychometric and statistical routines for research purposes (e.g., CAT); how accurate implementation of IRT routines can lead to unexpected theoretical results; and why (and how) open source software can be valued as research output. Most examples will arise from the CAT framework and the R package catR for simulating CAT patterns.

On the use of ROC curves in DIF simulation studies
Magis, David. Conference (2016, July 12).
Simulation studies are often used to compare methods to detect differential item functioning (DIF). However, comparing the performance of such methods can become complicated when the identification of DIF items relies on statistics based on pre-defined significance levels or on pre-established cutoff values.
DIF methods based on conceptually different approaches may therefore become incomparable in terms of summary DIF statistics such as the false alarm rate or the hit rate. The purpose of this talk is to overcome this analytic issue by introducing receiver operating characteristic (ROC) curves in this context. ROC curves allow for a global comparison of the methods' performances by computing pairs of (false alarm, hit) rates and representing them on a common scatter plot. Several summary ROC statistics can be considered for further analysis. The application of the ROC curve methodology, together with its limitations and possible extensions, is illustrated by a simple simulation study comparing three score-based DIF methods (Mantel-Haenszel, standardization, and Delta plot).

Computerized Adaptive Testing
Magis, David. Scientific conference (2016, April 25).
Why ask a person to answer an item when you know a priori that they won't be able to solve it? It is a waste of time and resources, and you won't gain any new information; it is both inefficient and ineffective. In contrast, computerized adaptive testing (CAT) is based on the principle that more information can be gained when the test is tailored to the level of the person being tested.
Computational and statistical techniques from item response theory (IRT) and decision theory are combined to implement a test that behaves interactively during the test process and adapts to the level of the person being tested. The implementation of such a CAT relies on an iterative sequential algorithm that searches the pool of available items (a so-called item bank) for the optimal item to administer given the current estimate of the person's level (and optional external constraints). The response to this item provides new information to update the person's proficiency estimate. This selection-responding-updating process continues until specified stopping criteria have been reached. The consequence of such an adaptive test administration is an individualized, tailored test that is more efficient and more effective. Because there is less mismatch between the level of the test and the level of the test taker, the burden on the latter is lower and the precision for the former is higher, and this with fewer items than a traditional fixed-item test format. Furthermore, because the test is computerized and sequential, test performance can be continuously monitored and reported directly after test completion. Item response models come into play to ensure comparable scores across these individually tailored tests, by putting them on the same measurement scale, and to precalibrate the psychometric parameters of the items in the item bank on which the sequential iterative algorithm operates. The workshop addresses issues encountered during the setup of a computerized adaptive test, from the design stage through to the actual delivery of a CAT.
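
The selection-responding-updating loop described above can be sketched in a few lines. The following Python toy CAT (an illustration of the general principle only, not catR and not the workshop's material; the 2PL item bank, EAP scoring, and fixed-length stopping rule are all assumptions made for the example) picks the most informative remaining item at the current ability estimate, simulates a response, and re-estimates ability:

```python
import math, random

def p2pl(theta, a, b):
    """2PL response probability."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_info(theta, a, b):
    """Fisher information of one 2PL item at theta."""
    p = p2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def eap(responses):
    """EAP ability estimate: posterior mean over a grid, N(0, 1) prior.
    Returns a finite value even for all-correct or all-incorrect patterns."""
    grid = [-4.0 + 0.05 * k for k in range(161)]
    weights = []
    for t in grid:
        lp = -0.5 * t * t  # log of the N(0, 1) prior, up to a constant
        for (a, b), x in responses:
            p = p2pl(t, a, b)
            lp += math.log(p if x else 1.0 - p)
        weights.append(math.exp(lp))
    total = sum(weights)
    return sum(t * w for t, w in zip(grid, weights)) / total

def run_cat(bank, true_theta, test_length, rng):
    """Minimal select-respond-update loop with a fixed-length stopping rule."""
    theta_hat, responses, administered = 0.0, [], []
    available = list(range(len(bank)))
    for _ in range(test_length):
        # select: most informative remaining item at the current estimate
        best = max(available, key=lambda j: item_info(theta_hat, *bank[j]))
        available.remove(best)
        administered.append(best)
        # respond: simulate the test taker's answer
        a, b = bank[best]
        x = 1 if rng.random() < p2pl(true_theta, a, b) else 0
        responses.append(((a, b), x))
        # update: re-estimate ability from all responses so far
        theta_hat = eap(responses)
    return theta_hat, administered

# hypothetical 20-item bank of (a, b) pairs, purely for illustration
bank = [(1.0 + 0.1 * (i % 5), -2.0 + 0.2 * i) for i in range(20)]
theta_hat, administered = run_cat(bank, true_theta=1.0, test_length=8,
                                  rng=random.Random(42))
```

A real CAT would add the external constraints mentioned in the abstract (exposure control, content balancing) and a precision-based stopping rule rather than a fixed length.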

Adaptive versus linear testing: selected examples from psychology, education and medicine
Magis, David. Scientific conference (2016, March 08).
Data in psychological, educational, or medical research are most often collected by administering questionnaires to the participants. Such questionnaires are usually made of the same set of items (i.e., questions) for every respondent and are designed to precisely target the latent trait under study: this is referred to as "linear testing". However, linear testing can be counterproductive in some specific situations. For instance, not all items accurately target the latent trait (e.g., "easy" items are not informative for "highly able" participants), so the duration of the test can be needlessly extended. Adaptive testing is an emerging paradigm in which each item is selected and administered on the basis of the previously administered items and the responses of the participant. The item is selected optimally, so that the most informative item for the respondent is chosen. Among other benefits, this permits better targeting of the true latent trait to be estimated, thereby shortening the test. In this talk, linear and adaptive testing will be sketched from an educational testing perspective. Selected examples from the psychological, educational, and medical literature will then be briefly reviewed to illustrate the potential usefulness of adaptive testing. Pros and cons of the method will also be outlined.

Computerized adaptive and multi-stage testing with R
Magis, David. Conference (2016, February 19).
Computerized Adaptive Testing (CAT) has greatly improved the accuracy and efficiency of psychological testing for decades.
Multistage Testing (MST) has recently received much attention. MST is similar to CAT in that it adapts the difficulty of the test to the ability level of the test taker. Specifically, in MST, items are interactively selected for each test taker, but rather than selecting individual items, groups of items are selected and the test is built in stages. Over the last decade, researchers have investigated ways for MST to incorporate most of the advantages of CAT and linear testing while minimizing their disadvantages. These features include testing efficiency and accuracy, greater control of test content, more robust item review, and simplified test assembly and administration. MST can therefore be an effective compromise between CAT and linear testing, embedding features and benefits from both designs, and it is of growing interest to researchers and practitioners as technology advances. This presentation will first provide a general overview of a multistage test (MST) design and its important concepts and processes. It will then present the latest developments on CAT and MST using R, the mstR package. The presentation will also illustrate how to simulate MST administrations using the mstR package, and discuss some practical issues and considerations for MST, from design to applications.

Efficient standard error formulas of ability estimators with dichotomous item response models
Magis, David. In Psychometrika (2016), 81.
This paper focuses on the computation of asymptotic standard errors (ASE) of ability estimators with dichotomous item response models. A general framework is considered and ability estimators are defined
from a very restricted set of assumptions and formulas. This approach encompasses most standard methods, such as maximum likelihood, weighted likelihood, maximum a posteriori, and robust estimators. A general formula for the ASE is derived from the theory of M-estimation. Well-known results are recovered as particular cases for the maximum likelihood and robust estimators, while new ASE proposals for the weighted likelihood and maximum a posteriori estimators are presented. These new formulas are compared to traditional ones by means of a simulation study under Rasch modeling.

Étude comparative de nouveaux indices de détection de la réponse qui s'apparentent au hasard et à l'inattention
Magis, David, et al. Conference (2015, November 19).
Some examinees may respond at random or inattentively in a testing situation. Several approaches have been developed to detect this type of response (Zickar & Drasgow, 1996). Among them, the use of person-fit indices to flag inappropriate response patterns is the most studied and seemingly most promising approach (Karabatsos, 2003; Meijer & Sijtsma, 2001). In this study, we focus on three popular person-fit indices whose properties make them easy to interpret: lz (Drasgow, Levine, & Williams, 1985), ZU (Wright & Masters, 1982), and ZW (Wright & Masters, 1982). It has turned out, however, that all of them are strongly affected by the fact that an examinee's ability is estimated rather than known (Li & Olejnik, 1997; Molenaar & Hoijtink, 1990). This is why Snijders (2001) proposed a corrected version of the lz index (named lz*) that takes this important problem into account. We have already applied Snijders's correction to the U and W indices, creating the ZU* and ZW* indices (Magis, Béland, & Raîche, 2014). The objective of this study is to examine the behavior of the corrected indices lz*, ZU*, and ZW* and of their standardized versions. To this end, we carry out three different studies: a descriptive analysis of the index scores, an analysis of their Type I error rates, and an analysis of their detection power. The analyses show that the corrected indices lz* and ZW* are the most attractive to use, since their scores approximately follow the N(0,1) distribution and they detect random-like and inattentive responding well.

Receiver operating characteristic (ROC) curves and their use in psychometric simulation studies
Magis, David. Scientific conference (2015, October 27).
Simulation studies are commonly used in psychometric research to compare existing methods or to highlight the outperformance of a newly developed approach with respect to standard techniques. In several specific situations, the output of performance evaluations can be summarized by pairs of statistics, such as false alarm and hit rates (or Type I error and power). Adequate analysis of these rates, however, is often subject to discussion.
The purpose of this ongoing work (joint with Francis Tuerlinckx) is to advocate the usefulness of receiver operating characteristic (ROC) curves for analyzing the output of simulation studies in terms of pairs of summary statistics. Two particular psychometric applications will be considered and illustrated: differential item functioning (DIF) and person-fit identification. By means of simple examples, ROC curves will be shown to capture more of the output than standard analyses, thus allowing for a more refined and precise discussion of the study results. Limitations and possible extensions will also be outlined.

Efficient standard error formulas of ability estimators in item response theory
Magis, David. Conference (2015, October 15).
This talk focuses on the computation of asymptotic standard errors (ASE) of ability estimators with dichotomous item response models. A general framework is considered, and ability estimators are defined from a very restricted set of assumptions and formulas. This approach encompasses most standard methods, such as maximum likelihood, weighted likelihood, maximum a posteriori, and robust estimators. A general formula for the ASE is derived from the theory of M-estimation. Well-known results are recovered as particular cases for the maximum likelihood and robust estimators, while new ASE proposals for the weighted likelihood and maximum a posteriori estimators are presented. These new formulas are compared to traditional ones by means of a simulation study under Rasch modeling.

Empirical comparison of scoring rules at early stages of CAT
Magis, David. Conference (2015, September 15).
The usual scoring rules in CATs include maximum likelihood (ML), weighted likelihood (WL), and Bayesian approaches. At early stages of adaptive testing, however, only a few item responses are available, so the amount of information is very limited; in addition, constant patterns (i.e., only correct or only incorrect responses) are often observed, which makes ML scoring intractable. Specific scoring rules (such as fixed- or variable-stepsize adjustments) were developed for this purpose. Recent research, however, has highlighted that both Bayesian and WL scoring rules may provide finite values even with small sets of items. The purpose of this presentation is twofold: (a) to give a quick review of the available scoring rules at early stages of CAT, and (b) to present empirical results from a simulation study comparing those scoring rules. More precisely, three scoring scenarios will be investigated: stepsize adjustment followed by ML; Bayesian or WL estimation followed by ML; and a constant scoring rule throughout the CAT. These methods will be compared by means of simulated item banks and under various CAT scenarios for next item selection and stopping rules. Empirical results will be presented and practical guidelines for early-stage scoring will be outlined.

From psychometric research to implementation and back: selected examples
Magis, David. Conference (2015, February 13).
Current psychometric research is most often supported by computer software. New research perspectives often imply intensive simulation studies to validate the tested theories or hypotheses, and therefore
require accurate implementation, e.g., as R packages. However, it may happen that unexpected psychometric phenomena are detected almost accidentally, through implementations built for entirely different purposes. This talk illustrates this phenomenon by means of two recent examples from item response theory (IRT) with polytomous models: (a) the equivalence between the weighted likelihood estimator (WLE) and the Bayes modal estimator with the Jeffreys prior, and (b) the relationships between observed and expected information functions. Rather than focusing on technical details, the purpose of this talk is to highlight how the results were first identified through R implementation and then confirmed by theoretical derivations. The talk concludes by advocating flexible and stable open-source implementations (such as R packages) to support current and ongoing psychometric research.

Le testing adaptatif informatisé : une brève introduction
Magis, David. Conference (2015, January 27).
The purpose of this talk is to present the main principles and concepts of computerized adaptive testing (CAT). The topics covered are: CAT versus fixed (paper-and-pencil) testing; the general principles of CAT (item bank, provisional and final estimation of proficiency, selection of the items to administer, stopping rules); and the principles specific to CAT (item exposure control, test content balancing). The talk is intended to be didactic and general, sketching the outlines of CAT. It ends with a state of the art of current research on CAT.

Introduction to item response theory (IRT) and computerized adaptive testing (CAT) with the R software
Magis, David. Scientific conference (2015, January 13).
Item response theory (IRT) has become an important field of research for psychology and educational assessment. Recently, with the increase in computational power, several IRT-related topics have emerged, among others computerized adaptive testing (CAT). The main aim of CAT is to provide a framework for individualized assessment by means of optimal item selection and administration to the test takers. CAT has several assets over linear (non-adaptive) testing: individualized assessment, limited risk of cheating or fraud, shorter tests providing the same amount of information as longer linear tests, and automatic scoring and reporting at the end of the test. Practical use of CAT, however, has remained limited so far due to several factors (lack of available large item banks; content validity and security; lack of suitable software for practical CAT assessment; ethical issues in administering different tests to estimate the same ability; etc.). The purpose of this workshop is threefold: (a) to provide a general overview of IRT and CAT, (b) to introduce the R software in a user-oriented way, together with several IRT tools (including the package catR for CAT simulations), and (c) to run practical training sessions with the participants. The workshop will be a mix of oral presentations, demonstrations related to the R software, and practical sessions where participants will be invited to train with R and catR.
The R software is an open-source platform for statistical inference and testing, graphical display, and data visualization. It also hosts several add-on packages for specific IRT purposes (item calibration, ability estimation, multidimensional scaling, equating, differential item functioning, etc.). The R community is worldwide and proposes free exchange of shared R packages through the CRAN (Comprehensive R Archive Network). In this workshop, the R package catR will be examined and used in the practical sessions.
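
Several entries in this listing revolve around the lz person-fit statistic and its corrected version lz*. As a closing illustration (a Python sketch with hypothetical item difficulties, not the authors' R code), the basic lz of Drasgow, Levine, and Williams (1985) standardizes the pattern's log-likelihood by its expectation and variance; note that this toy version plugs in the true ability, whereas the abstracts above stress that using an estimated ability distorts lz and motivates the corrected lz*:

```python
import math

def p_rasch(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def lz(theta, bs, xs):
    """Standardized log-likelihood person-fit statistic lz
    (Drasgow, Levine, & Williams, 1985) under the Rasch model."""
    l0 = mean = var = 0.0
    for b, x in zip(bs, xs):
        p = p_rasch(theta, b)
        q = 1.0 - p
        l0 += math.log(p) if x else math.log(q)         # observed log-likelihood
        mean += p * math.log(p) + q * math.log(q)       # its expectation
        var += p * q * math.log(p / q) ** 2             # its variance
    return (l0 - mean) / math.sqrt(var)

bs = [-2.0, -1.0, 1.0, 2.0]                # hypothetical item difficulties
lz_consistent = lz(0.0, bs, [1, 1, 0, 0])  # Guttman-consistent pattern
lz_aberrant = lz(0.0, bs, [0, 0, 1, 1])    # reversed, inattentive-like pattern
```

Large negative values signal aberrant responding: the consistent pattern scores slightly above zero, while the reversed pattern drops far below the usual flagging cutoffs.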