An investigation of the power of the likelihood ratio goodness-of-fit statistic in detecting differential item functioning

Cited by: 54
Authors
Ankenmann, RD [1]
Witt, EA
Dunbar, SB
Affiliations
[1] Univ Iowa, Coll Educ, Iowa City, IA 52242 USA
[2] Assessment Syst Inc, Psychometr Serv, Bala Cynwyd, PA 19004 USA
Keywords
DOI
10.1111/j.1745-3984.1999.tb00558.x
Chinese Library Classification number
G44 [Educational psychology];
Discipline classification code
0402 ; 040202 ;
Abstract
The purpose of this study was to investigate the power and Type I error rate of the likelihood ratio goodness-of-fit (LR) statistic in detecting differential item functioning (DIF) under Samejima's (1969, 1972) graded response model. A multiple-replication Monte Carlo study was utilized in which DIF was modeled in simulated data sets, which were then calibrated with MULTILOG (Thissen, 1991) using hierarchically nested item response models. In addition, the power and Type I error rate of the Mantel (1963) approach for detecting DIF in ordered response categories were investigated using the same simulated data, for comparative purposes. The power of both the Mantel and LR procedures was affected by sample size, as expected. The LR procedure lacked the power to consistently detect DIF when it existed in reference/focal groups with sample sizes as small as 500/500. The Mantel procedure maintained control of its Type I error rate and was more powerful than the LR procedure when the comparison group ability distributions were identical and there was a constant DIF pattern. On the other hand, the Mantel procedure lost control of its Type I error rate, whereas the LR procedure did not, when the comparison groups differed in mean ability; and the LR procedure demonstrated a profound power advantage over the Mantel procedure under conditions of balanced DIF in which the comparison group ability distributions were identical. The choice and subsequent use of any procedure requires a thorough understanding of the power and Type I error rates of the procedure under varying conditions of DIF pattern, comparison group ability distributions (or, as a surrogate, observed score distributions), and item characteristics.
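As a rough illustration of the Mantel approach named in the abstract, the sketch below computes the commonly cited form of the Mantel chi-square (1 degree of freedom) for one polytomous item, with examinees stratified on a matching score. It is not the authors' code. The function name `mantel_chi_square`, the input layout, and the example counts are hypothetical and chosen only for illustration. The LR procedure is not sketched here because, per the abstract, it relies on MULTILOG calibrations.

```python
import math


def mantel_chi_square(strata, scores):
    """Mantel chi-square (1 df) for one ordered-category item.

    strata: list of (ref_counts, focal_counts) pairs, one pair per level of
            the matching variable; each element is a list of counts per
            response category (hypothetical layout, assumed for this sketch).
    scores: numeric score assigned to each response category.
    Returns (chi_square, p_value).
    """
    sum_f = sum_e = sum_v = 0.0
    for ref, foc in strata:
        n_r, n_f = sum(ref), sum(foc)
        n = n_r + n_f
        if n_r == 0 or n_f == 0 or n < 2:
            continue  # a stratum with only one group carries no information
        col = [r + f for r, f in zip(ref, foc)]          # category totals
        s1 = sum(y * c for y, c in zip(scores, col))      # sum of scores
        s2 = sum(y * y * c for y, c in zip(scores, col))  # sum of squared scores
        sum_f += sum(y * f for y, f in zip(scores, foc))  # observed focal sum
        sum_e += n_f * s1 / n                             # expected focal sum
        sum_v += n_r * n_f * (n * s2 - s1 * s1) / (n * n * (n - 1))
    if sum_v == 0:
        raise ValueError("no stratum contributes variance")
    chi2 = (sum_f - sum_e) ** 2 / sum_v
    # Survival function of a chi-square variate with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Made-up counts: two matching-score levels, three response categories.
example_strata = [
    ([12, 20, 8], [15, 18, 7]),
    ([5, 15, 20], [9, 16, 15]),
]
chi2, p = mantel_chi_square(example_strata, scores=[0, 1, 2])
print(f"Mantel chi-square = {chi2:.3f}, p = {p:.3f}")
```

With two response categories and scores 0 and 1, this reduces to the Mantel-Haenszel chi-square without continuity correction, which gives a convenient sanity check.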
Pages: 277-300
Page count: 24
Related papers
34 in total
[1]  
ANKENMANN RD, 1994, THESIS U PITTSBURGH
[2]  
[Anonymous], 1993, IOWA TESTS BASIC SKI
[3]  
[Anonymous], ANN M AM ED RES ASS
[4]  
[Anonymous], PSYCHOMETRIKA MONO S
[5]  
Camilli G., 1994, METHODS IDENTIFYING
[6]  
CHANG H, 1993, ANN M AM ED RES ASS
[7]   An investigation of the likelihood ratio test for detection of differential item functioning [J].
Cohen, AS ;
Kim, SH ;
Wollack, JA .
APPLIED PSYCHOLOGICAL MEASUREMENT, 1996, 20 (01) :15-26
[8]  
Donoghue J. R., 1993, Differential item functioning, P137
[9]  
Dorans N. J., 1993, Differential item functioning, P35, DOI 10.1002/j.2333-8504.1992.tb01440.x
[10]  
DORANS NJ, 1993, CONSTRUCTION VERSUS CHOICE IN COGNITIVE MEASUREMENT : ISSUES IN CONSTRUCTED RESPONSE, PERFORMANCE TESTING, AND PORTFOLIO ASSESSMENT, P135