How invariant and accurate are domain ratings in writing assessment?

Cited by: 30
Authors
Wind, Stefanie A. [1]
Engelhard, George, Jr. [1]
Affiliations
[1] Univ Georgia, Athens, GA 30602 USA
Keywords
Rating quality; Analytic rubric; Rasch measurement theory; Rater invariance; Rater accuracy; BEHAVIOR;
DOI
10.1016/j.asw.2013.09.002
Chinese Library Classification (CLC) number
G40 [Education];
Discipline classification code
040101; 120403;
Abstract
The use of evidence to guide policy and practice in education (Cooper, Levin, & Campbell, 2009) has included an increased emphasis on constructed-response items, such as essays and portfolios. Because assessments that go beyond selected-response items and incorporate constructed-response items are rater-mediated (Engelhard, 2002, 2013), it is necessary to develop evidence-based indices of quality for the rating processes used to evaluate student performances. This study proposes a set of criteria for evaluating the quality of ratings based on the concepts of measurement invariance and accuracy within the context of a large-scale writing assessment. Two measurement models are used to explore indices of quality for raters and ratings: the first model provides evidence for the invariance of ratings, and the second model provides evidence for rater accuracy. Rating quality is examined within four writing domains from an analytic rubric. Further, this study explores the alignment between indices of rating quality based on these invariance and accuracy models within each of the four domains of writing. Major findings suggest that rating quality varies across analytic rubric domains, and that there is some correspondence between indices of rating quality based on the invariance and accuracy models. Implications for research and practice are discussed. Crown Copyright (C) 2013 Published by Elsevier Ltd. All rights reserved.
Pages: 278-299
Page count: 22
Related papers
46 in total
[1]  
Andrich D., 1982, ED RES PERSPECTIVES, V9, P95
[2]  
[Anonymous], MANUAL RELATING H S
[3]  
[Anonymous], SCALE HIST WRITING A
[4]  
[Anonymous], FACETS RASCH MEASURE
[5]  
[Anonymous], TOEFL IBT TEST SCOR
[6]  
[Anonymous], RAC TOP ASS PROGR EX
[7]  
[Anonymous], GEORG GRAD 8 WRIT AS
[8]  
[Anonymous], J APPL MEASUREMENT
[9]   Think-aloud protocols in research on essay rating: An empirical study of their veridicality and reactivity [J].
Barkaoui, Khaled.
LANGUAGE TESTING, 2011, 28 (01) :51-75
[10]   Recurrent issues and recent advances in scoring performance assessments [J].
Clauser, BE.
APPLIED PSYCHOLOGICAL MEASUREMENT, 2000, 24 (04) :310-324