Analytic versus holistic scoring of science performance tasks

Cited by: 29
Authors
Klein, SP
Stecher, BM
Shavelson, RJ
McCaffrey, D
Ormseth, T
Bell, RM
Comfort, K
Othman, AR
Affiliations
[1] Rand Corp, Santa Monica, CA 90406 USA
[2] Stanford Univ, Sch Educ, Stanford, CA 94305 USA
[3] El Rancho Unified Sch Dist, Pico Rivera, CA USA
[4] WestEd, San Francisco, CA USA
[5] Univ Sains Malaysia, Ctr Distance Educ, Math Sci Program, Penang, Malaysia
DOI
10.1207/s15324818ame1102_1
Chinese Library Classification (CLC) number
G40 [Education];
Discipline classification codes
040101 ; 120403 ;
Abstract
We conducted 2 studies to investigate interreader consistency, score reliability, and reader time requirements of 3 hands-on science performance tasks. One study involved scoring the responses of students in Grades 5, 8, and 10 on 3 dimensions ("curriculum standards") of performance. The other study computed scores for each of the 3 parts of the Grade 5 and 8 tasks. Both studies used analytic and holistic scoring rubrics to grade responses but differed in the characteristics of these rubrics. Analytic scoring took much longer but led to higher interreader consistency. Nevertheless, when averaged over all the questions in a task, a student's holistic score was just as reliable as that student's analytic score. There was a very high correlation between analytic and holistic scores after they were disattenuated for inconsistencies among readers. Using 2 readers per answer does not appear to be a cost-effective means for increasing the reliability of task scores.
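The abstract refers to two standard psychometric calculations without stating them: correcting a correlation for unreliability (disattenuation) and estimating how reliability changes when scores from more readers are averaged. The sketch below shows the textbook forms of both, namely Spearman's correction for attenuation and the Spearman-Brown formula. It assumes these are the relevant formulas; the record does not say exactly how the authors computed their estimates. The function names and all numeric inputs are hypothetical and are not taken from the paper.

```python
import math


def disattenuated_correlation(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction for attenuation.

    r_xy  : observed correlation between two sets of scores
    rel_x : reliability of the first set of scores (0 < rel_x <= 1)
    rel_y : reliability of the second set of scores (0 < rel_y <= 1)
    """
    return r_xy / math.sqrt(rel_x * rel_y)


def spearman_brown(rel_single: float, k: float) -> float:
    """Projected reliability when k parallel readings are averaged."""
    return k * rel_single / (1 + (k - 1) * rel_single)


# Illustrative values only (NOT results from the paper).
observed_r = 0.72          # hypothetical analytic-holistic correlation
analytic_rel = 0.81        # hypothetical interreader reliability, analytic
holistic_rel = 0.64        # hypothetical interreader reliability, holistic

corrected_r = disattenuated_correlation(observed_r, analytic_rel, holistic_rel)
two_reader_rel = spearman_brown(0.60, 2)   # hypothetical single-reader value
```

With these made-up inputs the corrected correlation comes out close to 1.0, and doubling the readers raises a single-reader reliability of 0.60 to about 0.75. Whether such a gain justifies the extra reading time is the cost question the abstract raises.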
Pages: 121-137
Page count: 17