Reliability: on the reproducibility of assessment data

Cited by: 469
Authors
Downing, SM [1]
Affiliations
[1] Univ Illinois, Coll Med, Dept Med Educ, Chicago, IL 60612 USA
Keywords
education; medical; undergraduate; standards; educational measurement; reproducibility of results;
DOI
10.1111/j.1365-2929.2004.01932.x
Chinese Library Classification number
G40 [Education]
Subject classification codes
040101; 120403
Abstract
CONTEXT All assessment data, like other scientific experimental data, must be reproducible in order to be meaningfully interpreted.

PURPOSE The purpose of this paper is to discuss applications of reliability to the most common assessment methods in medical education. Typical methods of estimating reliability are discussed intuitively and non-mathematically.

SUMMARY Reliability refers to the consistency of assessment outcomes. The exact type of consistency of greatest interest depends on the type of assessment, its purpose and the consequential use of the data. Written tests of cognitive achievement look to internal test consistency, using estimation methods derived from the test-retest design. Rater-based assessment data, such as ratings of clinical performance on the wards, require interrater consistency or agreement. Objective structured clinical examinations, simulated patient examinations and other performance-type assessments generally require generalisability theory analysis to account for various sources of measurement error in complex designs and to estimate the consistency of the generalisations to a universe or domain of skills.

CONCLUSIONS Reliability is a major source of validity evidence for assessments. Low reliability indicates that large variations in scores can be expected upon retesting. Inconsistent assessment scores are difficult or impossible to interpret meaningfully and thus reduce validity evidence. Reliability coefficients allow the quantification and estimation of the random errors of measurement in assessments, such that overall assessment can be improved.
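The abstract names internal test consistency, interrater agreement and the random error of measurement without giving formulas. As a rough illustration only, the sketch below uses Cronbach's alpha, Cohen's kappa and the standard error of measurement as common examples of each; these choices, the function names and the data layout are assumptions made here, not the paper's own method.

```python
from collections import Counter
from math import sqrt
from statistics import variance


def cronbach_alpha(scores):
    """Internal consistency of a written test.

    `scores` is a list of rows, one per examinee, each row holding that
    examinee's score on every item (hypothetical layout for this sketch).
    """
    k = len(scores[0])  # number of items
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 examinees")
    item_vars = [variance(col) for col in zip(*scores)]  # per-item variance
    total_var = variance([sum(row) for row in scores])   # variance of total scores
    if total_var == 0:
        raise ValueError("alpha is undefined when all total scores are equal")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on nominal categories."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must rate the same non-empty set of cases")
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance from each rater's marginal proportions.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one category only")
    return (observed - expected) / (1 - expected)


def standard_error_of_measurement(score_sd, reliability):
    """Random error in score units: SD * sqrt(1 - reliability)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie between 0 and 1")
    return score_sd * sqrt(1 - reliability)
```

Under these assumptions, a lower reliability coefficient gives a larger standard error of measurement, which matches the abstract's point that low reliability implies large score variation on retesting.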
Pages: 1006-1012
Page count: 7