Does scale length matter? A comparison of nine- versus five-point rating scales for the mini-CEX

Cited by: 71
Authors
Cook, David A. [1, 2]
Beckman, Thomas J. [1, 2]
Affiliations
[1] Mayo Clin, Coll Med, Div Gen Internal Med, Rochester, MN 55905 USA
[2] Mayo Clin, Coll Med, Off Educ Res, Rochester, MN 55905 USA
Keywords
Medical education; Educational measurement; Clinical competence; Assessment; Reproducibility of results; Psychometrics; Interrater reliability; Accuracy; CLINICAL-EVALUATION EXERCISE; RESPONSE CATEGORIES; OPTIMAL NUMBER; RELIABILITY; VALIDITY; PERFORMANCE; COEFFICIENT; SKILLS
DOI
10.1007/s10459-008-9147-x
Chinese Library Classification number
G40 [Education]
Discipline classification code
040101; 120403
Abstract
Educators must often decide how many points to use in a rating scale. No studies have compared interrater reliability for different-length scales, and few have evaluated accuracy. This study sought to evaluate the interrater reliability and accuracy of mini-clinical evaluation exercise (mini-CEX) scores, comparing the traditional mini-CEX nine-point scale to a five-point scale. Methods: The authors conducted a validity study in an academic internal medicine residency program. Fifty-two program faculty participated. Participants rated videotaped resident-patient encounters using the mini-CEX with both a nine-point scale and a five-point scale. Some cases were scripted to reflect a specific level of competence (unsatisfactory, satisfactory, superior). Outcome measures included mini-CEX scores, accuracy (scores compared to scripted competence level), interrater reliability, and domain intercorrelation. Results: Interviewing, exam, counseling, and overall ratings varied significantly across levels of competence (P < .0001). Nine-point scale scores accurately classified competence more often (391/720 [54%] for overall ratings) than five-point scores (316/723 [44%], P < .0001). Interrater reliability was similar for scores from the nine- and five-point scales (0.43 and 0.40, respectively, for overall ratings). With the exception of correlation between exam and counseling scores using the five-point scale (r = 0.38, P = .13), score correlations among all domain combinations were high (r = 0.46-0.89) and statistically significant (P ≤ .015) for both scales. Conclusions: Mini-CEX scores demonstrated modest interrater reliability and accuracy. Although interrater reliability is similar for nine- and five-point scales, nine-point scales appear to provide more accurate scores. This has implications for many educational assessments.
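The accuracy percentages quoted in the abstract can be rechecked from the reported counts. The following is a minimal sketch in Python, assuming only the counts given above for overall ratings; the variable and function names are illustrative and do not come from the paper. It reproduces the rounded percentages and the gap between the two scales, and it does not attempt to reproduce the reported P value, since the abstract does not state which test was used.

```python
# Counts for overall ratings, as reported in the abstract.
correct_nine, total_nine = 391, 720   # nine-point scale
correct_five, total_five = 316, 723   # five-point scale


def accuracy(correct: int, total: int) -> float:
    """Proportion of ratings that matched the scripted competence level."""
    return correct / total


acc_nine = accuracy(correct_nine, total_nine)
acc_five = accuracy(correct_five, total_five)

# Absolute difference between the two proportions (not a significance test).
diff = acc_nine - acc_five

print(f"Nine-point accuracy: {acc_nine:.1%}")
print(f"Five-point accuracy: {acc_five:.1%}")
print(f"Difference: {diff * 100:.1f} percentage points")
```

Under these counts, the nine-point scale classifies competence correctly about ten percentage points more often than the five-point scale, which matches the rounded 54% and 44% in the abstract.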
Pages: 655-664
Page count: 10