Measuring agreement in medical informatics reliability studies

Cited by: 167
Authors
Hripcsak, G
Heitjan, DF
Affiliations
[1] Columbia Univ, Dept Med Informat, New York, NY 10032 USA
[2] Columbia Univ, Dept Biostat, Mailman Sch Publ Hlth, New York, NY USA
Keywords
agreement; reliability; kappa; latent structure analysis; tetrachoric correlation; prevalence
DOI
10.1016/S1532-0464(02)00500-2
CLC Classification
TP39 [Computer applications]
Subject Classification
081203; 0835
Abstract
Agreement measures are used frequently in reliability studies that involve categorical data. Simple measures like observed agreement and specific agreement can reveal a good deal about the sample. Chance-corrected agreement in the form of the kappa statistic is used frequently because it corresponds to an intraclass correlation coefficient and is easy to calculate, but its magnitude depends on the tasks and categories in the experiment. It is helpful to separate the components of disagreement when the goal is to improve the reliability of an instrument or of the raters. Approaches based on modeling the decision-making process can be helpful here, including tetrachoric correlation, polychoric correlation, latent trait models, and latent class models. Decision-making models can also be used to better understand the behavior of different agreement metrics. For example, if the observed prevalence of responses in one of two available categories is low, then there is insufficient information in the sample to judge raters' ability to discriminate cases, and kappa may underestimate the true agreement while observed agreement may overestimate it. (C) 2002 Elsevier Science (USA). All rights reserved.
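As a quick illustration (not from the paper), the following minimal Python sketch computes the measures the abstract names for two raters on a 2x2 table: observed agreement, the specific (positive and negative) agreements, and Cohen's kappa. The counts in the example table are hypothetical, chosen to show the low-prevalence effect described above: observed agreement is high while kappa is modest.

def agreement_stats(a, b, c, d):
    """2x2 table of two raters' binary judgments:
         a = both rate positive, d = both rate negative,
         b, c = the two discordant cells."""
    n = a + b + c + d
    p_o = (a + d) / n                      # observed agreement
    p_pos = 2 * a / (2 * a + b + c)        # specific positive agreement
    p_neg = 2 * d / (2 * d + b + c)        # specific negative agreement
    # chance agreement from the two raters' marginal rates
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    kappa = (p_o - p_e) / (1 - p_e)        # Cohen's kappa
    return p_o, p_pos, p_neg, kappa

# Hypothetical low-prevalence sample: only 5-8% positive responses.
# Most agreement falls in the common negative category, so observed
# agreement is high (0.94) while kappa is modest (about 0.37).
print(agreement_stats(a=2, b=3, c=3, d=92))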
Pages: 99-110
Page count: 12