A COMPARISON OF METHODS FOR CALCULATING A STRATIFIED KAPPA

Cited by: 45
Authors
BARLOW, W
LAI, MY
AZEN, SP
Affiliations
[1] Center for Health Studies, Group Health Cooperative, 1730 Minor Ave, Seattle, Washington 98101-1448
[2] Department of Preventive Medicine, USC School of Medicine, Los Angeles, California
DOI
10.1002/sim.4780100913
Chinese Library Classification
Q [Biological Sciences]
Subject Classification Codes
07; 0710; 09
Abstract
Investigators use the kappa coefficient to measure chance-corrected agreement among observers in the classification of subjects into nominal categories. The marginal probability of classification may depend, however, on one or more confounding variables. We consider assessment of interrater agreement with subjects grouped into strata on the basis of these confounders. We assume overall agreement across strata is constant and consider a stratified index of agreement, or 'stratified kappa', based on weighted summations of the individual kappas. We use three weighting schemes: (1) equal weighting; (2) weighting by the sample size of each stratum's table; and (3) weighting by the inverse of the estimated variance of each stratum's kappa. In a simulation study we compare these methods under differing probability structures and differing sample sizes for the tables. We find weighting by sample size moderately efficient under most conditions. We illustrate the techniques by assessing agreement between surgeons and graders of fundus photographs with respect to retinal characteristics, with stratification by initial severity of the disease.
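The stratified kappa described in the abstract is a weighted average of per-stratum Cohen's kappas. The following is a minimal Python sketch of that idea, not the paper's exact procedure: the function names are illustrative, and the inverse-variance weights use a simplified large-sample variance approximation rather than the exact variance formula the authors employ.

```python
import numpy as np

def cohen_kappa(table):
    """Cohen's (1960) kappa for one square rater-by-rater contingency table."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()
    po = np.trace(p)                      # observed agreement
    pe = p.sum(axis=1) @ p.sum(axis=0)    # chance-expected agreement
    return (po - pe) / (1.0 - pe)

def kappa_var_approx(table):
    """Simplified large-sample variance of kappa (an approximation, not the
    exact Fleiss formula): var(kappa) ~ po*(1 - po) / (n*(1 - pe)**2)."""
    t = np.asarray(table, dtype=float)
    n = t.sum()
    p = t / n
    po = np.trace(p)
    pe = p.sum(axis=1) @ p.sum(axis=0)
    return po * (1.0 - po) / (n * (1.0 - pe) ** 2)

def stratified_kappa(tables, scheme="size"):
    """Weighted combination of per-stratum kappas.
    scheme: 'equal', 'size' (stratum sample size), or 'invvar' (inverse variance)."""
    kappas = np.array([cohen_kappa(t) for t in tables])
    if scheme == "equal":
        w = np.ones(len(tables))
    elif scheme == "size":
        w = np.array([np.asarray(t).sum() for t in tables], dtype=float)
    elif scheme == "invvar":
        w = 1.0 / np.array([kappa_var_approx(t) for t in tables])
    else:
        raise ValueError("scheme must be 'equal', 'size', or 'invvar'")
    w = w / w.sum()
    return float(w @ kappas)

# Hypothetical example: two strata (e.g. disease-severity groups), 2x2 agreement tables.
strata = [np.array([[20, 5], [4, 21]]), np.array([[8, 3], [2, 7]])]
for scheme in ("equal", "size", "invvar"):
    print(scheme, round(stratified_kappa(strata, scheme), 3))
```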
Pages: 1465-1472
Page count: 8