ASSESSING THE RELIABILITY OF 2 TOXICITY SCALES - IMPLICATIONS FOR INTERPRETING TOXICITY DATA

Cited: 95
Authors
BRUNDAGE, MD
PATER, JL
ZEE, B
Affiliations
[1] QUEENS UNIV,DEPT COMMUNITY HLTH & EPIDEMIOL,KINGSTON K7L 3N6,ONTARIO,CANADA
[2] ONTARIO CANC TREATMENT & RES FDN,KINGSTON REG CANC CTR,RADIAT ONCOL RES UNIT,KINGSTON,ON,CANADA
Keywords
DOI
10.1093/jnci/85.14.1138
CLC Number
R73 [Oncology]
Discipline Code
100214
Abstract
Background: The toxicity of a given cancer therapy is an important end point in clinical trials examining the potential costs and benefits of that therapy. Treatment-related toxicity is conventionally measured with one of several toxicity criteria grading scales, even though the reliability and validity of these scales have not been established. Purpose: We determined the reliability of the National Cancer Institute of Canada Clinical Trials Group (NCIC-CTG) expanded toxicity scale and the World Health Organization (WHO) standard toxicity scale by use of a clinical simulation of actual patients. Methods: Seven experienced data managers each interviewed 12 simulated patients and scored their respective acute toxic effects. Inter-rater agreement (agreement between multiple raters of the same case) was calculated using the kappa (κ) statistic across all seven randomly assigned raters for each of 18 toxicity categories (13 NCIC-CTG and 5 WHO categories). Intra-rater agreement (agreement within the same rater on one case rated on separate occasions) was calculated using κ over repeated cases (where raters were blinded to the repeated nature of the subjects). Proportions of agreement (estimates of the probability of two randomly selected raters assigning the same toxicity grade to a given case) were also calculated for inter-rater agreement. Since minor lack of agreement might have adversely affected these statistics of agreement, both the κ and proportion-of-agreement analyses were repeated for the following condensed grading categories: none (0) versus low-grade (1 or 2) versus high-grade (3 or 4) toxicity present. Results: Modest levels of inter-rater reliability were demonstrated in this study, with κ values ranging from 0.50 to 1.00 for laboratory-based categories and from -0.04 to 0.82 for clinically based categories. Proportions of agreement for clinical categories ranged from 0.52 to 0.98. Condensing the toxicity grades improved the statistics of agreement, but substantial lack of agreement remained (κ range, -0.04 to 0.82; proportions of agreement, 0.67 to 0.98). Conclusions: Experienced data managers, when interviewing patients, draw varying conclusions regarding the toxic effects experienced by those patients. Neither the NCIC-CTG expanded toxicity scale nor the WHO standard toxicity scale demonstrated a clear superiority in reliability, although the breadth of toxic effects recorded differed.
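The agreement statistics in the abstract can be illustrated with a minimal sketch. The snippet below computes a two-rater Cohen's κ (observed agreement corrected for chance agreement) and shows the effect of condensing 0-4 toxicity grades into the none/low/high categories described above; the ratings are hypothetical, and the study itself pooled κ across seven raters, which requires a multi-rater generalization such as Fleiss' κ.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters over the same cases."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    # Observed proportion of agreement (the "proportion of agreement" statistic)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement by chance, from each rater's marginal grade frequencies
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[g] * freq_b[g] for g in freq_a) / n**2
    return (observed - expected) / (1 - expected)

def condense(grade):
    """Collapse 0-4 grades: none (0), low-grade (1-2), high-grade (3-4)."""
    return "none" if grade == 0 else ("low" if grade <= 2 else "high")

# Hypothetical toxicity grades assigned by two raters to eight cases
a = [0, 1, 2, 3, 0, 4, 1, 0]
b = [0, 2, 2, 3, 1, 4, 1, 0]

kappa_full = cohens_kappa(a, b)
kappa_condensed = cohens_kappa([condense(g) for g in a],
                               [condense(g) for g in b])
```

With these hypothetical data, condensing the grades raises κ, mirroring the paper's finding that coarser categories improve (but do not resolve) agreement.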
Pages: 1138-1148
Page count: 11