Inter-rater and test-retest reliability of quality assessments by novice student raters using the Jadad and Newcastle-Ottawa Scales

Cited by: 157

Authors:

Oremus, Mark [1,2]

Oremus, Carolina [3,4]

Hall, Geoffrey B. C. [3,4]

McKinnon, Margaret C. [3,4]

Affiliations:

[1] McMaster Univ, McMaster Evidence Based Practice Ctr, Hamilton, ON, Canada

[2] McMaster Univ, Dept Clin Epidemiol & Biostat, Hamilton, ON, Canada

[3] McMaster Integrat Neurosci Discovery & Study MIND, Hamilton, ON, Canada

[4] Dept Psychiat & Behav Neurosci, Hamilton, ON, Canada

Source:

BMJ OPEN | 2012 / Vol. 2 / Issue 04

Keywords:

RANDOMIZED CONTROLLED-TRIALS; TOOLS; RISK; BIAS;

DOI:

10.1136/bmjopen-2012-001368

Chinese Library Classification number:

R5 [Internal Medicine];

Discipline classification code:

1002 ; 100201 ;

Abstract:

Introduction: Quality assessment of included studies is an important component of systematic reviews. Objective: The authors investigated inter-rater and test-retest reliability for quality assessments conducted by inexperienced student raters. Design: Student raters received a training session on quality assessment using the Jadad Scale for randomised controlled trials and the Newcastle-Ottawa Scale (NOS) for observational studies. Raters were randomly assigned to five pairs, and each rater independently rated the quality of 13-20 articles. These articles were drawn from a pool of 78 papers examining cognitive impairment following electroconvulsive therapy to treat major depressive disorder. The articles were randomly distributed to the raters. Two months later, each rater re-assessed the quality of half of their assigned articles. Setting: McMaster Integrative Neuroscience Discovery and Study Program. Participants: 10 students taking McMaster Integrative Neuroscience Discovery and Study Program courses. Main outcome measures: The authors measured inter-rater reliability using kappa and the intraclass correlation coefficient type 2,1, or ICC(2,1). The authors measured test-retest reliability using ICC(2,1). Results: Inter-rater reliability varied by scale question. For the six-item Jadad Scale, question-specific kappas ranged from 0.13 (95% CI -0.11 to 0.37) to 0.56 (95% CI 0.29 to 0.83). The ranges were -0.14 (95% CI -0.28 to 0.00) to 0.39 (95% CI -0.02 to 0.81) for the NOS cohort and -0.20 (95% CI -0.49 to 0.09) to 1.00 (95% CI 1.00 to 1.00) for the NOS case-control. For overall scores on the six-item Jadad Scale, ICC(2,1)s for inter-rater and test-retest reliability (accounting for systematic differences between raters) were 0.32 (95% CI 0.08 to 0.52) and 0.55 (95% CI 0.41 to 0.67), respectively. Corresponding ICC(2,1)s for the NOS cohort were -0.19 (95% CI -0.67 to 0.35) and 0.62 (95% CI 0.25 to 0.83), and for the NOS case-control, the ICC(2,1)s were 0.46 (95% CI -0.13 to 0.92) and 0.83 (95% CI 0.48 to 0.95). Conclusions: Inter-rater reliability was generally poor to fair and test-retest reliability was fair to excellent. A pilot rating phase following rater training may be one way to improve agreement.
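Illustrative note: the abstract names kappa and ICC(2,1) but does not show how they are computed. The following is a minimal sketch, not the authors' code, assuming Cohen's kappa for two raters and the Shrout and Fleiss two-way random-effects, single-rater, absolute-agreement form of ICC(2,1). The function names `cohen_kappa` and `icc_2_1` are hypothetical, and confidence intervals are not computed.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters scoring the same items (assumed form)."""
    n = len(ratings_a)
    # Observed proportion of items on which the two raters agree.
    p_obs = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(ratings_a) | set(ratings_b)
    p_exp = sum(count_a[c] * count_b[c] for c in categories) / n ** 2
    # Undefined (division by zero) when chance agreement equals 1.
    return (p_obs - p_exp) / (1 - p_exp)


def icc_2_1(scores):
    """ICC(2,1) from a list of rows: one row per article, one column per rater."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Two-way ANOVA sums of squares: articles, raters, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    bms = ss_rows / (n - 1)               # between-articles mean square
    jms = ss_cols / (k - 1)               # between-raters mean square
    ems = ss_err / ((n - 1) * (k - 1))    # residual mean square
    # The rater term in the denominator penalises systematic rater differences.
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
```

In this sketch, a pair's item-level answers to one scale question would go to `cohen_kappa`, and a table of overall scale scores (articles by raters, or first rating by re-rating for test-retest) would go to `icc_2_1`. Because the rater mean square enters the denominator, consistent offsets between raters lower the coefficient, which matches the abstract's note about accounting for systematic differences between raters.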

Pages: 6
