How reliable is peer review of scientific abstracts? Looking back at the 1991 annual meeting of the Society of General Internal Medicine

Cited by: 47
Authors
Rubin HR
Redelmeier DA
Wu AW
Steinberg EP
Affiliations
[1] the Division of Internal Medicine, the Program for Medical Technology and Practice Assessment, and the Department of Health Policy and Management, The Johns Hopkins University, Baltimore, Maryland
[2] the Department of Medicine and the Division of Clinical Epidemiology, Wellesley Hospital Research Institute, University of Toronto, Toronto, Ontario
Keywords
peer review; abstracts; interrater reliability; judgment; agreement; psychometrics; analysis of variance; general internal medicine; research
DOI
10.1007/BF02600092
CLC number
R19 [Health care organization and services (health services administration)]
Abstract
Objective: To evaluate the interrater reproducibility of scientific abstract review.
Design: Retrospective analysis.
Setting: Review for the 1991 Society of General Internal Medicine (SGIM) annual meeting.
Subjects: 426 abstracts in seven topic categories evaluated by 55 reviewers.
Measurements: Reviewers rated abstracts from 1 (poor) to 5 (excellent), globally and on three specific dimensions: interest to the SGIM audience, quality of methods, and quality of presentation. Each abstract was reviewed by five to seven reviewers. Each reviewer's ratings of the three dimensions were added to compute that reviewer's summary score for a given abstract. The mean of all reviewers' summary scores for an abstract, the final score, was used by SGIM to select abstracts for the meeting.
Results: Final scores ranged from 4.6 to 13.6 (mean = 9.9). Although 222 abstracts (52%) were accepted for publication, the 95% confidence interval around the final score of 300 (70.4%) of the 426 abstracts overlapped with the threshold for acceptance of an abstract. Thus, these abstracts were potentially misclassified. Only 36% of the variance in summary scores was associated with an abstract's identity, 12% with the reviewer's identity, and the remainder with idiosyncratic reviews of abstracts. Global ratings were more reproducible than summary scores.
Conclusion: Reviewers disagreed substantially when evaluating the same abstracts. Future meeting organizers may wish to rank abstracts using global ratings, and to experiment with structured review criteria and other ways to improve raters' agreement.
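The scoring scheme and the variance partition summarized in the abstract can be sketched in code. The two Python snippets below use invented abstract IDs, reviewer IDs, ratings, and an invented acceptance cutoff; they illustrate the general approach rather than reproduce the SGIM data or the published analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings for two abstracts; reviewers score each abstract 1-5 on
# interest, methods, and presentation. None of these values come from the paper.
ratings = pd.DataFrame({
    "abstract_id":  [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "reviewer_id":  ["A", "B", "C", "D", "E", "A", "B", "C", "F", "G"],
    "interest":     [4, 3, 5, 4, 3, 2, 3, 2, 3, 2],
    "methods":      [4, 4, 5, 3, 4, 3, 2, 2, 3, 3],
    "presentation": [3, 4, 4, 4, 3, 3, 3, 2, 2, 3],
})

# A reviewer's summary score is the sum of the three dimension ratings (range 3-15).
ratings["summary"] = ratings[["interest", "methods", "presentation"]].sum(axis=1)

# The final score is the mean of all reviewers' summary scores for the abstract.
per_abstract = ratings.groupby("abstract_id")["summary"].agg(["mean", "std", "count"])

# Approximate 95% CI around each final score; an abstract whose interval straddles
# the acceptance cutoff is "potentially misclassified" in the authors' sense.
# The cutoff of 10.0 is a stand-in, not the actual SGIM threshold.
cutoff = 10.0
sem = per_abstract["std"] / np.sqrt(per_abstract["count"])
per_abstract["ci_low"] = per_abstract["mean"] - 1.96 * sem
per_abstract["ci_high"] = per_abstract["mean"] + 1.96 * sem
per_abstract["potentially_misclassified"] = (
    (per_abstract["ci_low"] <= cutoff) & (cutoff <= per_abstract["ci_high"])
)
print(per_abstract)
```

The reported partition of summary-score variance (36% abstract identity, 12% reviewer identity, the remainder idiosyncratic reviews) corresponds to a two-way random-effects decomposition. A balanced-design sketch with simulated effect sizes, chosen only so the proportions come out in roughly that range:

```python
# Simulated abstracts-by-reviewers grid; the real review was unbalanced, so this
# only illustrates the method-of-moments decomposition, not the published result.
rng = np.random.default_rng(0)
n_abs, n_rev = 50, 10
grid = (rng.normal(0, 1.0, size=(n_abs, 1))          # abstract effect
        + rng.normal(0, 0.6, size=(1, n_rev))        # reviewer leniency/severity
        + rng.normal(0, 1.2, size=(n_abs, n_rev)))   # idiosyncratic review

ms_abs = n_rev * grid.mean(axis=1).var(ddof=1)        # mean square for abstracts
ms_rev = n_abs * grid.mean(axis=0).var(ddof=1)        # mean square for reviewers
resid = (grid - grid.mean(axis=1, keepdims=True)
              - grid.mean(axis=0, keepdims=True) + grid.mean())
ms_res = (resid ** 2).sum() / ((n_abs - 1) * (n_rev - 1))

# Variance components for a two-way random model without interaction replication.
var_abs = max((ms_abs - ms_res) / n_rev, 0.0)
var_rev = max((ms_rev - ms_res) / n_abs, 0.0)
total = var_abs + var_rev + ms_res
print(f"abstract: {var_abs/total:.0%}, reviewer: {var_rev/total:.0%}, "
      f"residual: {ms_res/total:.0%}")
```

Both snippets assume pandas and NumPy are available; an unbalanced design like the actual SGIM review would call for a mixed-model or ANOVA approach that accommodates missing reviewer-abstract pairs.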
Pages: 255-258
Page count: 4