Is an ROC-type Response Truly Always Better Than a Binary Response in Observer Performance Studies?

Cited by: 17
Authors
Gur, David [1]
Bandos, Andriy I. [2]
Rockette, Howard E. [2]
Zuley, Margarita L. [3]
Hakim, Christiane M. [3]
Chough, Denise M. [3]
Ganott, Marie A. [3]
Sumkin, Jules H. [3]
Affiliations
[1] Univ Pittsburgh, Dept Radiol, Pittsburgh, PA 15213 USA
[2] Univ Pittsburgh, Grad Sch Publ Hlth, Dept Biostat, Pittsburgh, PA 15213 USA
[3] Magee Womens Hosp, Dept Radiol, Pittsburgh, PA USA
Keywords
Breast cancer; digital breast tomosynthesis; observer performance; rating scale; DIGITAL BREAST TOMOSYNTHESIS; COMPUTER-AIDED DETECTION; DIAGNOSTIC-RADIOLOGY; MAMMOGRAPHY; CANCER; BOOTSTRAP; RATINGS; READERS; CURVE; AREA;
DOI
10.1016/j.acra.2009.12.012
Chinese Library Classification number
R8 [Special Medicine]; R445 [Diagnostic Imaging];
Discipline classification code
1002 ; 100207 ; 1009 ;
Abstract
Rationale and Objectives: The aim of this study was to assess similarities and differences between methods of performance comparisons under binary (yes or no) and receiver-operating characteristic (ROC)-type pseudocontinuous (0-100) rating data ascertained during an observer performance study of interpretation of full-field digital mammography (FFDM) versus FFDM plus digital breast tomosynthesis.
Materials and Methods: Rating data consisted of ROC-type pseudocontinuous and binary ratings generated by eight radiologists evaluating 77 digital mammographic examinations. Overall performance levels were summarized with a conventionally used probability of correct discrimination or, equivalently, the area under the ROC curve (AUC), which under a binary scale is related to Youden's index. Magnitudes of differences in the reader-averaged empirical AUCs between FFDM alone and FFDM plus digital breast tomosynthesis were compared in the context of fixed-reader and random-reader variability of the estimates.
Results: The absolute differences between modes using the empirical AUCs were larger on average for the binary scale (0.12 vs 0.07) and for the majority of individual readers (six of eight). Standardized differences were consistent with this finding (2.32 vs 1.63 on average). Reader-averaged differences in AUCs standardized by fixed-reader and random-reader variances were also smaller under the binary rating paradigm. The discrepancy between AUC differences depended on the location of the reader-specific binary operating points.
Conclusions: The human observer's operating point should be a primary consideration in designing an observer performance study. Although in general, the ROC-type rating paradigm provides more detailed information on the characteristics of different modes, it does not reflect the actual operating point adopted by human observers. There are application-driven scenarios in which analysis based on binary responses may provide statistical advantages.
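The abstract notes that, on a binary scale, the empirical AUC is related to Youden's index. A minimal sketch of that relationship follows, assuming the usual Mann-Whitney form of the empirical AUC with ties counted as one half. The ratings are made-up illustrative values, not data from the study, and the function name `empirical_auc` is hypothetical.

```python
from itertools import product


def empirical_auc(neg_ratings, pos_ratings):
    """Empirical (Mann-Whitney) AUC: P(pos > neg) + 0.5 * P(pos == neg)."""
    total = 0.0
    for n, p in product(neg_ratings, pos_ratings):
        if p > n:
            total += 1.0
        elif p == n:
            total += 0.5
    return total / (len(neg_ratings) * len(pos_ratings))


# Hypothetical binary calls (1 = "yes", 0 = "no"); not taken from the paper.
neg_binary = [0, 0, 0, 1]      # cases without cancer: specificity = 3/4
pos_binary = [1, 1, 1, 0, 1]   # cases with cancer: sensitivity = 4/5

sensitivity = sum(pos_binary) / len(pos_binary)
specificity = 1 - sum(neg_binary) / len(neg_binary)
youden_j = sensitivity + specificity - 1

auc_binary = empirical_auc(neg_binary, pos_binary)

# With a single binary operating point, the empirical AUC reduces to
# (sensitivity + specificity) / 2, which is (J + 1) / 2 for Youden's J.
print(auc_binary, (youden_j + 1) / 2)
```

The same function accepts 0-100 ratings unchanged, so a binary and a pseudocontinuous reading of the same cases can be summarized on one scale. That shared scale is what makes the kind of comparison described in the abstract possible.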
Pages: 639-645
Number of pages: 7