Estimation and comparison of CAD system performance in clinical settings

Cited by: 15
Author
Bornefalk, H [1]
Affiliation
[1] AlbaNova Univ Ctr, Royal Inst Technol, Dept Phys, SE-10691 Stockholm, Sweden
Keywords
CAD; performance evaluation; sampling error; confidence interval; mammography; operating point estimation
DOI
10.1016/j.acra.2005.02.005
Chinese Library Classification
R8 [Special Medicine]; R445 [Diagnostic Imaging]
Discipline classification codes
1002; 100207; 1009
Abstract
Rationale and Objectives. Computer-aided detection (CAD) systems are frequently compared using free-response receiver operating characteristic (FROC) curves. While there are ample statistical methods for comparing FROC curves, when one is interested in comparing the outcomes of 2 CAD systems applied in a typical clinical setting, there is the additional matter of correctly determining the system operating point. This article shows how the effect of the sampling error on determining the correct CAD operating point can be captured. By incorporating this uncertainty, a method is presented that allows estimation of the probability with which a particular CAD system performs better than another on unseen data in a clinical setting.
Materials and Methods. The distribution of possible clinical outcomes from 2 artificial CAD systems with different FROC curves is examined. The sampling error is captured by the distribution of possible system thresholds of the classifying machine that yields a specified sensitivity. After introducing a measure of superiority, the probability of one system being superior to the other can be determined.
Results. It is shown that for 2 typical mammography CAD systems, each trained on independent representative datasets of 100 cases, the FROC curves must be separated by 0.20 false positives per image in order to conclude that there is a 90% probability that one is better than the other in a clinical setting. Also, there is no apparent gain in increasing the size of the training set beyond 100 cases.
Discussion. CAD systems for mammography are modeled for illustrative purposes, but the method presented is applicable to any computer-aided detection system evaluated with FROC curves. The presented method is designed to construct confidence intervals around possible clinical outcomes and to assess the importance of training set size and separation between FROC curves of systems trained on different datasets.
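The idea in the Materials and Methods part of the abstract can be illustrated with a small Monte Carlo sketch. This is not the article's model: the Gaussian score distributions, the linear utility used as the "measure of superiority", and every name and parameter value below are assumptions chosen only for illustration. Each trial draws a finite training sample for each system, sets the threshold that reaches the target sensitivity on that sample, evaluates the resulting true operating point, and counts how often system A comes out ahead.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)  # assumed score distribution of false candidates


def sample_threshold(mu_lesion, n_cases, target_sens, rng):
    """Threshold that reaches target_sens on one random training sample.

    Assumption: one lesion per case, lesion scores ~ N(mu_lesion, 1).
    """
    scores = sorted(rng.gauss(mu_lesion, 1.0) for _ in range(n_cases))
    # Number of training lesions allowed to fall below the threshold.
    n_missed = min(int(round((1.0 - target_sens) * n_cases)), n_cases - 1)
    return scores[n_missed]  # lesions scoring >= this value count as detected


def clinical_outcome(mu_lesion, fp_candidates_per_image, threshold):
    """True (population) sensitivity and false positives per image."""
    sens = 1.0 - NormalDist(mu_lesion, 1.0).cdf(threshold)
    fp_rate = fp_candidates_per_image * (1.0 - STD_NORMAL.cdf(threshold))
    return sens, fp_rate


def utility(sens, fp_rate, fp_cost):
    """Hypothetical superiority measure: sensitivity minus a false-positive penalty."""
    return sens - fp_cost * fp_rate


def prob_a_superior(sys_a, sys_b, n_cases=100, target_sens=0.9,
                    fp_cost=0.1, n_trials=2000, seed=0):
    """Estimate P(system A has higher utility than B on unseen data).

    sys_a and sys_b are (mu_lesion, fp_candidates_per_image) tuples.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_trials):
        utilities = []
        for mu_lesion, fp_candidates in (sys_a, sys_b):
            # Sampling error enters through the training-set threshold.
            t = sample_threshold(mu_lesion, n_cases, target_sens, rng)
            sens, fp_rate = clinical_outcome(mu_lesion, fp_candidates, t)
            utilities.append(utility(sens, fp_rate, fp_cost))
        if utilities[0] > utilities[1]:
            wins += 1
    return wins / n_trials
```

Calling `prob_a_superior((2.2, 2.0), (2.0, 2.0))` compares two hypothetical systems that differ only in lesion-score separation; varying `n_cases` shows how the training set size affects the spread of thresholds and therefore the estimated probability.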
Pages: 687-694
Page count: 8