Validation and Statistical Power Comparison of Methods for Analyzing Free-response Observer Performance Studies

Times cited: 89
Authors
Chakraborty, Dev P. [1]
Affiliations
[1] Univ Pittsburgh, Dept Radiol, Pittsburgh, PA 15261 USA
Keywords
Observer performance; evaluation methodologies; validation; statistical power; free-response; FROC; JAFROC; CAD evaluation
DOI
10.1016/j.acra.2008.07.018
Chinese Library Classification number
R8 [Special Medicine]; R445 [Diagnostic Imaging]
Discipline classification code
1002; 100207; 1009
Abstract
Rationale and Objectives. The aim of this work was to validate and compare the statistical powers of proposed methods for analyzing free-response data using a search-model-based simulator.

Materials and Methods. A free-response data simulator is described that can model a single reader interpreting the same cases in two modalities, or two computer-aided detection (CAD) algorithms, or two human observers, interpreting the same cases in one modality. A variance components model, analogous to the Roe and Metz receiver-operating characteristic (ROC) data simulator, is described; it models intracase and intermodality correlations in free-response studies. Two generic observers were simulated: a quasi-human observer and a quasi-CAD algorithm. Null hypothesis (NH) validity and statistical powers of ROC, jackknife alternative free-response operating characteristic (JAFROC), a variant of JAFROC termed JAFROC-1, initial detection and candidate analysis (IDCA), and a nonparametric (NP) approach were investigated.

Results. All methods had valid NH behavior over a wide range of simulator parameters. For equal numbers of normal and abnormal cases, for the human observer, the statistical power ranking of the methods was JAFROC-1 > JAFROC > (IDCA ~ NP) > ROC. For the CAD algorithm, the ranking was (NP ~ IDCA) > (JAFROC-1 ~ JAFROC) > ROC. In either case, the statistical power of the highest ranked method exceeded that of the lowest ranked method by about a factor of two. Dependence of statistical power on simulator parameters followed expected trends. For data sets with more abnormal cases than normal cases, JAFROC-1 power significantly exceeded JAFROC power.

Conclusion. Based on this work, the recommendation is to use JAFROC-1 for human observers (including human observers with CAD assist) and the NP method for evaluating CAD algorithms.
Pages: 1554-1566
Page count: 13