On evaluating brain tissue classifiers without a ground truth

Cited by: 69
Authors
Bouix, Sylvain [1]
Martin-Fernandez, Marcos
Ungar, Lida
Nakamura, Motoaki
Koo, Min-Seong
McCarley, Robert W.
Shenton, Martha E.
Affiliations
[1] Brigham & Womens Hosp, Dept Psychiat, Psychiat Neuroimaging Lab, Boston, MA 02115 USA
[2] Harvard Med Sch, Dept Psychiat, Brockton Div, Clin Neurosci Div, Lab Neurosci, Boston VA Healthca, Boston, MA USA
[3] Brigham & Womens Hosp, Dept Radiol, Lab Math Imaging, Boston, MA 02115 USA
[4] Univ Valladolid, Lab Proc Imagen, Valladolid, Spain
Keywords
evaluation; validation; image segmentation; agreement; gold standard
DOI
10.1016/j.neuroimage.2007.04.031
Chinese Library Classification
Q189 [Neuroscience];
Discipline classification code
071006 [Neurobiology];
Abstract
In this paper, we present a set of techniques for the evaluation of brain tissue classifiers on a large data set of MR images of the head. Due to the difficulty of establishing a gold standard for this type of data, we focus our attention on methods which do not require a ground truth, but instead rely on a common agreement principle. Three different techniques are presented: the Williams' index, a measure of common agreement; STAPLE, an Expectation-Maximization algorithm which simultaneously estimates performance parameters and constructs an estimated reference standard; and Multidimensional Scaling, a visualization technique to explore similarity data. We apply these different evaluation methodologies to a set of eleven different segmentation algorithms on forty MR images. We then validate our evaluation pipeline by building a ground truth based on human expert tracings. The evaluations with and without a ground truth are compared. Our findings show that comparing classifiers without a gold standard can provide a lot of useful information. In particular, outliers can be easily detected, strongly consistent or highly variable techniques can be readily discriminated, and the overall similarity between different techniques can be assessed. On the other hand, we also find that some information present in the expert segmentations is not captured by the automatic classifiers, suggesting that common agreement alone may not be sufficient for a precise performance evaluation of brain tissue classifiers. (C) 2007 Elsevier Inc. All rights reserved.
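As a rough illustration of the common agreement principle described in the abstract, the sketch below computes a Williams' index for each of several binary segmentations of the same image. It is not taken from the paper: the function names `jaccard` and `williams_index` are hypothetical, the Jaccard coefficient is assumed as the pairwise similarity measure, and the formula used is the commonly stated form of the index, I_j = (r - 2) * sum_{j' != j} s(j, j') / (2 * sum of s over all pairs not involving j), for r raters.

```python
import numpy as np


def jaccard(a, b):
    """Jaccard coefficient of two binary masks (assumed similarity measure)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Two empty masks are treated as identical (a convention, not from the paper).
        return 1.0
    return np.logical_and(a, b).sum() / union


def williams_index(masks):
    """Return one Williams' index per mask.

    A value near or above 1 suggests the mask agrees with the group at least
    as well as the group members agree with each other; a value well below 1
    flags a potential outlier.
    """
    r = len(masks)
    if r < 3:
        raise ValueError("Williams' index needs at least three segmentations")

    # Symmetric pairwise similarity matrix with a zero diagonal.
    s = np.zeros((r, r))
    for i in range(r):
        for k in range(i + 1, r):
            s[i, k] = s[k, i] = jaccard(masks[i], masks[k])

    total_pairs = s.sum() / 2.0  # sum over all unordered pairs
    indices = np.empty(r)
    for j in range(r):
        with_j = s[j].sum()            # agreement of j with every other mask
        others = total_pairs - with_j  # agreement among the remaining masks
        # Note: 'others' is zero if the remaining masks share nothing at all.
        indices[j] = (r - 2) * with_j / (2.0 * others)
    return indices


# Toy example: three identical masks and one disjoint outlier.
a = np.array([1, 1, 0, 0])
b = np.array([0, 0, 1, 1])
wi = williams_index([a, a, a, b])
```

In this toy case the disjoint mask receives an index of 0, while the three matching masks receive indices above 1, which is the kind of outlier signal the abstract refers to.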
Pages: 1207-1224
Page count: 18