A text based indexing system for mammographic image retrieval and classification

Cited by: 23
Authors
Farruggia, Alfonso [1]
Magro, Rosario [1]
Vitabile, Salvatore [1]
Affiliations
[1] Univ Palermo, Dipartimento Biopatol & Biotecnol Med & Forensi, I-90127 Palermo, Italy
Source
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2014 / Vol. 37
Keywords
Information retrieval; Medical documents indexing and classification; Medical images indexing and classification;
DOI
10.1016/j.future.2014.02.008
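Chinese Library Classification (CLC) number
TP301 [Theory, methods];
Discipline classification code
080201 [Mechanical manufacturing and automation];
Abstract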
Modern medical systems produce huge amounts of text, words, images, and videos, which are stored in ad hoc databases. The medical community needs to extract precise information from this large amount of data. Current ICT approaches do not provide a methodology for content-based medical image retrieval and classification. Moreover, from the Internet of Things (IoT) perspective, medical data can be produced by several devices, and the data produced exhibit all the features and constraints of Big Data. The IoT guidelines place at the center of the system new smart software that manages Big Data and transforms it into a new, understandable form. This paper describes a text-based indexing system for mammographic image retrieval and classification. The system handles the mining and classification of text (structured reports) and images (mammograms) in a typical Department of Radiology. DICOM structured reports, containing free text for medical diagnosis, have been analyzed and labeled in order to classify the corresponding mammographic images. The information retrieval process is based on text manipulation techniques such as light semantic analysis, stop-word removal, and light medical natural language processing. The system also includes a Search Engine module based on a Naive Bayes classifier. The experimental results show interesting performance in terms of specificity and sensitivity. Two more indexes have been computed in order to assess the robustness of the system: the Az index (area under the ROC curve) and the sigma(Az) index (standard error of Az). The dataset is composed of healthy and pathological DICOM structured reports. Two use case scenarios are presented and described to demonstrate the effectiveness of the proposed approach. © 2014 Elsevier B.V. All rights reserved.
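The pipeline named in the abstract (stop-word removal, a Naive Bayes classifier over report text, and evaluation by sensitivity, specificity, and the standard error of the area under the ROC curve) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the stop-word list, the toy report texts, the labels, and all function names are hypothetical, the classifier is a plain multinomial Naive Bayes with Laplace smoothing, and the standard error uses the Hanley-McNeil approximation, which the abstract does not confirm as the formula the paper used.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the paper's actual list is not given in the abstract.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "with", "or", "is", "are"}

def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def train_naive_bayes(docs, labels):
    """Collect per-class document and word counts for multinomial Naive Bayes."""
    class_counts = Counter(labels)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in zip(docs, labels):
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab

def predict(model, text):
    """Return the class with the highest log-posterior (Laplace smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def sensitivity_specificity(y_true, y_pred, positive="pathological"):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    Assumes both classes occur at least once in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)

def auc_standard_error(auc, n_pos, n_neg):
    """Hanley-McNeil approximation to the standard error of the ROC area."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (auc * (1 - auc)
                + (n_pos - 1) * (q1 - auc ** 2)
                + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(variance)

# Toy report texts, invented for illustration only.
TRAIN_DOCS = [
    "Irregular spiculated mass with microcalcifications in the left breast",
    "Suspicious mass and architectural distortion",
    "No mass or suspicious calcifications, normal breast tissue",
    "Normal mammogram, no abnormality detected",
]
TRAIN_LABELS = ["pathological", "pathological", "healthy", "healthy"]
MODEL = train_naive_bayes(TRAIN_DOCS, TRAIN_LABELS)

if __name__ == "__main__":
    print(predict(MODEL, "Spiculated mass with distortion"))
    print(auc_standard_error(0.9, n_pos=50, n_neg=50))
```

In a setting like the one the abstract describes, the predicted label of each structured report would then be attached to the corresponding mammogram, so that images can be retrieved by class without analyzing pixel content.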
Pages: 243-251
Number of pages: 9