The comparisons of prognostic indexes using data mining techniques and Cox regression analysis in the breast cancer data

Times cited: 17
Authors
Ture, Mevlut [1]
Tokatli, Fusun [2]
Omurlu, Imran Kurt [1]
Affiliations
[1] Trakya Univ, Fac Med, Dept Biostat, TR-22030 Edirne, Turkey
[2] Trakya Univ, Fac Med, Dept Radiat Oncol, TR-22030 Edirne, Turkey
Keywords
Decision tree; C&RT; CHAID; QUEST; ID3; C4.5; C5.0; Cox regression; Kaplan-Meier; Breast cancer; Disease-free survival; Random survival forests
DOI
10.1016/j.eswa.2008.10.014
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The purpose of this study is to determine new prognostic indexes for differentiating subgroups of breast cancer patients with respect to disease-free survival (DFS), using decision tree algorithms (C&RT, CHAID, QUEST, ID3, C4.5 and C5.0) and Cox regression analysis. A retrospective analysis was performed on 381 patients diagnosed with breast cancer. Age, menopausal status, age at menarche, family history of cancer, histologic tumor type, quadrant of tumor, tumor size, estrogen and progesterone receptor status, histologic and nuclear grading, axillary nodal status, pericapsular involvement of lymph nodes, lymphovascular and perineural invasion, adjuvant radiotherapy, chemotherapy and hormonal therapy were assessed. Based on these prognostic factors, new prognostic indexes were obtained for C&RT, CHAID, QUEST, ID3, C4.5, C5.0 and Cox regression. The prognostic indexes classified patients well, which suggests that improvement is possible using standard risk factors. When the indexes were compared using Random Survival Forests (RSF), C4.5 performed better than C&RT, CHAID, QUEST, ID3, C5.0 and Cox regression in determining risk groups. © 2008 Elsevier Ltd. All rights reserved.
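The abstract describes splitting patients into risk groups and comparing their disease-free survival, and the keywords list Kaplan-Meier. As a rough illustration only, the sketch below computes a Kaplan-Meier (product-limit) survival curve for one group in plain Python. The function name `kaplan_meier` and the toy follow-up data are hypothetical and are not taken from the study; the authors' actual software and risk-group definitions are not given in this record.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of the survival curve.

    times  : follow-up time for each patient (e.g. months)
    events : 1 if recurrence/death was observed at that time, 0 if censored
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    # Number of observed events at each time point.
    observed = Counter(t for t, e in zip(times, events) if e)
    # Number of patients leaving the risk set (event or censored) at each time.
    leaving = Counter(times)

    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = observed.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical toy data for two risk groups (NOT from the paper).
low_risk = kaplan_meier([5, 8, 12, 12, 20], [0, 1, 0, 0, 0])
high_risk = kaplan_meier([2, 3, 3, 7, 9], [1, 1, 0, 1, 1])

for label, curve in (("low risk", low_risk), ("high risk", high_risk)):
    print(label, [(t, round(s, 3)) for t, s in curve])
```

In a setting like the one described, each leaf of a fitted decision tree (or each band of a Cox prognostic index) would define one group, and curves that separate clearly between groups would indicate a useful index. Censored patients reduce the number at risk without lowering the survival estimate, which is the main reason to prefer this estimator over a simple proportion.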
Pages: 8247-8254
Number of pages: 8