Lung cancer gene expression database analysis incorporating prior knowledge with support vector machine-based classification method

Cited by: 33
Authors
Guan, Peng [1 ,2 ]
Huang, Desheng [1 ,2 ]
He, Miao [3 ]
Zhou, Baosen [1 ,2 ]
Affiliations
[1] China Med Univ, Sch Publ Hlth, Dept Epidemiol, Shenyang 110001, Peoples R China
[2] Univ Liaoning Prov, Key Lab Canc Etiol & Intervent, Shenyang 110001, Peoples R China
[3] China Med Univ, Affiliated Hosp 1, Informat Ctr, Shenyang 110001, Peoples R China
Source
JOURNAL OF EXPERIMENTAL & CLINICAL CANCER RESEARCH | 2009 / Vol. 28
Keywords
MICROARRAY DATA; FEATURE-SELECTION; CLASS PREDICTION; SIGNATURES; MORTALITY; ADENOCARCINOMA; EPIDEMIOLOGY; REGRESSION; CENTROIDS; SURVIVAL;
DOI
10.1186/1756-9966-28-103
Chinese Library Classification (CLC) number
R73 [Oncology];
Discipline classification code
100214 ;
Abstract
Background: Reliable and precise classification is essential for the successful diagnosis and treatment of cancer. Gene expression microarrays provide a high-throughput platform for discovering genomic biomarkers for cancer diagnosis and prognosis. Rational use of the available biological information can not only remove or suppress noise in gene chip data, but also avoid the one-sided results of a single experiment. However, only a few studies have recognized the importance of prior information in cancer classification. Methods: Using a support vector machine as the discriminant approach, we proposed a modified method that incorporates prior knowledge into cancer classification based on gene expression data in order to improve accuracy. A well-known public dataset, the malignant pleural mesothelioma and lung adenocarcinoma gene expression database, was used in this study. Prior knowledge is viewed here as a means of directing the classifier using known lung adenocarcinoma-related genes. The procedures were performed in R 2.80. Results: The modified method performed better after incorporating prior knowledge. Its accuracy improved from 98.86% to 100% in the training set and from 98.51% to 99.06% in the test set. Its standard deviation decreased from 0.26% to 0 in the training set and from 3.04% to 2.10% in the test set. Conclusion: Incorporating prior knowledge into discriminant analysis can effectively improve classification performance and reduce the impact of noise. This idea may prove promising not only in practice but also in methodology.
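The Methods part of the abstract can be illustrated with a small sketch. The abstract does not say how prior knowledge enters the classifier, so this example assumes one simple rule: known genes are always kept in the selected feature set, and the remaining slots are filled by a t-statistic ranking. The data are synthetic, the classifier is a plain-Python linear SVM trained by stochastic subgradient descent rather than the authors' R procedure, and all names (`make_toy_data`, `select_genes`, `PRIOR_GENES`, and so on) are hypothetical.

```python
import random
import statistics


def make_toy_data(n_per_class=20, n_genes=10, seed=1):
    """Synthetic expression matrix: gene 0 is strongly informative, gene 7 weakly."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (1, -1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            row[0] += 3.0 * label  # strong class signal
            row[7] += 1.0 * label  # weaker signal; plays the "known gene" role
            X.append(row)
            y.append(label)
    return X, y


def t_score(values_a, values_b):
    """Absolute Welch-style t statistic for one gene across the two classes."""
    mean_a, mean_b = statistics.mean(values_a), statistics.mean(values_b)
    var_a, var_b = statistics.variance(values_a), statistics.variance(values_b)
    denom = (var_a / len(values_a) + var_b / len(values_b)) ** 0.5
    return abs(mean_a - mean_b) / denom if denom > 0 else 0.0


def select_genes(X, y, k, prior_genes=()):
    """Pick k gene indices: prior-knowledge genes first, then the top-ranked rest."""
    n_genes = len(X[0])
    scores = []
    for g in range(n_genes):
        a = [row[g] for row, label in zip(X, y) if label == 1]
        b = [row[g] for row, label in zip(X, y) if label == -1]
        scores.append(t_score(a, b))
    ranked = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    # Assumed rule (not from the paper): known genes are always retained.
    selected = list(dict.fromkeys(prior_genes))[:k]
    for g in ranked:
        if len(selected) >= k:
            break
        if g not in selected:
            selected.append(g)
    return selected


def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=100, seed=0):
    """Stochastic subgradient descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1.0 - lr * lam) for wj in w]  # L2 shrinkage
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def predict(w, b, row):
    """Return the predicted class label, +1 or -1."""
    return 1 if sum(wj * xj for wj, xj in zip(w, row)) + b >= 0 else -1


X, y = make_toy_data()
PRIOR_GENES = (7,)  # hypothetical stand-in for known lung adenocarcinoma-related genes
selected = select_genes(X, y, k=3, prior_genes=PRIOR_GENES)
X_sel = [[row[g] for g in selected] for row in X]
w, b = train_linear_svm(X_sel, y)
accuracy = sum(predict(w, b, r) == t for r, t in zip(X_sel, y)) / len(y)
print("selected genes:", selected, "training accuracy:", accuracy)
```

Running `select_genes` with `prior_genes=()` gives the purely data-driven selection. Comparing accuracy with and without the prior list over repeated train/test splits would mirror the kind of comparison the Results report, though nothing here reproduces those figures.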
Pages: 7