Integrating Biological Knowledge with Gene Expression Profiles for Survival Prediction of Cancer

Cited by: 45
Authors
Chen, Xi [1]
Wang, Lily [2]
Affiliations
[1] Cleveland Clin, Dept QHS, Cleveland, OH 44195 USA
[2] Vanderbilt Univ, Dept Biostat, Nashville, TN USA
Keywords
gene expression; gene ontology; microarrays; pathway analysis; survival prediction; BREAST-CANCER; REGRESSION-MODELS; PATHWAY ANALYSIS; CLASSIFICATION; CLASSIFIERS; METASTASIS; SIGNATURE; SELECTION; LASSO; SET;
DOI
10.1089/cmb.2008.12TT
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
070307 [Chemical Biology];
Abstract
Because survival times vary widely among cancer patients and a large number of the genes on microarrays are unrelated to outcome, building accurate prediction models that are easy to interpret remains a challenge. In this paper, we propose a general strategy for improving the performance and interpretability of prediction models by integrating gene expression data with prior biological knowledge. First, we link gene identifiers in the expression dataset to gene annotation databases such as Gene Ontology (GO). Then we construct "supergenes" for each gene category by summarizing information from outcome-related genes using a modified principal component analysis (PCA) method. Finally, instead of using individual genes as predictors, we use these supergenes, each representing the information from one gene category, to predict survival outcome. In addition to identifying gene categories associated with outcome, the proposed approach also performs within-category selection to identify important genes within each gene set. Using two real breast cancer microarray datasets, we show that prediction models based on gene set (or pathway) information outperform models based on expression values of single genes, with improved prediction accuracy and interpretability.
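The supergene construction described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the PCA is modified, so the code below assumes a supervised-PCA-style approach: genes in a category are screened by a univariate Cox score statistic, and the first principal component of the retained genes serves as the supergene. The function names `cox_score` and `supergene`, the `threshold` parameter, the category names, and the toy data are all hypothetical and are not taken from the paper.

```python
import numpy as np


def cox_score(x, time, event):
    """Univariate Cox score test statistic (evaluated at beta = 0) for one gene.

    x: expression values, time: follow-up times, event: 1/True if death observed.
    """
    u, v = 0.0, 0.0
    for i in np.flatnonzero(event):
        at_risk = x[time >= time[i]]   # expression of patients still at risk
        u += x[i] - at_risk.mean()     # observed minus expected covariate
        v += at_risk.var()             # information contributed by this event
    return u / np.sqrt(v) if v > 0 else 0.0


def supergene(expr, time, event, threshold=2.0):
    """Summarize one gene category (samples x genes matrix) as a single score.

    Assumption: genes with |Cox score| >= threshold are kept, and the first
    principal component of the kept genes is returned as the supergene.
    Returns (supergene scores or None, indices of the kept genes).
    """
    z = np.array([cox_score(expr[:, j], time, event)
                  for j in range(expr.shape[1])])
    keep = np.flatnonzero(np.abs(z) >= threshold)
    if keep.size == 0:
        return None, keep
    centered = expr[:, keep] - expr[:, keep].mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0], keep        # first principal component scores


# Toy data (simulated, for illustration only): 60 patients, 6 genes,
# with survival shortened by high expression of genes 0 and 1.
rng = np.random.default_rng(0)
n = 60
expr = rng.normal(size=(n, 6))
time = rng.exponential(scale=np.exp(-expr[:, 0] - expr[:, 1]))
event = rng.random(n) < 0.8

# Hypothetical gene categories, standing in for GO terms.
categories = {"category_A": [0, 1, 2], "category_B": [3, 4, 5]}

supergenes = {}
for name, cols in categories.items():
    # threshold=0.0 keeps every gene so the toy run always yields a score
    score, kept = supergene(expr[:, cols], time, event, threshold=0.0)
    supergenes[name] = score
```

The resulting `supergenes` values would then replace individual genes as covariates in a survival model such as Cox regression. In practice the screening threshold would need to be tuned, for example by cross-validation.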
Pages: 265-278
Number of pages: 14