Computational prediction of human proteins that can be secreted into the bloodstream

Cited by: 47
Authors
Cui, Juan [1]
Liu, Qi [1, 2]
Puett, David [1]
Xu, Ying [1, 3]
Affiliations
[1] Univ Georgia, Dept Biochem & Mol Biol, Athens, GA 30602 USA
[2] Zhejiang Univ, Zhejiang California Int Nanosyst Inst, Hangzhou 310029, Zhejiang, Peoples R China
[3] Univ Georgia, Inst Bioinformat, Athens, GA 30602 USA
Funding
National Institutes of Health (NIH); National Science Foundation (NSF)
DOI
10.1093/bioinformatics/btn418
CLC number (Chinese Library Classification)
Q5 [Biochemistry]
Subject classification code
071010; 081704
Abstract
We present a novel computational method for predicting which proteins, encoded by genes that are highly and abnormally expressed in diseased human tissues such as cancers, can be secreted into the bloodstream. The predictions suggest candidate marker proteins for follow-up serum proteomic studies. A major challenge is that we know very little about where proteins go after they are secreted from cells. That knowledge is too limited to give useful hints about secretion into the bloodstream. To bypass this difficulty, we took a data-mining approach. We first collected, through extensive literature searches, human proteins that previous proteomic studies had detected as secreted into the bloodstream under various pathological conditions. We then asked what these secreted proteins have in common that could be used to predict them, in terms of physical and chemical properties, amino acid sequence and structural features. We identified a list of features relevant to protein secretion, including signal peptides, transmembrane domains, glycosylation sites, disordered regions, secondary structure content, and hydrophobicity and polarity measures. Using these features, we trained a support vector machine (SVM)-based classifier to predict protein secretion into the bloodstream. On a large test set of 98 secretory and 6601 non-secretory human proteins, our classifier achieved 90% prediction sensitivity and 98% prediction specificity. Several additional datasets were used to further assess the classifier's performance. On a set of 122 proteins found at abnormally high abundance in human blood due to various cancers, our program predicted 62 as blood-secreted proteins.
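The abstract names three ingredients: sequence-derived features, an SVM, and sensitivity/specificity. It does not show how they fit together. The following is a minimal sketch under stated assumptions, not the authors' implementation.

It computes only two of the listed feature types:
- hydrophobicity, as the mean Kyte-Doolittle hydropathy (GRAVY);
- polarity, as the fraction of residues in an assumed polar set `DEHKNQRSTY`.

It then trains a plain linear SVM with the Pegasos stochastic sub-gradient method. Pegasos is a swapped-in training procedure; the paper's kernel and solver are not stated in this record. Finally, it scores predictions with sensitivity and specificity. All function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}
# Assumption: one common definition of polar residues.
POLAR = set("DEHKNQRSTY")


def features(seq: str) -> List[float]:
    """Two illustrative features: mean hydropathy (GRAVY) and the fraction
    of polar residues. Non-standard letters are ignored."""
    residues = [aa for aa in seq.upper() if aa in KD]
    if not residues:
        raise ValueError("sequence has no standard amino acids")
    gravy = sum(KD[aa] for aa in residues) / len(residues)
    polar = sum(aa in POLAR for aa in residues) / len(residues)
    return [gravy, polar]


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 200,
                     seed: int = 0) -> List[float]:
    """Pegasos training of a linear SVM (hinge loss, L2 penalty lam).
    Labels must be +1/-1. A constant 1.0 is appended to every x, so the
    last weight acts as a (regularized) bias term."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wj * xj for wj, xj in zip(w, x))
            # Shrink step from the L2 regularizer.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss sub-gradient step only when the margin is violated.
            if margin < 1.0:
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 (secreted) or -1 (not secreted) from the sign of w.x + b."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p != 1 for t, p in pairs)
    tn = sum(t != 1 and p != 1 for t, p in pairs)
    fp = sum(t != 1 and p == 1 for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)
```

In a real setting, each training example would be `features(seq)` for a protein known to be blood-secreted (+1) or non-secretory (-1). Features such as signal peptides, transmembrane domains and glycosylation sites would come from dedicated predictors and are omitted here. Features on very different scales would normally be standardized before training.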
We applied our program to abnormally highly expressed genes detected in gastric cancer and lung cancer tissues through microarray gene expression studies. It predicted 13 and 31 of their products, respectively, as blood-secreted, suggesting that they could serve as potential biomarkers for these two cancers. Our study demonstrates that our method can provide highly useful information for linking genomic and proteomic studies in disease biomarker discovery. Our software can be accessed at http://csbl1.bmb.uga.edu/cgi-bin/Secretion/secretion.cgi.
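The application step can be sketched as a simple filter: screen over-expressed genes, then keep those whose products are predicted to be blood-secreted. This is an illustration only. The fold-change cutoff `min_fold=2.0`, the gene identifiers, and the `is_secreted` predicate are hypothetical placeholders. In practice the predicate would be the trained classifier, and the fold changes would come from the microarray analysis.

```python
from typing import Callable, Dict, List


def candidate_blood_markers(fold_change: Dict[str, float],
                            protein_seq: Dict[str, str],
                            is_secreted: Callable[[str], bool],
                            min_fold: float = 2.0) -> List[str]:
    """Return genes (sorted) that are over-expressed in diseased tissue
    (fold change >= min_fold, a hypothetical cutoff) and whose encoded
    protein is predicted to be secreted into the bloodstream."""
    hits = []
    for gene in sorted(fold_change):
        if fold_change[gene] < min_fold:
            continue  # not abnormally highly expressed
        seq = protein_seq.get(gene)
        if seq is None:
            continue  # no protein sequence available for this gene
        if is_secreted(seq):
            hits.append(gene)
    return hits


if __name__ == "__main__":
    # Toy inputs: gene names and sequences are made up.
    fc = {"GENE_A": 4.1, "GENE_B": 1.2, "GENE_C": 3.0, "GENE_D": 2.5}
    seqs = {"GENE_A": "MKLLVLLAAL", "GENE_B": "MKLLAAA", "GENE_C": "MDEKRSTQ"}

    def toy_predicate(seq: str) -> bool:
        # Hypothetical stand-in for a trained classifier: flags sequences
        # whose first ten residues are mostly hydrophobic.
        head = seq[:10]
        return sum(aa in "AILMFVW" for aa in head) / len(head) > 0.5

    print(candidate_blood_markers(fc, seqs, toy_predicate))  # prints ['GENE_A']
```

Keeping the predicate as a parameter separates the expression screen from the secretion model, so either part can be swapped without touching the other.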
Pages: 2370-2375
Number of pages: 6