Using complexity measure factor to predict protein subcellular location

Cited by: 166
Authors
Xiao, X [5]
Shao, S
Ding, Y
Huang, Z
Huang, Y
Chou, KC
Affiliations
[1] Gordon Life Sci Inst, 13784 Torrey Mar Dr, San Diego, CA 92130 USA
[2] Tianjin Inst Bioinformat & Drug Discovery, Tianjin, Peoples R China
[3] Shanghai Jiao Tong Univ, Shanghai 200030, Peoples R China
[4] Jing Zhen Ceram Inst, Dept Comp, Jing De Zhen, Peoples R China
[5] Donghua Univ, Coll Informat Sci & Technol, Shanghai, Peoples R China
Keywords
pseudo amino acid composition; complexity measure factor; covariant-discriminant algorithm; Chou's invariance theorem;
DOI
10.1007/s00726-004-0148-7
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Recent advances in large-scale genome sequencing have led to the rapid accumulation of amino acid sequences of proteins whose functions are unknown. Because the functions of these proteins are closely correlated with their subcellular localizations, it is vitally important to develop an automated, high-throughput method for identifying their subcellular locations in a timely manner. Based on the concept of pseudo amino acid composition, by which a considerable amount of sequence-order effects can be incorporated into a set of discrete numbers (Chou, K. C., Proteins: Structure, Function, and Genetics, 2001, 43: 246-255), the complexity measure approach is introduced. The advantage of incorporating the complexity measure factor as one of the pseudo amino acid components for a protein is that it can reflect the protein's overall sequence-order feature more effectively than the conventional correlation factors. With this formulation used to represent the protein sequence samples, the covariant-discriminant predictor (Chou, K. C. and Elrod, D. W., Protein Engineering, 1999, 12: 107-118) was adopted to perform the prediction. High success rates were obtained in both the jackknife cross-validation test and the independent dataset test, suggesting that introducing the complexity measure into the prediction of protein subcellular location is quite promising and might also hold great potential as a useful vehicle for other areas of molecular biology.
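As a rough illustration of the idea in the abstract, the sketch below appends one complexity-based component to the 20 amino acid frequencies. It is not the authors' implementation. The abstract does not say which complexity measure is used, so a Lempel-Ziv style phrase count is assumed here. The function names, the length normalization, and the default weight of 0.05 are likewise hypothetical choices made only for this example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def lz_complexity(seq):
    """Count phrases in a Lempel-Ziv (1976 style) left-to-right parsing.

    Assumption: this stands in for the 'complexity measure' of the abstract.
    """
    n = len(seq)
    i = 0
    count = 0
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from the text seen so
        # far (everything before the phrase's own last character).
        while i + length <= n and seq[i:i + length] in seq[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def pseaa_with_complexity(seq, weight=0.05):
    """Return 21 numbers: 20 composition terms plus one complexity term.

    `weight` and the division by sequence length are illustrative
    assumptions, not values taken from the paper.
    """
    if not seq:
        raise ValueError("sequence must be non-empty")
    n = len(seq)
    freqs = [seq.count(a) / n for a in AMINO_ACIDS]
    factor = lz_complexity(seq) / n  # hypothetical normalization
    denom = sum(freqs) + weight * factor
    # Normalize so that all 21 components sum to 1.
    return [f / denom for f in freqs] + [weight * factor / denom]


if __name__ == "__main__":
    # Parsed as 0|001|10|100|1000|101, i.e. 6 phrases.
    print(lz_complexity("0001101001000101"))
    print(len(pseaa_with_complexity("MKTAYIAKQR")))
```

A vector built this way could then be passed to any classifier. The covariant-discriminant predictor named in the abstract is not reproduced here.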
Pages: 57-61
Page count: 5
Related papers
51 in total
[1]   ESLpred: SVM-based method for subcellular localization of eukaryotic proteins using dipeptide composition and PSI-BLAST [J].
Bhasin, M ;
Raghava, GPS .
NUCLEIC ACIDS RESEARCH, 2004, 32 :W414-W419
[2]   Nearest neighbour algorithm for predicting protein subcellular location by combining functional domain composition and pseudo-amino acid composition [J].
Cai, YD ;
Chou, KC .
BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS, 2003, 305 (02) :407-411
[3]   Support vector machines for prediction of protein subcellular location by incorporating quasi-sequence-order effect [J].
Cai, YD ;
Liu, XJ ;
Xu, XB ;
Chou, KC .
JOURNAL OF CELLULAR BIOCHEMISTRY, 2002, 84 (02) :343-348
[4]  
Cai Yu-Dong, 2000, Molecular Cell Biology Research Communications, V4, P172, DOI 10.1006/mcbr.2001.0269
[5]   Relation between amino acid composition and cellular location of proteins [J].
Cedano, J ;
Aloy, P ;
PerezPons, JA ;
Querol, E .
JOURNAL OF MOLECULAR BIOLOGY, 1997, 266 (03) :594-600
[6]   A JOINT PREDICTION OF THE FOLDING TYPES OF 1490 HUMAN PROTEINS FROM THEIR GENETIC CODONS [J].
CHOU, JJW ;
ZHANG, CT .
JOURNAL OF THEORETICAL BIOLOGY, 1993, 161 (02) :251-262
[7]   Prediction of protein subcellular locations by incorporating quasi-sequence-order effect [J].
Chou, KC .
BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS, 2000, 278 (02) :477-483
[8]   Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes [J].
Chou, KC .
BIOINFORMATICS, 2005, 21 (01) :10-19
[9]   Structural bioinformatics and its impact to biomedical science [J].
Chou, KC .
CURRENT MEDICINAL CHEMISTRY, 2004, 11 (16) :2105-2134
[10]   Prediction of protein subcellular locations by GO-FunD-PseAA predictor [J].
Chou, KC ;
Cai, YD .
BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS, 2004, 320 (04) :1236-1239