Some remarks on protein attribute prediction and pseudo amino acid composition

Cited by: 1175
Author
Chou, Kuo-Chen [1]
Affiliation
[1] Gordon Life Sci Inst, San Diego, CA 92130 USA
Keywords
PseAAC; Functional domain mode; Gene ontology mode; Sequential evolution mode; Cross-validation; SUPPORT VECTOR MACHINES; FUNCTIONAL DOMAIN COMPOSITION; SUBCELLULAR LOCATION PREDICTION; STRUCTURAL CLASS PREDICTION; MODIFIED MAHALANOBIS DISCRIMINANT; RECOGNITION SEQUENCE PEPTIDES; COUPLED RECEPTOR CLASSES; IMPROVED HYBRID APPROACH; DIFFUSION-CONTROLLED REACTIONS; PRINCIPAL COMPONENT ANALYSIS;
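DOI
10.1016/j.jtbi.2010.12.024
Chinese Library Classification
Q [Biological sciences];
Discipline classification code
090105 [Crop production systems and ecological engineering];
Abstract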
With the completion of human genome sequencing, the number of sequence-known proteins has increased explosively. In contrast, determining their biological attributes has proceeded much more slowly. As a consequence, the gap between sequence-known proteins and attribute-known proteins has become increasingly large. This imbalance, which has critically limited our ability to make timely use of newly discovered proteins for basic research and drug development, calls for computational methods or high-throughput automated tools that can quickly and reliably identify various attributes of uncharacterized proteins from their sequence information alone. Indeed, during the last two decades or so, many such methods have been established in the hope of bridging this gap. In developing these methods, the following issues often had to be considered: (1) benchmark dataset construction, (2) protein sample formulation, (3) operating algorithm (or engine), (4) anticipated accuracy, and (5) web-server establishment. In this review, we discuss each of the five procedures, with a special focus on the introduction of pseudo amino acid composition (PseAAC), its different modes and applications, and its recent development, particularly how the general formulation of PseAAC can be used to reflect the core and essential features that are deeply hidden in complicated protein sequences. © 2010 Elsevier Ltd. All rights reserved.
Pages: 236-247
Number of pages: 12