PHYSEAN: PHYsical SEquence ANalysis for the identification of protein domains on the basis of physical and chemical properties of amino acids

Cited by: 22
Authors
Ladunga, I [1]
Affiliations
[1] SmithKline Beecham Pharmaceut, Bioinformat Dept, King Of Prussia, PA 19406 USA
[2] Hungarian Acad Sci, Res Grp Evolutionary Genet, H-1051 Budapest, Hungary
[3] Eotvos Lorand Univ, H-1051 Budapest, Hungary
Funding
Hungarian Scientific Research Fund;
Keywords
DOI
10.1093/bioinformatics/15.12.1028
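Chinese Library Classification number
Q5 [Biochemistry];
Subject classification code
071010 [Biochemistry and Molecular Biology]; 081704 [Applied Chemistry];
Abstract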
Motivation: PHYSEAN predicts protein classes with highly variable sequences on the basis of their physical, chemical and biological characteristics such as diverse hydrophobicity, structural propensity and steric properties. These characteristics, calculated from multiple positions in a sequence, may be conserved even between sequences that fail to produce alignments at any acceptable level of statistical significance. PHYSEAN complements methods that require sequence alignments (BLAST, FASTA, dynamic programming) by adding less residue- and position-specific physicochemical information on the protein or the domain.

Results: We predict proteins or their domains, such as signal peptides, using physical, chemical, geometric, and biological properties of the 20 amino acids. This comprehensive set of properties may cover the diagnostic functional and structural aspects of a domain or a protein class. We automatically select and weight a subset of properties so as to discriminate between, e.g., signal peptides and amino-termini of cytosolic proteins with the lowest number of incorrect predictions. This optimal selection of properties and their weights significantly decreases the number of incorrect predictions compared to any single property or any combination of unweighted properties. Weights have been optimized by high-performance linear programming models that systematically find the optimal solution from among an astronomical number of property/weight combinations. PHYSEAN's performance is demonstrated by highly accurate predictions of signal peptides (the vehicles for protein transport across membranes) and their cleavage sites. The results indicate that reliable predictions are possible even in the absence of sequence conservation, using an automated physical and chemical analysis of proteins.
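The weighted-property idea in the abstract can be illustrated with a minimal sketch. It is not the PHYSEAN implementation. It assumes only two properties (the published Kyte-Doolittle hydropathy scale and a crude side-chain charge), averages them over an N-terminal window, and combines them with a linear score. The weights, the threshold, the window length, the function names and both example sequences are hypothetical placeholders chosen for illustration. In the method described above, the property subset and its weights come from linear programming, which is not reproduced here.

```python
# Kyte-Doolittle hydropathy values. Whether PHYSEAN uses this particular
# scale is an assumption made for this sketch.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Crude side-chain charge near neutral pH (histidine treated as neutral).
CHARGE = {aa: 0.0 for aa in HYDROPATHY}
CHARGE.update({"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0})

PROPERTIES = {"hydropathy": HYDROPATHY, "charge": CHARGE}

# Hypothetical weights and threshold; NOT values from the paper.
EXAMPLE_WEIGHTS = {"hydropathy": 1.0, "charge": 0.5}
EXAMPLE_THRESHOLD = 1.0


def window_features(seq, start=0, length=20):
    """Average each property over seq[start:start + length]."""
    window = seq[start:start + length]
    if not window:
        raise ValueError("empty window")
    # Residues outside the 20 standard amino acids raise KeyError.
    return {
        name: sum(scale[aa] for aa in window) / len(window)
        for name, scale in PROPERTIES.items()
    }


def weighted_score(features, weights):
    """Linear combination of the selected (weighted) properties."""
    return sum(w * features[name] for name, w in weights.items())


def looks_like_signal_peptide(seq, weights=EXAMPLE_WEIGHTS,
                              threshold=EXAMPLE_THRESHOLD):
    """Classify the N-terminal window by thresholding the weighted score."""
    return weighted_score(window_features(seq), weights) > threshold


# Made-up sequences: a hydrophobic N-terminus and a polar, acidic one.
hydrophobic_nterm = "M" + "KK" + "L" * 16 + "A"
polar_nterm = "MSDEEKRTDSEEKNGDQSTE"

print(looks_like_signal_peptide(hydrophobic_nterm))
print(looks_like_signal_peptide(polar_nterm))
```

A property whose weight is zero simply drops out of the score, which is one way to picture how weighting also performs selection of a property subset.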
Pages: 1028-1038
Page count: 11