POLYPEPTIDE SEQUENCE PROPERTY RELATIONSHIPS IN ESCHERICHIA-COLI BASED ON AUTO CROSS COVARIANCES

Times cited: 30
Authors
SJOSTROM, M [1]
RANNAR, S [1]
WIESLANDER, A [1]
Affiliation
[1] UMEA UNIV,INST CHEM,DEPT BIOCHEM,S-90187 UMEA,SWEDEN
Keywords
PEPTIDE SEQUENCES; PARTIAL LEAST SQUARES DISCRIMINANT ANALYSIS; PROTEIN CLASSIFICATION; SEQUENCE ANALYSIS; AUTO CROSS COVARIANCES; MULTIVARIATE DATA ANALYSIS;
DOI
10.1016/0169-7439(95)00059-1
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
For multivariate classification and quantitative structure-activity studies of proteins, whose amino acid sequences differ in length, preprocessing methods are needed that translate each sequence into a quantitative description with the same number of variables. Three such preprocessing methods are investigated here. Two of them are variants of auto cross covariances (ACC) calculated from a multipositional description of the protein sequence, in which each amino acid is described by three orthogonal physico-chemical scales. The third method quantifies each sequence by a diamino acid (dipeptide) frequency histogram. The methods are evaluated by classifying 106 proteins from Escherichia coli and other Gram-negative bacteria into four classes according to their location in the cell: cytoplasm, inner membrane, periplasm and outer membrane. The classification itself was performed with PLS discriminant analysis. The results showed that one of the ACC variants and the diamino acid frequency histogram contained much information relevant to this classification problem. Hence the amino acid sequences of proteins with different final locations in Escherichia coli have significant features related to protein structure and location.
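To illustrate how sequences of different length can be mapped to vectors of fixed length, here is a minimal sketch of the two kinds of preprocessing named in the abstract. It assumes the common lagged-product form of ACC, ACC_jk(L) = sum_i z[i, j] * z[i+L, k] / (n - L), and a dipeptide histogram normalized by the number of adjacent residue pairs. The two ACC variants actually compared in the paper are not specified here, and the function names and the toy scale table are hypothetical. The toy values are placeholders, not the published physico-chemical scales.

```python
from itertools import product

import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def auto_cross_covariances(seq, scales, max_lag):
    """Hypothetical helper: flat ACC vector for lags 1..max_lag.

    `scales` maps each residue letter to a tuple of m descriptor values.
    For each lag L the (m, m) block has entry [j, k] equal to
    sum_i z[i, j] * z[i + L, k] / (n - L), so the output length is
    m * m * max_lag no matter how long the sequence is.
    """
    z = np.array([scales[aa] for aa in seq], dtype=float)  # shape (n, m)
    n = z.shape[0]
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    terms = []
    for lag in range(1, max_lag + 1):
        # Pair scale j at position i with scale k at position i + lag.
        block = z[:-lag].T @ z[lag:] / (n - lag)
        terms.append(block.ravel())
    return np.concatenate(terms)


def dipeptide_frequencies(seq):
    """Hypothetical helper: 400-bin histogram of adjacent residue pairs.

    Counts are divided by the number of pairs (an assumed normalization).
    """
    index = {"".join(p): i for i, p in enumerate(product(AMINO_ACIDS, repeat=2))}
    counts = np.zeros(len(index))
    for a, b in zip(seq, seq[1:]):
        counts[index[a + b]] += 1
    return counts / max(len(seq) - 1, 1)


# Toy two-scale table for illustration only (NOT real amino acid scales).
TOY_SCALES = {"A": (1.0, 0.0), "G": (0.0, 1.0), "L": (-1.0, 0.5)}

if __name__ == "__main__":
    for s in ("AGLA", "AGLAGGLA"):
        # Both sequences give 2 * 2 * 2 = 8 ACC variables and 400 histogram bins.
        print(s, auto_cross_covariances(s, TOY_SCALES, 2).shape,
              dipeptide_frequencies(s).shape)
```

With three scales, as in the abstract, each lag would contribute 3 x 3 = 9 ACC terms. The resulting fixed-length vectors could then be passed to a PLS discriminant model, for example scikit-learn's `PLSRegression` fitted against a dummy-coded class matrix; that step is not shown.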
Pages: 295-305
Number of pages: 11