POLYPEPTIDE SEQUENCE PROPERTY RELATIONSHIPS IN ESCHERICHIA-COLI BASED ON AUTO CROSS COVARIANCES

Cited by: 30
Authors
SJOSTROM, M [1]
RANNAR, S [1]
WIESLANDER, A [1]
Affiliation
[1] UMEA UNIV,INST CHEM,DEPT BIOCHEM,S-90187 UMEA,SWEDEN
Keywords
PEPTIDE SEQUENCES; PARTIAL LEAST SQUARES DISCRIMINANT ANALYSIS; PROTEIN CLASSIFICATION; SEQUENCE ANALYSIS; AUTO CROSS COVARIANCES; MULTIVARIATE DATA ANALYSIS;
DOI
10.1016/0169-7439(95)00059-1
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Multivariate classification and quantitative structure-activity studies of proteins involve amino acid sequences of different lengths, so preprocessing methods are needed that translate each sequence into a quantitative description with the same number of variables. Here, three such preprocessing methods are investigated. Two of the methods are variants of auto cross covariances calculated from a multipositional description of the protein sequence. For the multipositional description, three orthogonal scales that describe the amino acids physico-chemically were used. The third method quantifies each sequence by a diamino acid frequency histogram. The methods are evaluated by classifying 106 proteins from Escherichia coli and other Gram-negative bacteria. The proteins were divided into four classes according to their location in the cell: cytoplasm, inner membrane, periplasm and outer membrane. PLS discriminant analysis was used for the subsequent classification. The results showed that one of the auto cross covariance variants and the diamino acid frequency histogram representation contained much information related to the given classification problem. Hence, the amino acid sequences of proteins with different final locations in Escherichia coli have significant features related to protein structure and location.
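The abstract does not spell out how the two kinds of fixed-length description are computed, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes one common form of the auto cross covariance (the average, over positions, of descriptor j at position i times descriptor k at position i + lag) and assumes that the diamino acid histogram counts adjacent residue pairs. The names `acc_descriptor`, `dipeptide_histogram` and `TOY_SCALES` are hypothetical, and the numbers in `TOY_SCALES` are made-up placeholders, not the published amino acid scales.

```python
from itertools import product

# Placeholder descriptor table (ASSUMPTION: invented values for three residues,
# NOT the published orthogonal amino acid scales).
TOY_SCALES = {
    "A": (0.1, -1.0, 0.5),
    "K": (2.0, 1.0, -0.5),
    "L": (-2.0, 0.5, 0.0),
}


def acc_descriptor(sequence, scales, max_lag):
    """Return a fixed-length auto cross covariance vector for one sequence.

    For every lag 1..max_lag and every ordered pair of scales (j, k), the
    term is the mean of z[i][j] * z[i + lag][k] over all valid positions i.
    The output length is max_lag * n_scales**2, independent of sequence length.
    """
    if len(sequence) <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    n_scales = len(next(iter(scales.values())))
    z = [scales[residue] for residue in sequence]  # multipositional description
    n = len(z)
    features = []
    for lag in range(1, max_lag + 1):
        for j, k in product(range(n_scales), repeat=2):
            total = sum(z[i][j] * z[i + lag][k] for i in range(n - lag))
            features.append(total / (n - lag))
    return features


def dipeptide_histogram(sequence, alphabet):
    """Return relative frequencies of adjacent residue pairs.

    The output length is len(alphabet)**2, independent of sequence length.
    Residues outside `alphabet` raise KeyError.
    """
    pairs = [a + b for a in alphabet for b in alphabet]
    counts = dict.fromkeys(pairs, 0)
    for i in range(len(sequence) - 1):
        counts[sequence[i:i + 2]] += 1
    n_pairs = max(len(sequence) - 1, 1)  # avoid division by zero
    return [counts[pair] / n_pairs for pair in pairs]


if __name__ == "__main__":
    short = acc_descriptor("AKLAK", TOY_SCALES, max_lag=2)
    longer = acc_descriptor("AKLAKLLAKA", TOY_SCALES, max_lag=2)
    # Sequences of different length give vectors of the same length.
    print(len(short), len(longer))
    print(len(dipeptide_histogram("AKLAKLLAKA", "AKL")))
```

Either vector gives every protein the same number of variables, which is what a subsequent multivariate classifier such as PLS discriminant analysis requires. With three scales and a maximum lag of L, the auto cross covariance vector in this sketch has 9L entries; with the 20 standard amino acids, the pair histogram has 400.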
Pages: 295-305
Number of pages: 11