Novel 2D maps and coupling numbers for protein sequences. The first QSAR study of polygalacturonases; isolation and prediction of a novel sequence from Psidium guajava L.

Cited by: 94
Authors
Agüero-Chapin, GA
González-Díaz, H
Molina, R
Varona-Santos, J
Uriarte, E
González-Díaz, Y
Affiliations
[1] Cent Univ Las Villas, CBQ&CAP, Santa Clara 54830, Cuba
[2] Univ Santiago de Compostela, Fac Pharm, Dept Organ Chem, Santiago De Compostela 15782, Spain
[3] Univ Rostock, D-18059 Rostock, Germany
[4] Univ Nacl Autonoma Mexico, FES Iztacala, Biomed Unit, Tlalnepantla 54090, DF, Mexico
[5] Prov Ctr Med Genet, Las Tunas 77400, Cuba
[6] ICBP Victoria Giron, Natl Ctr Human Genet, Havana 11600, Cuba
Source
FEBS LETTERS | 2006 / Vol. 580 / Issue 03
Keywords
protein sequence; polygalacturonases; Markov model; quantitative structure-activity relationship; sequence maps
DOI
10.1016/j.febslet.2005.12.072
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
The development of 2D graph-theoretic representations for DNA sequences has been very important for the qualitative and quantitative comparison of sequences. Calculating numeric features for these representations is useful for DNA-QSAR studies. Most graph-theoretic representations identify each of the four bases with a unit step along one axis direction in 2D space. In the case of proteins, twenty amino acids instead of four bases have to be considered. This fact has limited the introduction of useful 2D Cartesian representations, and of the corresponding sequence descriptors, for encoding protein sequence information. In this study, we overcome this problem by grouping amino acids into four groups: acidic, basic, polar and non-polar amino acids. Identifying each group with one of the four axis directions determines a novel 2D representation and numeric descriptors for protein sequences. A Markov model is then used to calculate new numeric descriptors of the protein sequence, called herein the sequence 2D coupling numbers (ζ(k)). In this work, we calculated the ζ(k) values for 108 sequences of different polygalacturonases (PGs) and for 100 sequences of other proteins. A Linear Discriminant Analysis model derived here (PG = 5.36·ζ(1) - 3.98·ζ(3) - 42.21) successfully discriminates between PGs and other proteins. The model correctly classified 100% of a subset of 81 PG and 75 non-PG protein sequences used to train the model. The model also correctly classified 51 out of 52 (98.07%) of the protein sequences used as an external validation series. Using different groupings of amino acids and/or axis orientations gives different results, so these alternatives are worth exploring for other databases. Finally, to illustrate the use of the model, we report the isolation and prediction of PG action for a novel sequence (AY908988) isolated by our group from Psidium guajava L. This prediction coincides very well with the sequence alignment results found by the BLAST methodology. These findings illustrate the possibilities of the sequence descriptors derived from this novel 2D sequence representation in protein sequence QSAR studies. (c) 2005 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.
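The 2D walk and the discriminant function described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the record: the membership of each amino acid group, the axis direction assigned to each group, and the helper names `walk_2d` and `pg_score` are all hypothetical. The Markov-model calculation of the coupling numbers is not specified in the abstract, so the sketch does not attempt it and simply takes ζ(1) and ζ(3) as inputs. The cutoff applied to the score is likewise not stated, so only the raw value is returned.

```python
# Assumed grouping of the 20 standard amino acids into the four classes
# named in the abstract (acidic, basic, polar, non-polar). The authors'
# exact assignment is not given in this record.
GROUPS = {
    "acidic": "DE",
    "basic": "KRH",
    "polar": "STNQCYG",
    "nonpolar": "AVLIMFWP",
}

# Assumed unit step in the 2D plane for each class; the abstract notes
# that other orientations give different results.
STEPS = {
    "acidic": (-1, 0),
    "basic": (1, 0),
    "polar": (0, -1),
    "nonpolar": (0, 1),
}

# Lookup table from one-letter residue code to its unit step.
RESIDUE_TO_STEP = {
    aa: STEPS[group] for group, members in GROUPS.items() for aa in members
}


def walk_2d(sequence):
    """Return the (x, y) points visited by the walk, starting at the origin."""
    x, y = 0, 0
    points = [(x, y)]
    for aa in sequence.upper():
        dx, dy = RESIDUE_TO_STEP[aa]  # raises KeyError for non-standard codes
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


def pg_score(zeta1, zeta3):
    """Evaluate the discriminant function quoted in the abstract."""
    return 5.36 * zeta1 - 3.98 * zeta3 - 42.21
```

For example, `walk_2d("DKSA")` steps left, right, down and up in turn, so it returns to the origin after four residues. The values passed to `pg_score` would come from the Markov model in the paper, which this sketch leaves out.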
Pages: 723-730
Number of pages: 8