A novel series of compositionally biased substitution matrices for comparing Plasmodium proteins

Cited by: 14
Authors
Brick, Kevin [1]
Pizzi, Elisabetta [1]
Affiliations
[1] Ist Super Sanita, Dipartimento Malattie Infett Parassitarie & Immun, I-00161 Rome, Italy
DOI
10.1186/1471-2105-9-236
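Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010 [Biochemistry and Molecular Biology]; 081704 [Applied Chemistry]
Abstract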
Background: The most commonly used substitution matrices (BLOSUM and PAM) are derived from protein sequences with average amino acid distributions, so they do not provide a fully accurate substitution model for proteins with a biased amino acid composition. This problem has recently been addressed by adjusting existing matrices; to date, however, no empirical approach has been taken to build matrices that offer a substitution model for comparing proteins sharing an amino acid compositional bias. Here, we present a novel procedure for constructing series of symmetrical substitution matrices to align proteins from similarly biased Plasmodium proteomes.

Results: We generated substitution matrices by selecting from the BLOCKS database those multiple alignments with a compositional bias similar to that of P. falciparum and P. yoelii proteins. A novel 'fuzzy' clustering method was adopted to group sequences within these alignments; this method retains more complete information on amino acid substitutions than hierarchical clustering. We assessed the performance of our matrices against the BLOSUM series and showed that they improve the performance of BLAST database searches, greatly reducing the number of false positive hits. We then demonstrated two applications of the novel matrices: improving the annotation of homologs between the two Plasmodium species, and classifying members of the P. falciparum RIFIN/STEVOR family.

Conclusion: We confirmed that, for compositionally biased proteins, standard BLOSUM matrices are not suited to optimal alignment and specific substitution matrices are required. In addition, we showed that using these matrices reduces the number of false positive hits, facilitating automatic annotation.
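As a rough illustration of how a symmetrical substitution matrix can be derived from an ungapped alignment block, the following minimal sketch computes BLOSUM-style log-odds scores from column pair counts. It is not the authors' procedure: it omits both the compositional selection of BLOCKS alignments and the 'fuzzy' clustering step described in the abstract, and the function names (`pair_counts`, `log_odds_matrix`), the `scale` parameter, and the toy block are assumptions made for this example.

```python
import math
from collections import Counter
from itertools import combinations


def pair_counts(block):
    """Count unordered residue pairs in every column of an ungapped block."""
    counts = Counter()
    for column in zip(*block):
        for a, b in combinations(column, 2):
            counts[tuple(sorted((a, b)))] += 1
    return counts


def log_odds_matrix(block, scale=2.0):
    """Return symmetric log-odds scores {(a, b): int} for one block.

    Hypothetical helper: no sequence clustering or weighting is applied,
    so closely related sequences are over-represented in the counts.
    """
    counts = pair_counts(block)
    total = sum(counts.values())
    # q[(a, b)]: observed frequency of each unordered residue pair
    q = {pair: n / total for pair, n in counts.items()}
    # p[a]: marginal frequency of residue a among all paired residues
    p = Counter()
    for (a, b), f in q.items():
        if a == b:
            p[a] += f
        else:
            p[a] += f / 2
            p[b] += f / 2
    scores = {}
    for (a, b), f in q.items():
        # Expected pair frequency under independence
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        s = round(scale * math.log2(f / expected))
        scores[(a, b)] = scores[(b, a)] = s  # enforce symmetry
    return scores


# Toy block of three aligned sequences (illustrative data only)
toy_block = ["NKN", "NKN", "NRD"]
toy_scores = log_odds_matrix(toy_block)
```

Residue pairs that never co-occur in the block get no entry here; a practical implementation would need pseudocounts or a far larger set of alignments, and would group or weight similar sequences before counting.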
Pages: 15