Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments

Cited by: 4314
Authors
Talavera, Gerard [1]
Castresana, Jose [1]
Affiliations
[1] CSIC, Inst Mol Biol, Dept Physiol & Mol Biodivers, E-08034 Barcelona, Spain
DOI
10.1080/10635150701472164
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Alignment quality may have as much impact on phylogenetic reconstruction as the phylogenetic methods used. Not only the alignment algorithm, but also the method used to deal with the most problematic alignment regions, may have a critical effect on the final tree. Although some authors remove such problematic regions, either manually or using automatic methods, in order to improve phylogenetic performance, others prefer to keep such regions to avoid losing any information. Our aim in the present work was to examine whether phylogenetic reconstruction improves after alignment cleaning or not. Using simulated protein alignments with gaps, we tested the relative performance in diverse phylogenetic analyses of the whole alignments versus the alignments with problematic regions removed with our previously developed Gblocks program. We also tested the performance of more or less stringent conditions in the selection of blocks. Alignments constructed with different alignment methods (ClustalW, Mafft, and Probcons) were used to estimate phylogenetic trees by maximum likelihood, neighbor joining, and parsimony. We show that, in most alignment conditions, and for alignments that are not too short, removal of blocks leads to better trees. That is, despite losing some information, there is an increase in the actual phylogenetic signal. Overall, the best trees are obtained by maximum-likelihood reconstruction of alignments cleaned by Gblocks. In general, a relaxed selection of blocks is better for short alignments, whereas a stringent selection is more adequate for longer ones. Finally, we show that cleaned alignments produce better topologies although, paradoxically, with lower bootstrap support. This indicates that divergent and problematic alignment regions may lead, when present, to apparently better supported although, in fact, more biased topologies.
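The idea of alignment cleaning described in the abstract can be illustrated with a toy column filter. The sketch below is not the Gblocks algorithm and does not reproduce its parameters: the function name `clean_alignment` and the thresholds `min_conserved` and `min_block_len` are hypothetical, chosen only to show the general principle of dropping gapped or divergent columns and keeping sufficiently long conserved blocks.

```python
from collections import Counter


def clean_alignment(seqs, min_conserved=0.5, min_block_len=2, gap="-"):
    """Toy block filter (hypothetical; NOT the real Gblocks criteria).

    Returns the trimmed sequences and the indices of the columns kept.
    """
    n_seqs = len(seqs)
    n_cols = len(seqs[0])
    if any(len(s) != n_cols for s in seqs):
        raise ValueError("all aligned sequences must have the same length")

    # Step 1: a column is acceptable if it has no gaps and its most
    # common residue reaches the assumed conservation threshold.
    keep = []
    for col in range(n_cols):
        residues = [s[col] for s in seqs]
        if gap in residues:
            keep.append(False)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        keep.append(top_count / n_seqs >= min_conserved)

    # Step 2: keep only runs of acceptable columns that are at least
    # min_block_len long; shorter runs are treated as unreliable.
    selected = []
    start = None
    for col in range(n_cols + 1):  # one extra step closes a trailing run
        ok = col < n_cols and keep[col]
        if ok and start is None:
            start = col
        elif not ok and start is not None:
            if col - start >= min_block_len:
                selected.extend(range(start, col))
            start = None

    trimmed = ["".join(s[c] for c in selected) for s in seqs]
    return trimmed, selected


# Made-up alignment: column 3 has gaps, column 5 is fully divergent,
# and column 4 is a conserved column isolated between them.
example = ["MKV-WALAG", "MKVQWCLAG", "MRV-WDLSG", "MKVAWELAG"]
trimmed, kept = clean_alignment(example)
print(kept)     # [0, 1, 2, 6, 7, 8]
print(trimmed)  # ['MKVLAG', 'MKVLAG', 'MRVLSG', 'MKVLAG']
```

Raising `min_block_len` or `min_conserved` in this sketch corresponds loosely to a more stringent selection of blocks, and lowering them to a more relaxed one. The trade-off between retained columns and retained noise is the one the abstract evaluates.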
Pages: 564-577
Page count: 14