BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments

Cited by: 1091
Authors
Criscuolo, Alexis [1]
Gribaldo, Simonetta [1]
Affiliation
[1] Inst Pasteur, Dept Microbiol, Unite Biol Mol Gene Chez Extremophiles, F-75015 Paris, France
Source
BMC EVOLUTIONARY BIOLOGY | 2010 / Vol. 10
Keywords
AMINO-ACID SUBSTITUTION; EVOLUTIONARY TREES; MODEL; DNA; ACCURACY; SUPPORT; HOMOGENEITY; ALGORITHM; MATRICES; PATTERN;
DOI
10.1186/1471-2148-10-210
Chinese Library Classification
Q [Biological Sciences];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
Background: The quality of multiple sequence alignments plays an important role in the accuracy of phylogenetic inference. It has been shown that removing ambiguously aligned regions, but also other sources of bias such as highly variable (saturated) characters, can improve the overall performance of many phylogenetic reconstruction methods. A current scientific trend is to build phylogenetic trees from a large number of sequence datasets (semi-) automatically extracted from numerous complete genomes. Because these approaches do not allow a precise manual curation of each dataset, there exists a real need for efficient bioinformatic tools dedicated to this alignment character trimming step.
Results: Here is presented a new software, named BMGE (Block Mapping and Gathering with Entropy), that is designed to select regions in a multiple sequence alignment that are suited for phylogenetic inference. For each character, BMGE computes a score closely related to an entropy value. Calculation of these entropy-like scores is weighted with BLOSUM or PAM similarity matrices in order to distinguish among biologically expected and unexpected variability for each aligned character. Sets of contiguous characters with a score above a given threshold are considered as not suited for phylogenetic inference and then removed. Simulation analyses show that the character trimming performed by BMGE produces datasets leading to accurate trees, especially with alignments including distantly-related sequences. BMGE also implements trimming and recoding methods aimed at minimizing phylogeny reconstruction artefacts due to compositional heterogeneity.
Conclusions: BMGE is able to perform biologically relevant trimming on a multiple alignment of DNA, codon or amino acid sequences. Java source code and executable are freely available at ftp://ftp.pasteur.fr/pub/GenSoft/projects/BMGE/.
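The trimming idea described in the abstract (score each aligned character, then remove those scoring above a threshold) can be illustrated with a minimal Python sketch. This is not BMGE's implementation: it uses plain normalized Shannon entropy and does not apply the BLOSUM or PAM similarity weighting that the abstract describes. The function names `column_entropy` and `trim_alignment`, the default threshold of 0.5, and the gap handling are all assumptions made for illustration.

```python
import math
from collections import Counter


def column_entropy(column, alphabet_size=4, gap_chars="-?"):
    """Normalized Shannon entropy (0..1) of one alignment column.

    Simplification: plain Shannon entropy, NOT the similarity-matrix-weighted
    score used by BMGE. Gap characters are ignored (an assumption).
    """
    residues = [c for c in column.upper() if c not in gap_chars]
    if not residues:
        return 1.0  # all-gap column: treated here as maximally uninformative
    total = len(residues)
    counts = Counter(residues)
    h = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return h / math.log2(alphabet_size)


def trim_alignment(sequences, threshold=0.5, alphabet_size=4):
    """Drop every column whose entropy score exceeds `threshold`.

    Returns the trimmed sequences and the 0-based indices of kept columns.
    Runs of adjacent high-scoring columns are removed together as a block.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same aligned length")
    kept = [
        i for i in range(length)
        if column_entropy("".join(s[i] for s in sequences), alphabet_size) <= threshold
    ]
    trimmed = ["".join(s[i] for i in kept) for s in sequences]
    return trimmed, kept


if __name__ == "__main__":
    alignment = ["ACGTA", "ACCTA", "ACATA", "ACTTA"]  # hypothetical toy DNA alignment
    trimmed, kept = trim_alignment(alignment, threshold=0.5, alphabet_size=4)
    print(kept)
    print(trimmed)
```

In the toy alignment, only the third column varies (one each of G, C, A, T), so it receives the maximum score and is removed while the four conserved columns are kept. For amino acid alignments, `alphabet_size=20` would be the natural choice in this sketch.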
Pages: 21