Noisy: Identification of problematic columns in multiple sequence alignments

Cited by: 121
Authors
Dress, Andreas W. M. [2, 3]
Flamm, Christoph [1]
Fritzsch, Guido [4, 5]
Gruenewald, Stefan [2, 3]
Kruspe, Matthias [5]
Prohaska, Sonja J. [1, 6, 7]
Stadler, Peter F. [1, 5, 6, 8, 9]
Affiliations
[1] Univ Vienna, Inst Theoret Chem & Mol Strukturbiol, A-1090 Vienna, Austria
[2] Shanghai Inst Biol Sci, Partner Inst Computat Biol, MPG CAS, Dept Combinator & Geometry, Shanghai, Peoples R China
[3] Max Planck Inst Math Sci, D-04103 Leipzig, Germany
[4] Univ Leipzig, Inst Biol Zool Mol Evolut & Syst Tiere 2, D-04103 Leipzig, Germany
[5] Univ Leipzig, Interdisciplinary Ctr Bioinformat, D-04107 Leipzig, Germany
[6] Santa Fe Inst, Santa Fe, NM 87501 USA
[7] Arizona State Univ, Tempe, AZ 85287 USA
[8] Univ Leipzig, Dept Comp Sci, Bioinformat Grp, D-04107 Leipzig, Germany
[9] Fraunhofer Inst Cell Therapy & Immunol IZI, RN Grp, D-04103 Leipzig, Germany
DOI
10.1186/1748-7188-3-7
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 [Biochemistry and Molecular Biology]; 081704 [Applied Chemistry];
Abstract
Motivation: Sequence-based methods for phylogenetic reconstruction from (nucleic acid) sequence data are notoriously plagued by two effects: homoplasies and alignment errors. Large evolutionary distances imply a large number of homoplastic sites. Because most protein-coding genes show dramatic variations in substitution rates that are correlated along the sequence, this often leads to a patchwork pattern of (i) phylogenetically informative and (ii) effectively randomized regions. Furthermore, alignment errors accumulate in highly variable regions, sometimes producing misleading signals in phylogenetic reconstruction. Results: We present here a method that, based on assessing the distribution of character states along a cyclic ordering of the taxa, allows the identification of phylogenetically uninformative homoplastic sites in a multiple sequence alignment. Removal of these sites appears to improve the performance of phylogenetic reconstruction algorithms as measured by various indices of "tree quality". In particular, we obtain more stable trees due to the exclusion of phylogenetically incompatible sites that most likely represent strongly randomized characters. Software: The computer program noisy implements this approach. It can be used to improve phylogenetic reconstruction with a considerable success rate whenever (1) the average bootstrap support obtained from the original alignment is low, and (2) there are sufficiently many taxa in the data set (at least, say, 12 to 15 taxa). The software can be obtained under the GNU Public License from http://www.bioinf.uni-leipzig.de/Software/noisy/.
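The idea of assessing character states along a cyclic ordering of the taxa can be illustrated with a small sketch. This is not the noisy program's implementation: the function names `cyclic_changes` and `column_score`, the shuffle count, and the permutation-based score are all assumptions made for illustration. The sketch counts how often the character state changes between neighbouring taxa on the cycle, then compares that count against random orderings of the same taxa.

```python
import random


def cyclic_changes(column, order):
    """Count state changes between neighbours along a cyclic taxon ordering.

    column: string (or list) with one character state per taxon.
    order:  list of taxon indices; the last taxon wraps around to the first.
    """
    n = len(order)
    return sum(
        column[order[i]] != column[order[(i + 1) % n]]
        for i in range(n)
    )


def column_score(column, order, n_shuffles=200, seed=0):
    """Hypothetical score in [0, 1] (an assumption, not noisy's statistic).

    Returns the fraction of random orderings that need strictly more state
    changes than the given ordering. Values near 1 suggest the column fits
    the ordering well; values near 0 suggest a randomized column.
    Constant columns always score 0 here and would need separate handling.
    """
    rng = random.Random(seed)
    observed = cyclic_changes(column, order)
    shuffled = list(order)
    more_changes = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if cyclic_changes(column, shuffled) > observed:
            more_changes += 1
    return more_changes / n_shuffles


if __name__ == "__main__":
    order = list(range(8))        # assumed cyclic ordering of 8 taxa
    clean = "AAAACCCC"            # states cluster along the cycle
    noisy_like = "ACACACAC"       # states alternate along the cycle
    print(cyclic_changes(clean, order), column_score(clean, order))
    print(cyclic_changes(noisy_like, order), column_score(noisy_like, order))
```

In this toy setting, a column whose states form contiguous blocks on the cycle needs few changes and scores high, while an alternating column needs the maximum number of changes and scores 0. How the cyclic ordering is obtained and which cutoff is applied are left open here, since the abstract does not specify them.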
Pages: 10