Aligning short reads to reference alignments and trees

Cited by: 140
Authors
Berger, Simon A. [1]
Stamatakis, Alexandros [1]
Affiliation
[1] Heidelberg Inst Theoret Studies, Sci Comp Grp, Exelixis Lab, D-69118 Heidelberg, Germany
Keywords
MAXIMUM-LIKELIHOOD; SEQUENCE ALIGNMENT; DNA-SEQUENCES; PLACEMENT; ACCURACY
DOI
10.1093/bioinformatics/btr320
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Likelihood-based methods for placing short read sequences from metagenomic samples into reference phylogenies have recently been introduced. At present, it is unclear how to align these reads with respect to the reference alignment that was used to infer the reference phylogeny. Moreover, it has not been explored how well such alignment methods adapt to the different strategies or philosophies underlying the reference alignment, nor whether the reference phylogeny can be used together with the reference alignment to improve alignment accuracy in this context.

Results: We assess different strategies for short read alignment and propose a novel phylogeny-aware alignment procedure. Compared with phylogeny-agnostic methods, our alignment method can improve the accuracy of subsequent phylogenetic placement of the reads into a reference phylogeny by up to 5.8 times. It can be used to align reads to alignments generated with fundamentally different alignment strategies (e.g. PRANK(+F) versus MUSCLE).
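The abstract leaves open how a read is fitted into a fixed reference alignment. For orientation only, the sketch below shows one simple phylogeny-agnostic strategy, not the authors' phylogeny-aware procedure: each read base is scored against the residue composition of a reference alignment column (a crude profile), and a dynamic program places every base into a distinct existing column without ever adding columns to the reference, so the gapped read has exactly the alignment's width, as likelihood-based placement generally requires. The function names (`column_score`, `align_read_to_reference`), the integer scoring scheme and the default gap penalty are hypothetical choices made for this illustration.

```python
from typing import List, Tuple

NEG_INF = float("-inf")


def column_score(base: str, column: List[str]) -> int:
    """Hypothetical profile score: +1 per reference residue equal to `base`,
    -1 per residue that differs; gap characters ('-') are ignored."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return -1  # mildly penalize placing a base into an all-gap column
    matches = sum(1 for c in residues if c == base)
    return 2 * matches - len(residues)


def align_read_to_reference(read: str, alignment: List[str], gap: int = 2) -> Tuple[int, str]:
    """Place every base of `read`, in order, into a distinct column of the fixed
    reference alignment (no new columns are created). Unused columns before the
    first base and after the last base are free; unused columns between read
    bases cost `gap` each. Returns (score, gapped read as wide as the alignment)."""
    n, m = len(read), len(alignment[0])
    if any(len(row) != m for row in alignment):
        raise ValueError("reference rows must have equal length")
    if n > m:
        raise ValueError("read is longer than the reference alignment")
    columns = [[row[j] for row in alignment] for j in range(m)]

    # S[i][j]: best score with the first i read bases placed within the first j columns.
    S = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        S[0][j] = 0  # leading unused columns are free
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = S[i - 1][j - 1] + column_score(read[i - 1], columns[j - 1])
            skip_cost = 0 if i == n else gap  # trailing unused columns are free
            left = S[i][j - 1] - skip_cost
            S[i][j] = max(diag, left)

    # Traceback: prefer placing a base whenever that move reproduces the cell's score.
    out = ["-"] * m
    i, j = n, m
    while j > 0:
        if i > 0 and S[i][j] == S[i - 1][j - 1] + column_score(read[i - 1], columns[j - 1]):
            out[j - 1] = read[i - 1]
            i -= 1
        j -= 1
    return int(S[n][m]), "".join(out)


reference = ["ACGTACGT", "ACGTTCGT", "ACG-ACGT"]
print(align_read_to_reference("GTAC", reference))  # (9, '--GTAC--')
```

Because the reference columns stay fixed, a read base that truly has no counterpart in the reference is forced into some column here; a real tool has to decide how to treat such insertions. A phylogeny-aware method would presumably replace the single whole-alignment profile with information derived from the reference tree, but the abstract does not say how.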
Pages: 2068-2075
Number of pages: 8