MACSE: Multiple Alignment of Coding SEquences Accounting for Frameshifts and Stop Codons

Cited by: 473
Authors
Ranwez, Vincent [1]
Harispe, Sebastien [1, 2]
Delsuc, Frederic [1]
Douzery, Emmanuel J. P. [1]
Affiliations
[1] Univ Montpellier 2, CNRS, UMR5554, Inst Sci Evolut, Montpellier, France
[2] Ecole Mines d'Ales, Ctr Rech LGI2P, Nimes, France
Source
PLOS ONE | 2011 / Vol. 6 / Issue 09
Keywords
FUNCTIONAL DIVERGENCE; DNA; PHYLOGENY; EVOLUTION; SEAVIEW; TIME;
DOI
10.1371/journal.pone.0022594
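Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes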
07 ; 0710 ; 09 ;
Abstract
Until now the most efficient solution to align nucleotide sequences containing open reading frames was to use indirect procedures that align amino acid translations before reporting the inferred gap positions at the codon level. There are two important pitfalls with this approach. Firstly, any premature stop codon impedes using such a strategy. Secondly, each sequence is translated with the same reading frame from beginning to end, so that the presence of a single additional nucleotide leads to both aberrant translation and alignment. We present an algorithm that has the same space and time complexity as the classical Needleman-Wunsch algorithm while accommodating sequencing errors and other biological deviations from the coding frame. The resulting pairwise coding sequence alignment method was extended to a multiple sequence alignment (MSA) algorithm implemented in a program called MACSE (Multiple Alignment of Coding SEquences accounting for frameshifts and stop codons). MACSE is the first automatic solution to align protein-coding gene datasets containing non-functional sequences (pseudogenes) without disrupting the underlying codon structure. It has also proved useful in detecting undocumented frameshifts in public database sequences and in aligning next-generation sequencing reads/contigs against a reference coding sequence. MACSE is distributed as an open-source executable Java file with freely available source code and can be used via a web interface at: http://mbb.univ-montp2.fr/macse.
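The indirect procedure and its two pitfalls described in the abstract can be illustrated with a minimal Python sketch. It assumes the standard genetic code and a fixed reading frame starting at the first nucleotide; the names `translate` and `thread_codons` and the toy sequences are hypothetical and are not part of MACSE, whose own algorithm is not reproduced here.

```python
# Standard genetic code, built from the canonical T, C, A, G codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nt):
    """Translate in a single fixed frame; a trailing partial codon is dropped."""
    usable = len(nt) - len(nt) % 3
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, usable, 3))


def thread_codons(aligned_protein, nt):
    """Report protein-level gaps at the codon level: each '-' becomes '---'."""
    codons = [nt[i:i + 3] for i in range(0, len(nt), 3)]
    out, k = [], 0
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")
        else:
            out.append(codons[k])
            k += 1
    return "".join(out)


# A clean coding sequence translates as expected and can be back-threaded.
clean = "ATGGCTAAAGGT"
protein = translate(clean)
codon_alignment = thread_codons("MA-KG", clean)

# One additional nucleotide after the start codon shifts the frame:
# every downstream codon changes and a premature stop ('*') appears.
shifted = "ATG" + "C" + "GCTAAAGGT"
shifted_protein = translate(shifted)

print(protein, codon_alignment, shifted_protein)
```

In this toy case the single inserted nucleotide produces both problems named in the abstract at once: the downstream translation no longer resembles the original protein, and it contains a stop codon that a protein aligner would not handle as an ordinary residue.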
Pages: 10