Automatic assessment of alignment quality

Cited by: 95
Authors
Lassmann, T [1]
Sonnhammer, ELL [1]
Affiliation
[1] Karolinska Inst, Ctr Genom & Bioinformat, S-17177 Stockholm, Sweden
DOI
10.1093/nar/gki1020
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
Multiple sequence alignments play a central role in the annotation of novel genomes. Given the biological and computational complexity of this task, the automatic generation of high-quality alignments remains challenging. Since multiple alignments are usually employed at the very start of data analysis pipelines, it is crucial to ensure high alignment quality. We describe a simple, yet elegant, solution to assess the biological accuracy of alignments automatically. Our approach is based on the comparison of several alignments of the same sequences. We introduce two functions to compare alignments: the average overlap score and the multiple overlap score. The former identifies difficult alignment cases by expressing the similarity among several alignments, while the latter estimates the biological correctness of individual alignments. We implemented both functions in the MUMSA program and demonstrate the overall robustness and accuracy of both functions on three large benchmark sets.
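The abstract names the two scores but does not give their formulas. The sketch below shows one plausible way to compute scores of this kind, and it is not MUMSA's implementation. It rests on three assumptions. First, each alignment is reduced to its set of aligned residue pairs. Second, the overlap between two alignments is the number of shared pairs divided by the mean pair count. Third, an alignment's multiple overlap score is the average, over its own pairs, of the fraction of the other alignments that also contain each pair. The function names `residue_pairs`, `overlap`, `average_overlap_score` and `multiple_overlap_score` are hypothetical.

```python
from itertools import combinations


def residue_pairs(alignment):
    """Return the set of aligned residue pairs in a gapped alignment.

    `alignment` is a list of equal-length strings using '-' for gaps.
    Each pair is ((seq_i, res_i), (seq_j, res_j)) with seq_i < seq_j, where
    res_* is the 0-based position of the residue in its ungapped sequence.
    """
    counters = [0] * len(alignment)  # next ungapped residue index per sequence
    pairs = set()
    for col in range(len(alignment[0])):
        present = []
        for s, row in enumerate(alignment):
            if row[col] != "-":
                present.append((s, counters[s]))
                counters[s] += 1
        # every two residues sharing a column form one aligned pair
        pairs.update(combinations(present, 2))
    return pairs


def overlap(a, b):
    """Assumed pairwise overlap: shared pairs / mean number of pairs."""
    pa, pb = residue_pairs(a), residue_pairs(b)
    if not pa and not pb:
        return 1.0
    return len(pa & pb) / ((len(pa) + len(pb)) / 2)


def average_overlap_score(alignments):
    """Mean pairwise overlap across all alternative alignments.

    A low value flags a difficult case on which the alignments disagree.
    """
    scores = [overlap(a, b) for a, b in combinations(alignments, 2)]
    return sum(scores) / len(scores)


def multiple_overlap_score(index, alignments):
    """Assumed per-alignment score: average support for its residue pairs.

    Support for a pair is the fraction of the *other* alignments containing it.
    """
    target = residue_pairs(alignments[index])
    others = [residue_pairs(a) for i, a in enumerate(alignments) if i != index]
    if not target:
        return 0.0
    support = sum(sum(p in o for o in others) / len(others) for p in target)
    return support / len(target)


if __name__ == "__main__":
    a1 = ["ACGT", "AC-T"]
    a2 = ["ACGT", "A-CT"]
    a3 = ["ACGT", "AC-T"]  # agrees with a1
    alns = [a1, a2, a3]
    print(round(average_overlap_score(alns), 3))  # 0.778
    for i in range(len(alns)):
        print(i, round(multiple_overlap_score(i, alns), 3))  # 0.833, 0.667, 0.833
```

In this toy case, `a1` and `a3` agree, so they receive a higher per-alignment score than the dissenting `a2`. This is the consensus intuition the abstract describes: pairs reproduced by many alignments are taken as more likely to be correct.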
Pages: 7120-7128
Number of pages: 9