Automatic assessment of alignment quality

Cited by: 95
Authors
Lassmann, T [1]
Sonnhammer, ELL [1]
Affiliations
[1] Karolinska Inst, Ctr Genom & Bioinformat, S-17177 Stockholm, Sweden
DOI
10.1093/nar/gki1020
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Multiple sequence alignments play a central role in the annotation of novel genomes. Given the biological and computational complexity of this task, the automatic generation of high-quality alignments remains challenging. Since multiple alignments are usually employed at the very start of data analysis pipelines, it is crucial to ensure high alignment quality. We describe a simple, yet elegant, solution to assess the biological accuracy of alignments automatically. Our approach is based on the comparison of several alignments of the same sequences. We introduce two functions to compare alignments: the average overlap score and the multiple overlap score. The former identifies difficult alignment cases by expressing the similarity among several alignments, while the latter estimates the biological correctness of individual alignments. We implemented both functions in the MUMSA program and demonstrate the overall robustness and accuracy of both functions on three large benchmark sets.
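The abstract names the two functions but does not define them, so the following is only a minimal sketch of the general idea of comparing several alignments of the same sequences. It assumes that two alignments are compared through the set of residue pairs they place in the same column, that the pairwise overlap is a Dice-style ratio, that the average overlap score is the mean of all pairwise overlaps, and that the multiple overlap score of one alignment is the fraction of its residue pairs supported by the other alignments. All function names (`aligned_pairs`, `overlap`, `average_overlap`, `multiple_overlap`) are hypothetical and are not taken from the MUMSA program.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column.

    alignment: list of equal-length gapped strings ('-' marks a gap),
    one row per sequence, rows in the same order in every alignment.
    A residue is identified as (sequence index, ungapped position).
    """
    pairs = set()
    position = [0] * len(alignment)  # next ungapped position per sequence
    for col in range(len(alignment[0])):
        residues = []
        for seq, row in enumerate(alignment):
            if row[col] != "-":
                residues.append((seq, position[seq]))
                position[seq] += 1
        pairs.update(combinations(residues, 2))
    return pairs


def overlap(a, b):
    """Assumed Dice-style overlap between two alignments, in [0, 1]."""
    pa, pb = aligned_pairs(a), aligned_pairs(b)
    total = len(pa) + len(pb)
    return 2 * len(pa & pb) / total if total else 1.0


def average_overlap(alignments):
    """Mean pairwise overlap; low values suggest a difficult case."""
    scores = [overlap(a, b) for a, b in combinations(alignments, 2)]
    return sum(scores) / len(scores)


def multiple_overlap(alignments):
    """Per-alignment share of residue pairs found in the other alignments."""
    pair_sets = [aligned_pairs(a) for a in alignments]
    others = len(pair_sets) - 1
    scores = []
    for i, pairs in enumerate(pair_sets):
        support = sum(
            len(pairs & other) for j, other in enumerate(pair_sets) if j != i
        )
        denom = len(pairs) * others
        scores.append(support / denom if denom else 0.0)
    return scores


# Three toy alignments of the same two sequences (ACG and ACTG).
A = ["AC-G", "ACTG"]
B = ["ACG-", "ACTG"]
C = ["AC-G", "ACTG"]
```

With these toy inputs, `average_overlap([A, B, C])` summarises how much the three alignments agree, and `multiple_overlap([A, B, C])` ranks `B` below `A` and `C` because one of its residue pairs is supported by neither of the other two alignments.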
Pages: 7120-7128
Page count: 9