A study on protein sequence alignment quality

Cited by: 54
Authors
Elofsson, A [1]
Affiliations
[1] Univ Stockholm, Stockholm Bioinformat Ctr, SE-10691 Stockholm, Sweden
Keywords
sequence alignment; hidden Markov models; dynamic programming; homology modeling; fold recognition;
DOI
10.1002/prot.10043
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
One of the most central methods in bioinformatics is the alignment of two protein or DNA sequences. However, so far large-scale benchmarks examining the quality of these alignments are scarce. On the other hand, recently several large-scale studies of the capacity of different methods to identify related sequences have led to new insights about the performance of fold recognition methods. To increase our understanding of fold recognition methods, we present a large-scale benchmark of alignment quality. We compare alignments from several different alignment methods, including sequence alignments, hidden Markov models, PSI-BLAST, CLUSTALW, and threading methods. For most methods, the alignment quality increases significantly at about 20% sequence identity. The difference in alignment quality between different methods is quite small, and the main difference can be seen in the exact position of the sharp rise in alignment quality, that is, around 15-20% sequence identity. The alignments are improved by using structural information. In general, the best alignments are obtained by methods that use predicted secondary structure information and sequence profiles obtained from PSI-BLAST. One interesting observation is that for different pairs many different methods create the best alignments. This finding implies that if a method that could select the best alignment method for each pair existed, a significant improvement of the alignment quality could be gained. (C) 2002 Wiley-Liss, Inc.
Pages: 330-339
Number of pages: 10