A study on protein sequence alignment quality

Cited by: 54
Author
Elofsson, A [1]
Affiliation
[1] Univ Stockholm, Stockholm Bioinformat Ctr, SE-10691 Stockholm, Sweden
Keywords
sequence alignment; hidden Markov models; dynamic programming; homology modeling; fold recognition
DOI
10.1002/prot.10043
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
One of the most central methods in bioinformatics is the alignment of two protein or DNA sequences. However, large-scale benchmarks examining the quality of these alignments have so far been scarce. In contrast, several recent large-scale studies of the ability of different methods to identify related sequences have led to new insights into the performance of fold recognition methods. To improve our understanding of fold recognition methods, we present a large-scale benchmark of alignment quality. We compare alignments from several different alignment methods, including sequence alignments, hidden Markov models, PSI-BLAST, CLUSTALW, and threading methods. For most methods, alignment quality increases significantly at about 20% sequence identity. The differences in alignment quality between methods are quite small; the main difference can be seen in the exact position of the sharp rise in alignment quality, that is, at around 15-20% sequence identity. Using structural information improves the alignments. In general, the best alignments are obtained by methods that use predicted secondary structure information and sequence profiles obtained from PSI-BLAST. One interesting observation is that, across different pairs, many different methods produce the best alignment. This finding implies that, if a method existed that could select the best alignment method for each pair, alignment quality could be improved significantly. (C) 2002 Wiley-Liss, Inc.
Pages: 330-339
Number of pages: 10
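The abstract does not spell out how alignment quality or sequence identity is measured, so the following is only a minimal sketch under stated assumptions, not the paper's protocol. It treats a structure-based reference alignment as ground truth and scores a test alignment by the fraction of reference residue pairs it reproduces (an assumed measure). It computes percent identity over ungapped aligned columns (one of several conventions). It then contrasts the best single method with an "oracle" that picks the best method for each pair, which is the kind of gain the abstract describes. The function names `aligned_pairs`, `percent_identity`, `alignment_quality` and `oracle_gain`, the toy sequences, and the method scores are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# A residue pair: (0-based index in sequence A, 0-based index in sequence B).
Pair = Tuple[int, int]


def aligned_pairs(row_a: str, row_b: str) -> Set[Pair]:
    """Residue pairs matched in a gapped pairwise alignment ('-' marks a gap)."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs: Set[Pair] = set()
    i = j = 0  # residue counters for A and B, advanced only on non-gap characters
    for a, b in zip(row_a, row_b):
        if a != "-" and b != "-":
            pairs.add((i, j))
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs


def percent_identity(row_a: str, row_b: str) -> float:
    """Identical residues as a percentage of ungapped aligned columns.

    Assumption: other conventions divide by the alignment length or by the
    length of the shorter sequence instead.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    cols = [(a, b) for a, b in zip(row_a, row_b) if a != "-" and b != "-"]
    if not cols:
        return 0.0
    return 100.0 * sum(a == b for a, b in cols) / len(cols)


def alignment_quality(test: Set[Pair], reference: Set[Pair]) -> float:
    """Fraction of reference (e.g. structure-based) pairs reproduced by the test alignment.

    Hypothetical measure chosen for illustration only.
    """
    if not reference:
        return 0.0
    return len(test & reference) / len(reference)


def oracle_gain(scores: Dict[str, List[float]]) -> Tuple[float, float]:
    """Return (best single-method mean score, mean of the per-pair best score).

    `scores` maps a method name to its quality score on each pair, in the same pair order.
    """
    lengths = {len(v) for v in scores.values()}
    if len(lengths) != 1 or 0 in lengths:
        raise ValueError("need non-empty score lists of equal length")
    n = lengths.pop()
    best_single = max(sum(v) / n for v in scores.values())
    oracle = sum(max(v[k] for v in scores.values()) for k in range(n)) / n
    return best_single, oracle


if __name__ == "__main__":
    # Toy data: the reference aligns ACDEFG and ACDQFG without gaps.
    ref = aligned_pairs("ACDEFG", "ACDQFG")
    # The gaps in this test alignment pair F (sequence A) with Q (sequence B).
    test = aligned_pairs("ACDEF-G", "ACD-QFG")
    print(f"{percent_identity('ACDEFG', 'ACDQFG'):.1f}")  # 83.3
    print(f"{alignment_quality(test, ref):.3f}")  # 0.667
    best, oracle = oracle_gain({"method_x": [0.9, 0.2, 0.5], "method_y": [0.3, 0.8, 0.6]})
    print(f"{best:.3f} {oracle:.3f}")  # 0.567 0.767
```

With the toy scores, neither hypothetical method is best on every pair, so picking the better method per pair raises the mean score from about 0.567 to about 0.767. Other quality measures, such as shift-tolerant or model-based scores, could be substituted for `alignment_quality` without changing the oracle comparison.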