Alignment of protein sequences by their profiles

Cited by: 139
Authors
Marti-Renom, MA
Madhusudhan, MS
Sali, A
Affiliations
[1] Univ Calif San Francisco, Dept Biopharmaceut Sci, San Francisco, CA 94143 USA
[2] Univ Calif San Francisco, Dept Pharmaceut Chem, San Francisco, CA 94143 USA
[3] Univ Calif San Francisco, Calif Inst Quantitat Biomed Res, San Francisco, CA 94143 USA
Keywords
protein sequence alignment; sequence profiles; comparative protein structure modeling
DOI
10.1110/ps.03379804
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
The accuracy of an alignment between two protein sequences can be improved by including other detectably related sequences in the comparison. We optimize and benchmark such an approach that relies on aligning two multiple sequence alignments, each one including one of the two protein sequences. Thirteen different protocols for creating and comparing profiles corresponding to the multiple sequence alignments are implemented in the SALIGN command of MODELLER. A test set of 200 pairwise, structure-based alignments with sequence identities below 40% is used to benchmark the 13 protocols as well as a number of previously described sequence alignment methods, including heuristic pairwise sequence alignment by BLAST, pairwise sequence alignment by global dynamic programming with an affine gap penalty function by the ALIGN command of MODELLER, sequence-profile alignment by PSI-BLAST, Hidden Markov Model methods implemented in SAM and LOBSTER, pairwise sequence alignment relying on predicted local structure by SEA, and multiple sequence alignment by CLUSTALW and COMPASS. The alignment accuracies of the best new protocols were significantly better than those of the other tested methods. For example, the fraction of the correctly aligned residues relative to the structure-based alignment by the best protocol is 56%, which can be compared with the accuracies of 26%, 42%, 43%, 48%, 50%, 49%, 43%, and 43% for the other methods, respectively. The new method is currently applied to large-scale comparative protein structure modeling of all known sequences.
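The accuracy measure quoted in the abstract (the fraction of correctly aligned residues relative to a structure-based alignment) can be illustrated with a minimal Python sketch. This is not the authors' benchmark code: the function names `aligned_pairs` and `fraction_correct` are hypothetical, the toy sequences are invented for illustration, and the sketch assumes that "correctly aligned" means a residue pair that occupies the same column in both the test alignment and the reference alignment. The abstract does not state the exact definition used in the paper.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (i, j) residue-index pairs that share a column.

    row_a and row_b are the two gapped rows of one pairwise alignment.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    pairs = set()
    i = j = 0  # running residue indices in the ungapped sequences
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def fraction_correct(test, reference):
    """Fraction of reference residue pairs reproduced by the test alignment.

    Assumption: the denominator is the number of aligned pairs in the
    reference (structure-based) alignment.
    """
    ref_pairs = aligned_pairs(*reference)
    if not ref_pairs:
        return 0.0
    test_pairs = aligned_pairs(*test)
    return len(test_pairs & ref_pairs) / len(ref_pairs)


# Toy example (invented sequences, not data from the paper).
reference = ("ACDEFG", "ACD-FG")  # stands in for a structure-based alignment
test = ("ACDEFG", "AC-DFG")       # stands in for a sequence-based alignment
print(fraction_correct(test, reference))  # 4 of 5 reference pairs recovered: 0.8
```

In this toy case the test alignment shifts one residue across the gap, so four of the five reference pairs are reproduced.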
Pages: 1071-1087
Number of pages: 17