In search for more accurate alignments in the twilight zone

Cited by: 64
Authors
Jaroszewski, L [1]
Li, WZ [1]
Godzik, A [1]
Affiliations
[1] Burnham Inst, Program Bioinformat & Biol Complex, La Jolla, CA 92037 USA
Keywords
profile-profile alignments; suboptimal alignments; sequence profiles; FFAS
DOI
10.1110/ps.4820102
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
A major bottleneck in comparative modeling is the alignment quality; this is especially true for proteins whose distant relationships could be reliably recognized only by recent advances in fold recognition. The best algorithms excel in recognizing distant homologs but often produce incorrect alignments for over 50% of protein pairs in large fold-prediction benchmarks. The alignments obtained by sequence-sequence or sequence-structure matching algorithms differ significantly from the structural alignments. To study this problem, we developed a simplified method to explicitly enumerate all possible alignments for a pair of proteins. This allowed us to estimate, for a given scoring method, the number of significantly different alignments that score better than the structural alignment. Using several examples of distantly related proteins, we show that for standard sequence-sequence alignment methods, the number of significantly different alignments is usually large, often about 10^10 alternatives. This number decreases when the alignment method is improved, but it is still too large for the brute-force enumeration approach. More effective strategies were needed, so we evaluated and compared two well-known approaches for searching the space of suboptimal alignments. We combined their best features and produced a hybrid method, which yielded alignments that surpassed the original alignments for about 50% of protein pairs with minimal computational effort.
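To give a sense of why brute-force enumeration of alignments becomes impractical, the sketch below counts every gapped global alignment of two sequences. This is an illustrative assumption, not the simplified enumeration method of the paper: it counts all alignments (the Delannoy numbers), whereas the abstract refers to the narrower set of significantly different alignments that outscore the structural alignment. The function name `count_alignments` is hypothetical.

```python
def count_alignments(m: int, n: int) -> int:
    """Count all global alignments of two sequences of lengths m and n.

    Each alignment column is a match/mismatch, a gap in the first
    sequence, or a gap in the second, which gives the recurrence
    D(i, j) = D(i-1, j) + D(i, j-1) + D(i-1, j-1), with D(i, 0) = D(0, j) = 1.
    """
    # prev[j] holds D(i-1, j); row 0 is all ones (only gaps are possible).
    prev = [1] * (n + 1)
    for _ in range(1, m + 1):
        curr = [1] * (n + 1)  # curr[0] = D(i, 0) = 1
        for j in range(1, n + 1):
            curr[j] = prev[j] + curr[j - 1] + prev[j - 1]
        prev = curr
    return prev[n]


if __name__ == "__main__":
    # The count grows exponentially with sequence length.
    for length in (5, 10, 20, 50):
        print(length, count_alignments(length, length))
```

Even for two sequences of 20 residues each, the total already exceeds 10^10, which is consistent with the abstract's point that exhaustive enumeration does not scale and that targeted searches of the suboptimal alignment space are needed.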
Pages: 1702-1713
Number of pages: 12