Improving fold recognition without folds

Times cited: 42
Authors
Przybylski, D
Rost, B
Affiliations
[1] Columbia Univ, CUBIC, Dept Biochem & Mol Biophys, New York, NY 10032 USA
[2] Columbia Univ, Dept Phys, New York, NY 10027 USA
[3] Columbia Univ, Ctr Computat Bio & Bioinformat C2B2, New York, NY 10032 USA
[4] Columbia Univ, NESG, Dept Biochem & Mol Biophys, New York, NY 10032 USA
Keywords
protein structure prediction; fold recognition; sequence alignment; database search; secondary structure
DOI
10.1016/j.jmb.2004.05.041
CLC classification
Q5 [Biochemistry]; Q7 [Molecular biology]
Discipline codes
071010; 081704
Abstract
The most reliable way to align two proteins of unknown structure is through sequence-profile and profile-profile alignment methods. If the structure of one of the two is known, fold recognition methods outperform purely sequence-based alignments. Here, we introduced a novel method, AGAPE, that aligns generalised sequence and predicted structure profiles. Using predicted 1D structure (secondary structure and solvent accessibility) yielded significant improvements over sequence-only methods, both in correctly recognising pairs of proteins with different sequences and similar structures and in correctly aligning those pairs. The scores obtained by our generalised scoring matrix followed an extreme value distribution; this yielded accurate estimates of the statistical significance of our alignments. We found that mistakes in 1D structure predictions correlated between proteins from different sequence-structure families. As a consequence of this surprising result, our method significantly outperformed sequence-only methods even without explicitly using structural information from either of the two proteins. Since AGAPE also outperformed established methods that rely on 3D information, we made it available through http://www.predictprotein.org. If the CPU time required to apply AGAPE to millions of proteins can be reduced, our results could also impact everyday database searches. (C) 2004 Elsevier Ltd. All rights reserved.
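The abstract combines two technical ideas: scoring an alignment of profiles that carry both sequence information and predicted 1D structure, and judging those scores against an extreme value distribution. The sketch below illustrates both under stated assumptions; it is not AGAPE's actual scoring function. The column layout (amino-acid frequencies, three-state secondary structure H/E/L, two-state accessibility b/e), the weights `w_ss` and `w_acc`, the `offset`, the linear `gap` penalty and the helper `toy_profile` are all hypothetical choices made for illustration. The p-value uses the standard Gumbel tail P(S >= x) = 1 - exp(-exp(-lambda (x - mu))), with mu and lambda estimated by the method of moments from scores of unrelated pairs.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical column layout (assumption, not the paper's format): each profile
# position holds three probability vectors: amino-acid frequencies ("seq"),
# predicted secondary structure over H/E/L ("ss"), predicted accessibility over b/e ("acc").
Column = Dict[str, Dict[str, float]]

EULER_GAMMA = 0.5772156649015329


def dot(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Agreement between two probability vectors (inner product)."""
    return sum(v * q.get(k, 0.0) for k, v in p.items())


def column_score(a: Column, b: Column, w_ss: float = 1.0,
                 w_acc: float = 0.5, offset: float = 1.0) -> float:
    """Hypothetical generalised score: sequence agreement plus weighted agreement
    of predicted 1D structure, shifted by `offset` so that unrelated columns tend
    to score below zero (a requirement for meaningful local alignment)."""
    return (dot(a["seq"], b["seq"])
            + w_ss * dot(a["ss"], b["ss"])
            + w_acc * dot(a["acc"], b["acc"])
            - offset)


def local_align_score(A: List[Column], B: List[Column],
                      gap: float = 1.0, **weights: float) -> float:
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0.0] * (len(B) + 1)
    best = 0.0
    for i in range(1, len(A) + 1):
        cur = [0.0] * (len(B) + 1)
        for j in range(1, len(B) + 1):
            cur[j] = max(0.0,
                         prev[j - 1] + column_score(A[i - 1], B[j - 1], **weights),
                         prev[j] - gap,     # A[i-1] aligned to a gap
                         cur[j - 1] - gap)  # B[j-1] aligned to a gap
            best = max(best, cur[j])
        prev = cur
    return best


def fit_gumbel_moments(scores: List[float]) -> Tuple[float, float]:
    """Method-of-moments Gumbel fit to scores of unrelated pairs, using
    mean = mu + gamma / lam and variance = pi^2 / (6 lam^2). Returns (mu, lam)."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    lam = math.pi / math.sqrt(6.0 * var)
    return mean - EULER_GAMMA / lam, lam


def gumbel_pvalue(x: float, mu: float, lam: float) -> float:
    """P(S >= x) = 1 - exp(-exp(-lam * (x - mu))), computed stably via expm1."""
    return -math.expm1(-math.exp(-lam * (x - mu)))


def toy_profile(seq: str, ss: str, acc: str) -> List[Column]:
    """Hypothetical helper building one-hot toy columns; a real profile would hold
    alignment-derived frequencies and predicted probabilities instead."""
    return [{"seq": {r: 1.0}, "ss": {s: 1.0}, "acc": {a: 1.0}}
            for r, s, a in zip(seq, ss, acc)]


if __name__ == "__main__":
    A = toy_profile("ACDEFG", "HHHEEE", "bbeebb")
    B = toy_profile("HIKLMN", "HHHEEE", "bbeebb")  # no identical residues
    print(local_align_score(A, A))                       # 9.0
    print(local_align_score(A, B))                       # 3.0
    print(local_align_score(A, B, w_ss=0.0, w_acc=0.0))  # 0.0 (sequence terms only)
    mu, lam = fit_gumbel_moments([0.8, 1.1, 1.5, 0.9, 2.3, 1.2, 1.0, 1.7])
    print(gumbel_pvalue(3.0, mu, lam))
```

In this toy case the two profiles share no residues, so with the structure weights set to zero the best local score is 0, while matching predicted secondary structure and accessibility lift it to 3.0. That mirrors, in miniature, why adding predicted 1D structure can reveal similarity that sequence alone misses; the numbers themselves carry no empirical weight.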
Pages: 255-269
Page count: 15
References: 93