Accurate and automated classification of protein secondary structure with PsiCSI

Times cited: 35
Authors
Hung, LH [1]
Samudrala, R [1]
Affiliation
[1] Univ Washington, Dept Microbiol, Seattle, WA 98109 USA
Keywords
NMR; chemical shifts; secondary structure; neural networks
DOI
10.1110/ps.0222303
Chinese Library Classification (CLC) codes
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
PsiCSI is a highly accurate, automated method for assigning protein secondary structure from NMR data, which is a useful intermediate step in determining tertiary structure. The method combines information from chemical shifts and protein sequence using three layers of neural networks. Training and testing were performed on a suite of 92 proteins (9437 residues) with known secondary and tertiary structures. Using a stringent cross-validation procedure in which the target protein and its homologs were removed from the databases used to train the neural networks, an average per-residue Q3 accuracy of 89% was observed. This is an increase of 6.2 and 5.5 percentage points (representing 36% and 33% fewer errors) over methods that use only chemical shifts (CSI) or only sequence information (Psipred), respectively. In addition, PsiCSI improves upon the translation of chemical shift information to secondary structure (Q3 = 87.4%) and can use sequence information as an effective substitute for sparse NMR data (Q3 = 86.9% without 13C shifts and Q3 = 86.8% with only Hα shifts available). Finally, the errors made by PsiCSI almost exclusively involve the interchange of helix or strand with coil, not of helix with strand (<2.5 occurrences per 10,000 residues). The automation, increased accuracy, absence of gross errors, and robustness to sparse data make PsiCSI ideal for high-throughput applications and should improve the effectiveness of hybrid NMR/de novo structure determination methods. A Web server is available to which users can submit data and receive the assignment.
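The accuracy figures in the abstract are per-residue Q3 scores, and the "fewer errors" figures follow directly from them: for CSI, (89 - 82.8) / (100 - 82.8) ≈ 0.36, and for Psipred, (89 - 83.5) / (100 - 83.5) ≈ 0.33. The Python sketch below shows how such numbers can be computed. It is not PsiCSI's code and does not reproduce its neural networks. It assumes, hypothetically, that each assignment is a string over a three-state alphabet (H = helix, E = strand, C = coil) aligned residue by residue with a reference assignment; the function names are illustrative only.

```python
from collections import Counter

# Assumed 3-state alphabet: H = helix, E = strand, C = coil (illustrative, not PsiCSI's format).


def q3(predicted: str, observed: str) -> float:
    """Per-residue Q3: percentage of residues whose 3-state label matches the reference."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("assignments must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def confusion(predicted: str, observed: str) -> Counter:
    """Count (observed, predicted) label pairs across all residues."""
    return Counter(zip(observed, predicted))


def gross_errors_per_10k(predicted: str, observed: str) -> float:
    """Helix<->strand swaps (the 'gross' errors) per 10,000 residues; coil errors are excluded."""
    counts = confusion(predicted, observed)
    swaps = counts[("H", "E")] + counts[("E", "H")]
    return 10000.0 * swaps / len(observed)


def relative_error_reduction(q3_new: float, q3_old: float) -> float:
    """Fraction of the old method's per-residue errors eliminated by the new method."""
    return (q3_new - q3_old) / (100.0 - q3_old)


if __name__ == "__main__":
    print(q3("HHHCCEEE", "HHHCCEEC"))
    print(round(relative_error_reduction(89.0, 82.8), 2))  # CSI baseline: 89 - 6.2
    print(round(relative_error_reduction(89.0, 83.5), 2))  # Psipred baseline: 89 - 5.5
```

For example, `q3("HHHCCEEE", "HHHCCEEC")` gives 87.5, and `relative_error_reduction(89.0, 83.5)` rounds to 0.33, matching the 33% figure quoted for Psipred. Counting helix/strand swaps separately from coil errors mirrors the abstract's distinction between ordinary errors and gross errors.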
Pages: 288-295
Number of pages: 8