Natively unstructured loops differ from other loops

Cited by: 76
Authors
Schlessinger, Avner [1]
Liu, Jinfeng
Rost, Burkhard
Affiliations
[1] Columbia Univ, Dept Biochem & Mol Biophys, New York, NY 10027 USA
[2] Columbia Univ, Ctr Computat Biol & Bioinformat, New York, NY USA
[3] Columbia Univ, NE Structural Genom Consortium, New York, NY USA
DOI
10.1371/journal.pcbi.0030140
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Natively unstructured or disordered protein regions may increase the functional complexity of an organism; they are particularly abundant in eukaryotes and often evade structure determination. Many computational methods predict unstructured regions by training on outliers in otherwise well-ordered structures. Here, we introduce an approach that uses a neural network in a very different and novel way. We hypothesize that very long contiguous segments with nonregular secondary structure (NORS regions) differ significantly from regular, well-structured loops, and that a method detecting such features could predict natively unstructured regions. Training our new method, NORSnet, on predicted information rather than on experimental data yielded three major advantages: it removed the overlap between testing and training, it systematically covered entire proteomes, and it explicitly focused on one particular aspect of unstructured regions with a simple structural interpretation, namely that they are loops. Our hypothesis was correct: well-structured and unstructured loops differ so substantially that NORSnet succeeded in their distinction. Benchmarks on previously used and new experimental data of unstructured regions revealed that NORSnet performed very well. Although it was not the best single prediction method, NORSnet was sufficiently accurate to flag unstructured regions in proteins that were previously not annotated. In one application, NORSnet revealed previously undetected unstructured regions in putative targets for structural genomics and may thereby contribute to increasing structural coverage of large eukaryotic families. NORSnet found unstructured regions more often in domain boundaries than expected at random. In another application, we estimated that 50%-70% of all worm proteins observed to have more than seven protein-protein interaction partners have unstructured regions. 
The comparative analysis between NORSnet and DISOPRED2 suggested that long unstructured loops are a major part of unstructured regions in molecular networks.
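The structural idea in the abstract, that very long contiguous loop segments are candidates for unstructured regions, can be illustrated with a minimal sketch. This is not NORSnet (which is a neural network) and not the authors' code: the function name `find_long_loops`, the one-letter state alphabet (`H` helix, `E` strand, `L` loop), and the `min_len` default of 70 are all assumptions chosen for illustration, and the sketch requires strictly uninterrupted loop runs.

```python
import re


def find_long_loops(ss: str, min_len: int = 70, loop_char: str = "L"):
    """Return half-open (start, end) index pairs for every run of
    loop_char in ss that is at least min_len residues long.

    ss is a per-residue secondary-structure string, e.g. from a predictor.
    The alphabet and the default threshold are illustrative assumptions.
    """
    # Build a pattern such as "L{70,}" matching runs of >= min_len loop states.
    pattern = re.compile(re.escape(loop_char) + "{" + str(min_len) + ",}")
    return [(m.start(), m.end()) for m in pattern.finditer(ss)]


# Toy input: a short helix, an 80-residue loop, a strand, a short loop, a helix.
ss = "HHHH" + "L" * 80 + "EEEE" + "L" * 10 + "HHH"
print(find_long_loops(ss))  # only the 80-residue run is reported
```

Only the 80-residue run passes the threshold here; the 10-residue loop is treated as an ordinary, well-structured loop. A practical variant would likely tolerate a small fraction of helix or strand states inside a long segment rather than demand a perfectly uninterrupted run.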
Pages: 1335-1346
Page count: 12