CSpritz: accurate prediction of protein disorder segments with annotation for homology, secondary structure and linear motifs

Cited by: 66
Authors
Walsh, Ian [1]
Martin, Alberto J. M. [1]
Di Domenico, Tomas [1]
Vullo, Alessandro [2]
Pollastri, Gianluca [3]
Tosatto, Silvio C. E. [1]
Affiliations
[1] Univ Padua, Dept Biol, I-35131 Padua, Italy
[2] European Bioinformat Inst, EMBL Outstn, Hinxton CB10 1SD, England
[3] Univ Coll Dublin, Sch Comp Sci & Informat, Dublin 4, Ireland
Keywords
INTRINSICALLY UNSTRUCTURED PROTEINS; SEQUENCE; REGIONS; DATABASE; SERVER; SINGLE
DOI
10.1093/nar/gkr411
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
CSpritz is a web server for the prediction of intrinsic protein disorder. It combines our earlier Spritz predictor with two novel, orthogonal systems developed by our group, Punch and ESpritz. Punch is based on sequence and structural templates trained with support vector machines. ESpritz is an efficient single-sequence method based on bidirectional recursive neural networks. Spritz was extended to filter its predictions using structural homologues. After extensive testing, the individual predictions are combined by averaging their probabilities. The CSpritz website can process a single sequence or multiple sequences, predicting either short or long disorder. The server provides a global output page for downloading all predictions, together with summary statistics over all of them. Each protein links to its own page, which displays the amino acid sequence and the disorder prediction along with statistics for that protein. As a novel feature, CSpritz provides information about structural homologues, secondary structure and short functional linear motifs in each disordered segment. Benchmarking on the recent CASP9 data shows that CSpritz would have ranked consistently well, with an Sw measure of 49.27 and an AUC of 0.828. The server, together with help and methods pages including examples, is freely available at URL: http://protein.bio.unipd.it/cspritz/.
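The consensus step described in the abstract (averaging the per-residue disorder probabilities of Spritz, Punch and ESpritz) and the two benchmark scores it reports can be illustrated with a minimal Python sketch. It is not the CSpritz implementation: the function names, the 0.5 decision threshold, the toy probabilities, and the definition of Sw as sensitivity + specificity - 1 (expressed as a percentage) are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def average_probabilities(predictions: Sequence[Sequence[float]]) -> List[float]:
    """Per-residue mean of several predictors' disorder probabilities."""
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictors must score the same residues")
    # zip(*predictions) yields one tuple of scores per residue
    return [sum(scores) / len(predictions) for scores in zip(*predictions)]


def disordered_segments(probs: Sequence[float],
                        threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Runs with probability >= threshold as 0-based inclusive (start, end).

    threshold=0.5 is a hypothetical default, not CSpritz's published cutoff.
    """
    segments: List[Tuple[int, int]] = []
    start = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i                         # a disordered run begins
        elif p < threshold and start is not None:
            segments.append((start, i - 1))   # the run ended at the previous residue
            start = None
    if start is not None:                     # run extends to the C-terminus
        segments.append((start, len(probs) - 1))
    return segments


def sw_measure(predicted: Sequence[bool], observed: Sequence[bool]) -> float:
    """Assumed definition: Sw = sensitivity + specificity - 1, as a percentage."""
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p and o)
    fn = sum(1 for p, o in pairs if not p and o)
    tn = sum(1 for p, o in pairs if not p and not o)
    fp = sum(1 for p, o in pairs if p and not o)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 100.0 * (sensitivity + specificity - 1.0)


def roc_auc(scores: Sequence[float], observed: Sequence[bool]) -> float:
    """AUC as the probability that a disordered residue outscores an ordered one (ties count 1/2)."""
    pos = [s for s, o in zip(scores, observed) if o]
    neg = [s for s, o in zip(scores, observed) if not o]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Invented toy probabilities for a 5-residue sequence, one list per predictor.
    spritz = [0.75, 1.0, 0.25, 0.0, 0.5]
    punch = [0.5, 0.75, 0.5, 0.25, 0.25]
    espritz = [1.0, 0.5, 0.0, 0.0, 0.75]
    observed = [True, True, False, False, False]  # invented reference annotation

    consensus = average_probabilities([spritz, punch, espritz])
    predicted = [p >= 0.5 for p in consensus]
    print(disordered_segments(consensus))                   # [(0, 1), (4, 4)]
    print(f"Sw = {sw_measure(predicted, observed):.2f}")    # Sw = 66.67
    print(f"AUC = {roc_auc(consensus, observed):.3f}")      # AUC = 1.000
```

Averaging keeps a continuous consensus score, which is what a threshold-free metric such as AUC needs; the binary disordered/ordered call is made only afterwards, when segments and Sw are computed.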
Pages: W190-W196
Page count: 7
References (47 in total)
[1] Albrecht, M; Tosatto, SCE; Lengauer, T; Valle, G. Simple consensus procedures are effective and sufficient in secondary structure prediction. PROTEIN ENGINEERING, 2003, 16(7): 459-462.
[2] Ali, KM. MACHINE LEARNING, 1996, 24: 173. DOI 10.1007/BF00058611.
[3] Altschul, SF; Madden, TL; Schaffer, AA; Zhang, JH; Zhang, Z; Miller, W; Lipman, DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. NUCLEIC ACIDS RESEARCH, 1997, 25(17): 3389-3402.
[4] Baldi, P; Pollastri, G. The principled design of large-scale recursive neural network architectures--DAG-RNNs and the protein structure prediction problem. JOURNAL OF MACHINE LEARNING RESEARCH, 2004, 4(4): 575-602.
[5] Berman, H; Henrick, K; Nakamura, H; Markley, JL. The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. NUCLEIC ACIDS RESEARCH, 2007, 35: D301-D303.
[6] Cheng, JL; Sweredoski, MJ; Baldi, P. Accurate prediction of protein disordered regions by mining protein structure data. DATA MINING AND KNOWLEDGE DISCOVERY, 2005, 11(3): 213-222.
[7] Diella, F; Haslam, N; Chica, C; Budd, A; Michael, S; Brown, NP; Trave, G; Gibson, TJ. Understanding eukaryotic linear motifs and their role in cell signaling and regulation. FRONTIERS IN BIOSCIENCE-LANDMARK, 2008, 13: 6580-6603.
[8] Dosztányi, Z; Csizmók, V; Tompa, P; Simon, I. The pairwise energy content estimated from amino acid composition discriminates between folded and intrinsically unstructured proteins. JOURNAL OF MOLECULAR BIOLOGY, 2005, 347(4): 827-839.
[9] Dunker, AK. Genome Inform Ser Workshop Genome Inform, 2000, 11: 161.
[10] Dunker, AK; Brown, CJ; Lawson, JD; Iakoucheva, LM; Obradovic, Z. Intrinsic disorder and protein function. BIOCHEMISTRY, 2002, 41(21): 6573-6582.