Finding motifs in protein secondary structure for use in function prediction

Cited by: 10
Authors
Ferré, Sébastien
King, Ross D.
Affiliations
[1] Univ Rennes 1, Irisa, F-35042 Rennes, France
[2] Univ Wales, Dept Comp Sci, Aberystwyth SY23 3DB, Dyfed, Wales
Keywords
functional genomics; protein secondary structure; flexible motifs; dichotomic search algorithm
DOI
10.1089/cmb.2006.13.719
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
This paper presents a novel algorithm for the discovery of biological sequence motifs. Our motivation is the prediction of gene function. We seek to discover motifs and combinations of motifs in the secondary structure of proteins for application to the understanding and prediction of functional classes. The motifs found by our algorithm allow both flexible length structural elements and flexible length gaps and can be of arbitrary length. The algorithm is based on neither top-down nor bottom-up search, but rather is dichotomic. It is also "anytime," so that fixed termination of the search is not necessary. We have applied our algorithm to yeast sequence data to discover rules predicting function classes from secondary structure. These resultant rules are informative, consistent with known biology, and a contribution to scientific knowledge. Surprisingly, the rules also demonstrate that secondary structure prediction algorithms are effective for membrane proteins and suggest that the association between secondary structure and function is stronger in membrane proteins than globular ones. We demonstrate that our algorithm can successfully predict gene function directly from predicted secondary structure; e.g., we correctly predict the gene YGL124c to be involved in the functional class "cytoplasmic and nuclear degradation." Datasets and detailed results (generated motifs, rules, evaluation on test dataset, and predictions on unknown dataset) are available at www.aber.ac.uk/compsci/Research/bio/dss/yeast.ss.mips/, and www.genepredictions.org.
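The abstract describes motifs made of flexible-length structural elements separated by flexible-length gaps. The following is a minimal sketch of what matching such a motif against a secondary-structure string could look like; it is not the authors' dichotomic search algorithm, which the abstract does not specify. The three-state alphabet (`H` helix, `E` strand, `C` coil), the `(symbol, min_len, max_len)` motif encoding, the `*` gap symbol, and the function names `motif_to_regex` and `matches` are all assumptions made for illustration.

```python
import re


def motif_to_regex(motif):
    """Compile a motif into a regular expression.

    Hypothetical encoding: a motif is a list of (symbol, min_len, max_len)
    triples. The symbol "*" stands for a gap of any structure type, and
    max_len=None means the element has no upper length bound.
    """
    parts = []
    for symbol, lo, hi in motif:
        token = "." if symbol == "*" else re.escape(symbol)
        upper = "" if hi is None else str(hi)
        parts.append(f"{token}{{{lo},{upper}}}")
    return re.compile("".join(parts))


def matches(motif, structure):
    """Return True if the motif occurs anywhere in the structure string."""
    return motif_to_regex(motif).search(structure) is not None


# Assumed example: a helix of 5 to 12 residues, a gap of at most 6 residues,
# then a strand of at least 3 residues.
motif = [("H", 5, 12), ("*", 0, 6), ("E", 3, None)]
structure = "CCHHHHHHHCCCEEEEECC"
print(matches(motif, structure))
```

Compiling to a regular expression keeps the sketch short; a motif-discovery method would additionally need to search the space of candidate motifs and score them against functional classes, which is outside what this example shows.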
Pages: 719-731
Number of pages: 13