SLiMFinder: A Probabilistic Method for Identifying Over-Represented, Convergently Evolved, Short Linear Motifs in Proteins

Cited by: 124
Authors
Edwards, Richard J. [1, 2]
Davey, Norman E. [1]
Shields, Denis C. [1]
Affiliations
[1] Univ Coll Dublin, Univ Coll Dublin Complex & Adapt Syst Lab, Univ Coll Dublin Conway Inst Biomol & Biomed Sci, Dublin 2, Ireland
[2] Univ Southampton, Sch Biol Sci, Southampton, Hants, England
Source
PLOS ONE | 2007 / Volume 2 / Issue 10
Funding
Science Foundation Ireland
DOI
10.1371/journal.pone.0000967
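Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract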
Background. Short linear motifs (SLiMs) in proteins are functional microdomains of fundamental importance in many biological systems. SLiMs typically consist of a 3 to 10 amino acid stretch of the primary protein sequence, of which as few as two sites may be important for activity, making identification of novel SLiMs extremely difficult. In particular, it can be very difficult to distinguish a randomly recurring "motif" from a truly over-represented one. Incorporating ambiguous amino acid positions and/or variable-length wildcard spacers between defined residues further complicates the matter.
Methodology/Principal Findings. In this paper we present two algorithms. SLiMBuild identifies convergently evolved, short motifs in a dataset of proteins. Motifs are built by combining dimers into longer patterns, retaining only those motifs occurring in a sufficient number of unrelated proteins. Motifs with fixed amino acid positions are identified and then combined to incorporate amino acid ambiguity and variable-length wildcard spacers. The algorithm is computationally efficient compared to alternatives, particularly when datasets include homologous proteins, and provides great flexibility in the nature of motifs returned. The SLiMChance algorithm estimates the probability of returned motifs arising by chance, correcting for the size and composition of the dataset, and assigns a significance value to each motif. These algorithms are implemented in a software package, SLiMFinder. SLiMFinder default settings identify known SLiMs with 100% specificity, and have a low false discovery rate on random test data.
Conclusions/Significance. The efficiency of SLiMBuild and low false discovery rate of SLiMChance make SLiMFinder highly suited to high throughput motif discovery and individual high quality analyses alike. Examples of such analyses on real biological data, and how SLiMFinder results can help direct future discoveries, are provided.
SLiMFinder is freely available for download under a GNU license from http://bioinformatics.ucd.ie/shields/software/slimfinder/.
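The dimer-combination idea described in the abstract can be illustrated with a toy sketch. This is not SLiMFinder's code: all function names (`find_dimers`, `build_motifs`, `to_pattern`, `chance_support`) are hypothetical, every input sequence is assumed to be unrelated to the others (no homology clustering), and amino acid ambiguity and variable-length spacers are left out. The chance estimate is a plain binomial tail swapped in for illustration. It does not reproduce SLiMChance's corrections for dataset size and composition.

```python
from collections import defaultdict
from math import comb


def find_dimers(seqs, max_gap=2):
    """Map each dimer (a, gap, b) to {sequence index: set of start positions}."""
    occ = defaultdict(lambda: defaultdict(set))
    for s_idx, seq in enumerate(seqs):
        for i, a in enumerate(seq):
            for gap in range(max_gap + 1):
                j = i + gap + 1
                if j < len(seq):
                    occ[(a, gap, seq[j])][s_idx].add(i)
    return occ


def build_motifs(seqs, min_support=2, max_gap=2, max_positions=3):
    """Chain supported dimers into longer fixed-position motifs.

    A motif is a tuple alternating residues and wildcard gap lengths,
    e.g. ('P', 2, 'P') for P..P. Support = number of sequences with a hit
    (assumption: every sequence counts as an unrelated protein).
    """
    dimers = {d: o for d, o in find_dimers(seqs, max_gap).items()
              if len(o) >= min_support}
    motifs = dict(dimers)
    frontier = dict(dimers)
    for _ in range(max_positions - 2):
        extended = {}
        for motif, occ in frontier.items():
            # Offset of the motif's last residue from its start position.
            span = sum(motif[1::2]) + len(motif[0::2]) - 1
            for (a, gap, b), d_occ in dimers.items():
                if a != motif[-1]:
                    continue
                new_occ = {}
                for s_idx, starts in occ.items():
                    # Keep starts where the dimer begins on the last residue.
                    kept = {i for i in starts if i + span in d_occ.get(s_idx, ())}
                    if kept:
                        new_occ[s_idx] = kept
                if len(new_occ) >= min_support:
                    extended[motif + (gap, b)] = new_occ
        motifs.update(extended)
        frontier = extended
    return motifs


def to_pattern(motif):
    """Render a motif tuple as a string, using '.' for wildcard positions."""
    return "".join(x if isinstance(x, str) else "." * x for x in motif)


def chance_support(p_seq, n_seqs, k):
    """Binomial tail: P(motif occurs in >= k of n_seqs sequences by chance),
    given an assumed per-sequence occurrence probability p_seq."""
    return sum(comb(n_seqs, i) * p_seq**i * (1 - p_seq)**(n_seqs - i)
               for i in range(k, n_seqs + 1))


if __name__ == "__main__":
    toy = ["AAPKLPCCR", "GGPTTPGGR"]  # made-up sequences, not real proteins
    print(sorted(to_pattern(m) for m in build_motifs(toy)))
    # → ['P..P', 'P..P..R', 'P..R']
```

Filtering on support before extension is what keeps a search like this small: a longer motif can only be supported if each dimer it is built from is supported.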
Pages: 11