Estimation and efficient computation of the true probability of recurrence of short linear protein sequence motifs in unrelated proteins

Cited by: 12
Authors
Davey, Norman E. [1, 2, 3, 4]
Edwards, Richard J. [5]
Shields, Denis C. [1, 2, 3]
Affiliations
[1] Univ Coll Dublin, UCD Complex & Adapt Syst Lab, Dublin 2, Ireland
[2] Univ Coll Dublin, UCD Conway Inst Biomol & Biomed Res, Dublin 2, Ireland
[3] Univ Coll Dublin, UCD Sch Med & Med Sci, Dublin 2, Ireland
[4] EMBL Struct & Computat Biol Unit, D-69117 Heidelberg, Germany
[5] Univ Southampton, Sch Biol Sci, Southampton, Hants, England
Source
BMC Bioinformatics | 2010 / Vol. 11
Funding
Science Foundation Ireland
Keywords
DISCOVERY; DATABASE; SITES; CONSERVATION; PREDICTION; NETWORKS; PATTERNS; RESOURCE; UPDATE; SERVER
DOI
10.1186/1471-2105-11-14
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Large datasets of protein interactions provide a rich resource for the discovery of Short Linear Motifs (SLiMs) that recur in unrelated proteins. However, existing methods for estimating the probability of motif recurrence may be biased by the size and composition of the search dataset, such that p-value estimates from different datasets, or from motifs containing different numbers of non-wildcard positions, are not strictly comparable. Here, we develop more exact methods and explore the potential biases of computationally efficient approximations.
Results: A widely used heuristic for calculating motif over-representation approximates motif probability by assuming that all proteins have the same length and composition. We introduce p_v, which calculates this probability exactly. Secondly, the recently introduced SLiMFinder statistic, Sig, accounts for multiple testing (across all possible motifs) in motif discovery. However, it approximates the probability of all other possible motifs occurring with a score of p or less as being equal to p. Here, we show that the exhaustive calculation of the probability of all possible motif occurrences that are as rare as or rarer than the motif of interest, Sig', can be carried out efficiently by grouping motifs of a common probability (i.e. those that are permutations of the same residues). Sig_v', which corrects both approximations, is shown to be uniformly distributed in a random dataset when searching for non-ambiguous motifs, indicating that it is a robust significance measure.
Conclusions: A method is presented to compute exactly the true probability of a non-ambiguous short protein sequence motif, and the utility of an approximate approach for novel motif discovery across a large number of datasets is demonstrated.
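The abstract rests on two computational ideas: replacing a single "identical proteins" binomial heuristic with a per-protein calculation, and grouping motifs that are permutations of the same residues because they share a probability. The Python sketch below illustrates both under simplifying assumptions of our own: motifs are fixed-length strings with no wildcards, residues occur independently at given background frequencies, and the per-protein calculation is modelled as a Poisson-binomial tail over each protein's probability of containing the motif. All function names are hypothetical. This is not the authors' SLiMFinder code, and it is not claimed to reproduce the paper's exact definitions of p_v, Sig or Sig'; the grouping functions only count motifs at or below a probability threshold.

```python
# Minimal illustrative sketch, NOT the SLiMFinder implementation.
# Assumptions (ours, hypothetical): motifs are fixed-length strings with no
# wildcards, residues occur independently at their background frequencies,
# and a protein "contains" a motif if any start position matches.
from collections import Counter
from itertools import combinations_with_replacement, product
from math import comb, factorial, prod


def site_prob(motif, freqs):
    """Probability that the motif matches at one given start position."""
    return prod(freqs[aa] for aa in motif)


def protein_prob(motif, freqs, length):
    """Probability of at least one match in a protein of the given length."""
    positions = max(length - len(motif) + 1, 0)
    return 1.0 - (1.0 - site_prob(motif, freqs)) ** positions


def binomial_tail(k, n, p):
    """P(X >= k), X ~ Binomial(n, p): every protein treated as identical."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def poisson_binomial_tail(k, probs):
    """Exact P(X >= k) when protein i contains the motif with probability probs[i]."""
    dist = [1.0]  # dist[j] = P(exactly j proteins so far contain the motif)
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for j, mass in enumerate(dist):
            new[j] += mass * (1 - p)  # this protein lacks the motif
            new[j + 1] += mass * p    # this protein contains the motif
        dist = new
    return sum(dist[k:])


def n_orderings(residues):
    """Number of distinct motifs that are permutations of a residue multiset."""
    counts = Counter(residues)
    return factorial(len(residues)) // prod(factorial(c) for c in counts.values())


def count_rarer_grouped(width, freqs, threshold):
    """Count motifs with site_prob <= threshold, visiting one multiset per group."""
    total = 0
    for combo in combinations_with_replacement(sorted(freqs), width):
        if site_prob(combo, freqs) <= threshold:
            total += n_orderings(combo)
    return total


def count_rarer_bruteforce(width, freqs, threshold):
    """Reference count that enumerates all len(freqs) ** width ordered motifs."""
    # Sorting each motif makes the floating-point product order identical
    # to the grouped version, so both functions compare the same numbers.
    return sum(1 for m in product(sorted(freqs), repeat=width)
               if site_prob(sorted(m), freqs) <= threshold)


if __name__ == "__main__":
    freqs = {"A": 0.5, "C": 0.3, "G": 0.2}  # toy background frequencies
    lengths = [50, 200, 800]                # toy protein lengths
    per_protein = [protein_prob("ACG", freqs, n) for n in lengths]
    mean_len = round(sum(lengths) / len(lengths))
    heuristic = binomial_tail(2, len(lengths), protein_prob("ACG", freqs, mean_len))
    print("exact:", poisson_binomial_tail(2, per_protein), "heuristic:", heuristic)
    print(count_rarer_grouped(4, freqs, 0.01), count_rarer_bruteforce(4, freqs, 0.01))
```

Here the proteins differ only in length; per-protein compositions could be passed to `protein_prob` in the same way. Because permutations share a probability, the grouped count visits C(|alphabet| + w - 1, w) residue multisets instead of |alphabet|^w ordered motifs. For the 20 amino acids and w = 5, that is 42,504 multisets instead of 3,200,000 motifs.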
Pages: 10