Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals

Cited by: 1502
Authors
Yeo, G
Burge, CB
Affiliations
[1] MIT, Dept Biol, Cambridge, MA 02139 USA
[2] MIT, Dept Brain & Cognit Sci, Cambridge, MA 02139 USA
Keywords
maximum entropy; splice sites; nonneighboring dependencies; Markov models; maximal dependence decomposition; molecular sequence analysis; sequence motif
DOI
10.1089/1066527041410418
CLC number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
We propose a framework for modeling sequence motifs based on the maximum entropy principle (MEP). We recommend approximating short sequence motif distributions with the maximum entropy distribution (MED) consistent with low-order marginal constraints estimated from available data, which may include dependencies between nonadjacent as well as adjacent positions. Many maximum entropy models (MEMs) are specified by simply changing the set of constraints. Such models can be utilized to discriminate between signals and decoys. Classification performance using different MEMs gives insight into the relative importance of dependencies between different positions. We apply our framework to large datasets of RNA splicing signals. Our best models out-perform previous probabilistic models in the discrimination of human 5' (donor) and 3' (acceptor) splice sites from decoys. Finally, we discuss mechanistically motivated ways of comparing models.
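The idea of a maximum entropy distribution consistent with low-order marginal constraints can be illustrated with a small sketch. The code below is not the authors' implementation; it assumes iterative proportional fitting as the fitting procedure, enumerates all 4^L sequences (practical only for short motifs), and uses hypothetical names (`empirical_marginal`, `fit_med`, `log_odds`) and toy data chosen purely for illustration.

```python
import math
from collections import defaultdict
from itertools import product

ALPHABET = "ACGT"


def empirical_marginal(seqs, positions):
    """Observed frequency of each nucleotide tuple at the given positions."""
    counts = defaultdict(float)
    for s in seqs:
        counts[tuple(s[i] for i in positions)] += 1.0
    return {key: n / len(seqs) for key, n in counts.items()}


def fit_med(seqs, constraints, n_iter=50):
    """Fit a distribution over all 4^L sequences by iterative proportional fitting.

    constraints: list of position tuples, e.g. [(0,), (1,), (0, 2)].
    Starting from the uniform distribution and rescaling to match each
    marginal in turn approaches the maximum entropy distribution that
    satisfies those marginals (assumption: the constraints are consistent,
    which holds when they are all estimated from the same data).
    """
    length = len(seqs[0])
    space = ["".join(t) for t in product(ALPHABET, repeat=length)]
    p = {s: 1.0 / len(space) for s in space}
    targets = [(pos, empirical_marginal(seqs, pos)) for pos in constraints]
    for _ in range(n_iter):
        for pos, target in targets:
            # Current model marginal at these positions.
            current = defaultdict(float)
            for s, prob in p.items():
                current[tuple(s[i] for i in pos)] += prob
            # Rescale every sequence so the marginal matches the target.
            for s in space:
                key = tuple(s[i] for i in pos)
                if current[key] > 0:
                    p[s] *= target.get(key, 0.0) / current[key]
    return p


def log_odds(seq, p_signal, p_decoy):
    """Log2 ratio of signal to decoy probability (both must be nonzero)."""
    return math.log2(p_signal[seq] / p_decoy[seq])


# Toy data, invented for illustration only.
signals = ["GTA", "GTG", "GTA", "CTA"]
independent = fit_med(signals, [(0,), (1,), (2,)])   # single-position marginals
pairwise = fit_med(signals, [(1,), (0, 2)])          # adds a nonadjacent pair
```

Changing the `constraints` list is all that is needed to move from one model to another, which mirrors the abstract's point that different models are specified by changing the set of constraints. Real data would normally need pseudocounts so that unseen nucleotide combinations do not receive zero probability.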
Pages: 377-394
Page count: 18
Related papers
30 in total
[1]   Modeling splicing sites with pairwise correlations [J].
Arita, M ;
Tsuda, K ;
Asai, K .
BIOINFORMATICS, 2002, 18 :S27-S34
[2]   Berger AL, 1996, COMPUT LINGUIST, V22, P39
[3]   Brown D. T., 1959, INFORM CONTR, V2, P386, DOI 10.1016/S0019-9958(59)80016-4
[4]   BUEHLER E, 2001, WORKSH DAT MIN BIOIN
[5]   Nucleotides of transcription factor binding sites exert interdependent effects on the binding affinities of transcription factors [J].
Bulyk, ML ;
Johnson, PLF ;
Church, GM .
NUCLEIC ACIDS RESEARCH, 2002, 30 (05) :1255-1261
[6]   Prediction of complete gene structures in human genomic DNA [J].
Burge, C ;
Karlin, S .
JOURNAL OF MOLECULAR BIOLOGY, 1997, 268 (01) :78-94
[7]   Burge CB, 1998, N COMP BIOC, V32, P129
[8]   Burge CB, 1999, RNA WORLD, P525
[9]   Modeling splice sites with Bayes networks [J].
Cai, DY ;
Delcher, A ;
Kao, B ;
Kasif, S .
BIOINFORMATICS, 2000, 16 (02) :152-158
[10]   The U1 snRNP protein U1C recognizes the 5′ splice site in the absence of base pairing [J].
Du, HS ;
Rosbash, M .
NATURE, 2002, 419 (6902) :86-90