ESPERR: Learning strong and weak signals in genomic sequence alignments to identify functional elements

Cited by: 89
Authors
Taylor, James [1]
Tyekucheva, Svitlana [1]
King, David C. [1]
Hardison, Ross C. [1]
Miller, Webb [1]
Chiaromonte, Francesca [1]
Affiliation
[1] Penn State Univ, Ctr Comparat Genom & Bioinformat, University Pk, PA 16802 USA
DOI
10.1101/gr.4537706
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Genomic sequence signals (such as base composition, presence of particular motifs, or evolutionary constraint) have been used effectively to identify functional elements. However, approaches based only on specific signals known to correlate with function can be quite limiting. When training data are available, application of computational learning algorithms to multispecies alignments has the potential to capture broader and more informative sequence and evolutionary patterns that better characterize a class of elements. However, effective exploitation of patterns in multispecies alignments is impeded by the vast number of possible alignment columns and by a limited understanding of which particular strings of columns may characterize a given class. We have developed a computational method, called ESPERR (evolutionary and sequence pattern extraction through reduced representations), which uses training examples to learn encodings of multispecies alignments into reduced forms tailored for the prediction of chosen classes of functional elements. ESPERR produces a greatly improved Regulatory Potential score, which can discriminate regulatory regions from neutral sites with excellent accuracy (approximately 94%). This score captures strong signals (GC content and conservation), as well as subtler signals (with small contributions from many different alignment patterns) that characterize the regulatory elements in our training set. ESPERR is also effective for predicting other classes of functional elements, as we show for DNaseI hypersensitive sites and highly conserved regions with developmental enhancer activity. Our software, training data, and genome-wide predictions are available from our Web site (http://www.bx.psu.edu/projects/esperr).
Pages: 1596-1604
Page count: 9
Related papers
28 in total
[1]   Into the heart of darkness: large-scale clustering of human non-coding DNA [J].
Bejerano, Gill ;
Haussler, David ;
Blanchette, Mathieu .
BIOINFORMATICS, 2004, 20 :40-48
[2]   Unbiased location analysis of E2F1-binding sites suggests a widespread role for E2F1 in the human genome [J].
Bieda, M ;
Xu, XQ ;
Singer, MA ;
Green, R ;
Farnham, PJ .
GENOME RESEARCH, 2006, 16 (05) :595-605
[3]   Aligning multiple genomic sequences with the threaded blockset aligner [J].
Blanchette, M ;
Kent, WJ ;
Riemer, C ;
Elnitski, L ;
Smit, AFA ;
Roskin, KM ;
Baertsch, R ;
Rosenbloom, K ;
Clawson, H ;
Green, ED ;
Haussler, D ;
Miller, W .
GENOME RESEARCH, 2004, 14 (04) :708-715
[4]   Bühlmann P, 1998, ANN STAT, V26, P48
[5]   Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs [J].
Cawley, S ;
Bekiranov, S ;
Ng, HH ;
Kapranov, P ;
Sekinger, EA ;
Kampa, D ;
Piccolboni, A ;
Sementchenko, V ;
Cheng, J ;
Williams, AJ ;
Wheeler, R ;
Wong, B ;
Drenkow, J ;
Yamanaka, M ;
Patel, S ;
Brubaker, S ;
Tammana, H ;
Helt, G ;
Struhl, K ;
Gingeras, TR .
CELL, 2004, 116 (04) :499-509
[6]   Comprehensive analysis of transcriptional promoter structure and function in 1% of the human genome [J].
Cooper, SJ ;
Trinklein, ND ;
Anton, ED ;
Nguyen, L ;
Myers, RM .
GENOME RESEARCH, 2006, 16 (01) :1-10
[7]   Turnover of binding sites for transcription factors involved in early Drosophila development [J].
Costas, J ;
Casares, F ;
Vieira, J .
GENE, 2003, 310 :215-220
[8]   Evolution of transcription factor binding sites in mammalian gene regulatory regions: Conservation and turnover [J].
Dermitzakis, ET ;
Clark, AG .
MOLECULAR BIOLOGY AND EVOLUTION, 2002, 19 (07) :1114-1121
[9]   Durbin R., 1998, BIOL SEQUENCE ANAL
[10]   Distinguishing regulatory DNA from neutral sites [J].
Elnitski, L ;
Hardison, RC ;
Li, J ;
Yang, S ;
Kolbe, D ;
Eswara, P ;
O'Connor, MJ ;
Schwartz, S ;
Miller, W ;
Chiaromonte, F .
GENOME RESEARCH, 2003, 13 (01) :64-72