Distinguishing regulatory DNA from neutral sites

Cited by: 99
Authors
Elnitski, L
Hardison, RC
Li, J
Yang, S
Kolbe, D
Eswara, P
O'Connor, MJ
Schwartz, S
Miller, W
Chiaromonte, F [1]
Affiliations
[1] Penn State Univ, Dept Stat, University Pk, PA 16802 USA
[2] Penn State Univ, Dept Hlth Evaluat Sci, University Pk, PA 16802 USA
[3] Penn State Univ, Dept Biochem & Mol Biol, University Pk, PA 16802 USA
[4] Penn State Univ, Dept Comp Sci & Engn, University Pk, PA 16802 USA
[5] Penn State Univ, Dept Biol, University Pk, PA 16802 USA
Keywords
DOI
10.1101/gr.817703
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
We explore several computational approaches to analyzing interspecies genomic sequence alignments, aiming to distinguish regulatory regions from neutrally evolving DNA. Human-mouse genomic alignments were collected for three sets of human regions: (1) experimentally defined gene regulatory regions, (2) well-characterized exons (coding sequences, as a positive control), and (3) interspersed repeats thought to have inserted before the human-mouse split (a good model for neutrally evolving DNA). Models that potentially could distinguish functional noncoding sequences from neutral DNA were evaluated on these three data sets, as well as bulk genome alignments. Our analyses show that discrimination based on frequencies of individual nucleotide pairs or gaps (i.e., of possible alignment columns) is only partially successful. In contrast, scoring procedures that include the alignment context, based on frequencies of short runs of alignment columns, dramatically improve separation between regulatory and neutral features. Such scoring functions should aid in the identification of putative regulatory regions throughout the human genome.
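The contrast the abstract draws, between scoring single alignment columns and scoring short runs of columns, can be illustrated with a log-odds score under two Markov models of alignment columns, one trained on regulatory alignments and one on neutral alignments. The sketch below is a toy illustration under stated assumptions, not the authors' procedure: the function names (`to_columns`, `train`, `log_odds`), the add-one pseudocount, and the 24-column alphabet (pairs over A, C, G, T and a gap, excluding the all-gap column) are all hypothetical choices made here. Setting `order=0` scores each column independently; `order >= 1` conditions each column on the preceding run.

```python
import math
from collections import Counter

# Assumption: columns are pairs over {A, C, G, T, -}, minus the all-gap column.
N_COLUMNS = 24


def to_columns(human, mouse):
    """Pair two aligned rows into a list of alignment columns, e.g. ('A', 'G')."""
    if len(human) != len(mouse):
        raise ValueError("aligned rows must have equal length")
    return list(zip(human, mouse))


def train(alignments, order):
    """Count each context (run of `order` columns) and each context + next column."""
    ctx_counts, run_counts = Counter(), Counter()
    for human, mouse in alignments:
        cols = to_columns(human, mouse)
        for i in range(order, len(cols)):
            ctx = tuple(cols[i - order:i])
            ctx_counts[ctx] += 1
            run_counts[(ctx, cols[i])] += 1
    return ctx_counts, run_counts


def log_prob(model, ctx, col, pseudo=1.0):
    """Smoothed log P(col | ctx); the pseudocount is an illustrative choice."""
    ctx_counts, run_counts = model
    numerator = run_counts[(ctx, col)] + pseudo
    denominator = ctx_counts[ctx] + pseudo * N_COLUMNS
    return math.log(numerator / denominator)


def log_odds(human, mouse, reg_model, neu_model, order):
    """Sum of log-odds (regulatory vs. neutral) over the columns of one alignment."""
    cols = to_columns(human, mouse)
    total = 0.0
    for i in range(order, len(cols)):
        ctx = tuple(cols[i - order:i])
        total += log_prob(reg_model, ctx, cols[i]) - log_prob(neu_model, ctx, cols[i])
    return total


# Made-up training rows, for illustration only (not data from the paper).
reg_training = [("ACGTACGTAC", "ACGTACGTAC")]
neu_training = [("ACGTACGTAC", "TGCA-TGCAT")]

order = 1
reg_model = train(reg_training, order)
neu_model = train(neu_training, order)
conserved_score = log_odds("ACGTAC", "ACGTAC", reg_model, neu_model, order)
diverged_score = log_odds("ACGTAC", "TGCA-T", reg_model, neu_model, order)
```

With this toy training data, an alignment resembling the regulatory set receives a positive score and one resembling the neutral set a negative score. Real training sets, model orders, and any reduction of the column alphabet would have to come from the paper itself.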
Pages: 64-72
Number of pages: 9
Related papers
28 in total
[1]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[2]   Human and mouse gene structure: Comparative analysis and application to exon prediction [J].
Batzoglou, S ;
Pachter, L ;
Mesirov, JP ;
Berger, B ;
Lander, ES .
GENOME RESEARCH, 2000, 10 (07) :950-958
[3]   Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome [J].
Berman, BP ;
Nibu, Y ;
Pfeiffer, BD ;
Tomancak, P ;
Celniker, SE ;
Levine, M ;
Rubin, GM ;
Eisen, MB .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2002, 99 (02) :757-762
[4]   Harvesting the mouse genome [J].
Botcherby, M .
COMPARATIVE AND FUNCTIONAL GENOMICS, 2002, 3 (04) :319-324
[5]   Prediction of complete gene structures in human genomic DNA [J].
Burge, C ;
Karlin, S .
JOURNAL OF MOLECULAR BIOLOGY, 1997, 268 (01) :78-94
[6]  
Chiaromonte F, 2002, Pac Symp Biocomput, P115
[7]  
Cook R. D., 1998, WILEY PROB STAT
[8]   Evolution of transcription factor binding sites in mammalian gene regulatory regions: Conservation and turnover [J].
Dermitzakis, ET ;
Clark, AG .
MOLECULAR BIOLOGY AND EVOLUTION, 2002, 19 (07) :1114-1121
[9]  
Elnitski L, 1997, J BIOL CHEM, V272, P369
[10]   Comparative genome analysis delimits a chromosomal domain and identifies key regulatory elements in the α globin cluster [J].
Flint, J ;
Tufarelli, C ;
Peden, J ;
Clark, K ;
Daniels, RJ ;
Hardison, R ;
Miller, W ;
Philipsen, S ;
Tan-Un, KC ;
McMorrow, T ;
Frampton, J ;
Alter, BP ;
Frischauf, AM ;
Higgs, DR .
HUMAN MOLECULAR GENETICS, 2001, 10 (04) :371-382