MotifMap: a human genome-wide map of candidate regulatory motif sites

Times cited: 105
Authors
Xie, Xiaohui [1, 2]
Rigor, Paul [1, 2]
Baldi, Pierre [1, 2]
Affiliations
[1] Univ Calif Irvine, Dept Comp Sci, Irvine, CA 92697 USA
[2] Univ Calif Irvine, Inst Genom & Bioinformat, Irvine, CA 92697 USA
Funding
US National Science Foundation; US National Institutes of Health;
Keywords
FACTOR BINDING-SITES; SYSTEMATIC DISCOVERY; FUNCTIONAL ELEMENTS; VERTEBRATE; SELECTION; DATABASE;
DOI
10.1093/bioinformatics/btn605
CLC number (Chinese Library Classification)
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Achieving a comprehensive map of all the regulatory elements encoded in the human genome is a fundamental challenge of biomedical research. So far, only a small fraction of the regulatory elements have been characterized, and there is great interest in applying computational techniques to systematically discover these elements. Such efforts, however, have been significantly hindered by the overwhelming size of non-coding DNA regions and the statistical variability and complex spatial organizations of mammalian regulatory elements.

Results: Here we combine information from multiple mammalian genomes to derive the first fairly comprehensive map of regulatory elements in the human genome. We develop a procedure for identifying regulatory sites, with high levels of conservation across different species, using a new scoring scheme, the Bayesian branch length score (BBLS). Using BBLS, we predict 1.5 million regulatory sites, corresponding to 380 known regulatory motifs, with an estimated false discovery rate (FDR) of < 50%. We demonstrate that the method is particularly effective for 155 motifs, for which 121,056 sites can be mapped with an estimated FDR of < 10%. Over 28K SNPs are located in regions overlapping the 1.5 million predicted motif sites, suggesting potential functional implications for these SNPs. We have deposited these elements in a database and created a user-friendly web server for the retrieval, analysis and visualization of these elements. The initial map provides a systematic view of gene regulation in the genome, which will be refined as additional motifs become available.
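The abstract names two quantities, a branch-length conservation score and an estimated FDR, without showing how either is computed. The sketch below illustrates both under stated assumptions. It is not the authors' BBLS. BBLS, as its name indicates, is a Bayesian refinement, and this sketch leaves that part out entirely. The sketch computes only a plain, non-Bayesian branch length score: the total branch length of the smallest subtree that connects the species in which a motif instance is found. The toy tree `TREE` and its branch lengths are hypothetical. The functions `branch_length_score`, `normalized_bls` and `estimated_fdr` are illustrative names. The FDR estimate also assumes one common approach, dividing the number of control hits (for example, sites of shuffled motifs) by the number of real hits at a score threshold. The paper's exact procedure may differ.

```python
from collections import Counter

# Hypothetical toy rooted tree: child -> (parent, branch length).
# The root ("root") has no entry because it has no parent edge.
TREE = {
    "human": ("primate", 0.1),
    "chimp": ("primate", 0.1),
    "mouse": ("rodent", 0.15),
    "rat": ("rodent", 0.15),
    "primate": ("root", 0.2),
    "rodent": ("root", 0.25),
}


def total_length(tree):
    """Sum of all branch lengths in the tree."""
    return sum(length for _, length in tree.values())


def path_to_root(tree, node):
    """Nodes whose parent edge lies on the path from `node` up to the root."""
    edges = []
    while node in tree:
        edges.append(node)
        node = tree[node][0]
    return edges


def branch_length_score(tree, present):
    """Plain (non-Bayesian) BLS: length of the minimal subtree spanning `present`."""
    present = set(present)
    if len(present) < 2:
        return 0.0  # a single species spans no branches
    counts = Counter()
    for species in present:
        counts.update(path_to_root(tree, species))
    # Edges traversed by every species lie above their lowest common ancestor,
    # so they are excluded; every other traversed edge belongs to the subtree.
    return sum(tree[n][1] for n, c in counts.items() if c < len(present))


def normalized_bls(tree, present):
    """BLS as a fraction of the total tree length, in [0, 1]."""
    return branch_length_score(tree, present) / total_length(tree)


def estimated_fdr(real_scores, control_scores, threshold):
    """Assumed FDR estimate: control hits / real hits at or above `threshold`."""
    n_real = sum(s >= threshold for s in real_scores)
    n_control = sum(s >= threshold for s in control_scores)
    if n_real == 0:
        return 0.0  # no predictions, hence no false discoveries
    return min(1.0, n_control / n_real)
```

For example, `branch_length_score(TREE, {"human", "chimp"})` counts only the two primate leaf branches. `{"human", "mouse"}` spans four branches through the root. Raising the threshold passed to `estimated_fdr` trades sensitivity for a lower estimated FDR. The abstract reports this trade-off at two operating points, < 50% for all 380 motifs and < 10% for the 155 best-performing motifs.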
Pages: 167-174
Number of pages: 8
Related papers
28 in total
[1]   High-resolution profiling of histone methylations in the human genome [J].
Barski, Artem ;
Cuddapah, Suresh ;
Cui, Kairong ;
Roh, Tae-Young ;
Schones, Dustin E. ;
Wang, Zhibin ;
Wei, Gang ;
Chepelev, Iouri ;
Zhao, Keji .
CELL, 2007, 129 (04) :823-837
[2]   Analysis of xbx genes in C. elegans [J].
Efimenko, E ;
Bubb, K ;
Mak, HY ;
Holzman, T ;
Leroux, MR ;
Ruvkun, G ;
Thomas, JH ;
Swoboda, P .
DEVELOPMENT, 2005, 132 (08) :1923-1934
[3]   Fast and systematic genome-wide discovery of conserved regulatory elements using a non-alignment based approach [J].
Elemento, O ;
Tavazoie, S .
GENOME BIOLOGY, 2005, 6 (02)
[4]   The discovery, positioning and verification of a set of transcription-associated motifs in vertebrates [J].
Ettwiller, L ;
Paten, B ;
Souren, M ;
Loosli, F ;
Wittbrodt, J ;
Birney, E .
GENOME BIOLOGY, 2005, 6 (12)
[5]   A second generation human haplotype map of over 3.1 million SNPs [J].
Frazer, Kelly A. ;
Ballinger, Dennis G. ;
Cox, David R. ;
Hinds, David A. ;
Stuve, Laura L. ;
Gibbs, Richard A. ;
Belmont, John W. ;
Boudreau, Andrew ;
Hardenbol, Paul ;
Leal, Suzanne M. ;
Pasternak, Shiran ;
Wheeler, David A. ;
Willis, Thomas D. ;
Yu, Fuli ;
Yang, Huanming ;
Zeng, Changqing ;
Gao, Yang ;
Hu, Haoran ;
Hu, Weitao ;
Li, Chaohua ;
Lin, Wei ;
Liu, Siqi ;
Pan, Hao ;
Tang, Xiaoli ;
Wang, Jian ;
Wang, Wei ;
Yu, Jun ;
Zhang, Bo ;
Zhang, Qingrun ;
Zhao, Hongbin ;
Zhao, Hui ;
Zhou, Jun ;
Gabriel, Stacey B. ;
Barry, Rachel ;
Blumenstiel, Brendan ;
Camargo, Amy ;
Defelice, Matthew ;
Faggart, Maura ;
Goyette, Mary ;
Gupta, Supriya ;
Moore, Jamie ;
Nguyen, Huy ;
Onofrio, Robert C. ;
Parkin, Melissa ;
Roy, Jessica ;
Stahl, Erich ;
Winchester, Ellen ;
Ziaugra, Liuda ;
Altshuler, David ;
Shen, Yan .
NATURE, 2007, 449 (7164) :851-861
[6]   Genome-wide mapping of in vivo protein-DNA interactions [J].
Johnson, David S. ;
Mortazavi, Ali ;
Myers, Richard M. ;
Wold, Barbara .
SCIENCE, 2007, 316 (5830) :1497-1502
[7]   The UCSC Genome Browser Database [J].
Karolchik, D ;
Baertsch, R ;
Diekhans, M ;
Furey, TS ;
Hinrichs, A ;
Lu, YT ;
Roskin, KM ;
Schwartz, M ;
Sugnet, CW ;
Thomas, DJ ;
Weber, RJ ;
Haussler, D ;
Kent, WJ .
NUCLEIC ACIDS RESEARCH, 2003, 31 (01) :51-54
[8]   Analysis of the vertebrate insulator protein CTCF-binding sites in the human genome [J].
Kim, Tae Hoon ;
Abdullaev, Ziedulla K. ;
Smith, Andrew D. ;
Ching, Keith A. ;
Loukinov, Dmitri I. ;
Green, Roland D. ;
Zhang, Michael Q. ;
Lobanenkov, Victor V. ;
Ren, Bing .
CELL, 2007, 128 (06) :1231-1245
[9]   Finding cis-regulatory elements using comparative genomics: Some lessons from ENCODE data [J].
King, David C. ;
Taylor, James ;
Zhang, Ying ;
Cheng, Yong ;
Lawson, Heather A. ;
Martin, Joel ;
Chiaromonte, Francesca ;
Miller, Webb ;
Hardison, Ross C. .
GENOME RESEARCH, 2007, 17 (06) :775-786
[10]   Sampling motifs on phylogenetic trees [J].
Li, XM ;
Wong, WH .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2005, 102 (27) :9481-9486