Identifying novel constrained elements by exploiting biased substitution patterns

Cited by: 246
Authors
Garber, Manuel [1]
Guttman, Mitchell [1,2]
Clamp, Michele [1]
Zody, Michael C. [1,3]
Friedman, Nir [4]
Xie, Xiaohui [1,5]
Affiliations
[1] MIT & Harvard, Broad Inst, Cambridge, MA 02142 USA
[2] MIT, Dept Biol, Cambridge, MA 02142 USA
[3] Uppsala Univ, Dept Med Biochem & Microbiol, Uppsala, Sweden
[4] Hebrew Univ Jerusalem, Inst Life Sci, Sch Comp Sci & Engn, IL-91904 Jerusalem, Israel
[5] Univ Calif Irvine, Inst Genom & Bioinformat, Dept Comp Sci, Irvine, CA 92697 USA
Funding
US National Science Foundation
Keywords
HUMAN GENOME; FUNCTIONAL ELEMENTS; VERTEBRATE GENOMES; SEQUENCE-ANALYSIS; IDENTIFICATION; DISCOVERY; 1-PERCENT; BROWSER; MAMMALS; FAMILY;
DOI
10.1093/bioinformatics/btp190
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Comparing genomes of closely related species provides a powerful tool for identifying functional elements in a reference genome. Many methods have been developed to identify conserved sequences across species; however, existing methods model conservation only as a decrease in the rate of mutation and ignore selection acting on the pattern of mutations.
Results: We present a new approach that takes advantage of deeply sequenced clades to identify evolutionary selection by uncovering not only signatures of rate-based conservation but also substitution patterns characteristic of sequence undergoing natural selection. We describe a new statistical method for modeling biased nucleotide substitutions, a learning algorithm for inferring site-specific substitution biases directly from sequence alignments, and a hidden Markov model for detecting constrained elements characterized by biased substitutions. We show that the new approach can identify significantly more degenerate constrained sequences than rate-based methods. Applying it to the ENCODE regions, we find that as much as 10.2% of these regions are under selection.
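The abstract names a hidden Markov model as the segmentation step but gives no details. The sketch below is therefore a generic two-state Viterbi decoder, not the authors' model. It assumes per-column log-likelihoods under a "neutral" and a "constrained" hypothesis have already been computed by some substitution model. The function name `viterbi_two_state`, the symmetric transition probability `p_stay`, and the uniform start prior are all illustrative assumptions.

```python
import math


def viterbi_two_state(loglik_neutral, loglik_constrained, p_stay=0.9):
    """Most likely state path for a two-state HMM (0 = neutral, 1 = constrained).

    Assumptions (not from the paper): symmetric transitions with
    probability p_stay of remaining in the current state, and a
    uniform prior over the starting state.
    """
    n = len(loglik_neutral)
    if n != len(loglik_constrained):
        raise ValueError("both log-likelihood lists must have the same length")
    if n == 0:
        return []

    log_stay = math.log(p_stay)
    log_switch = math.log(1.0 - p_stay)
    emissions = list(zip(loglik_neutral, loglik_constrained))

    # Best log-score of any path ending in each state at the first column.
    score = [math.log(0.5) + emissions[0][0], math.log(0.5) + emissions[0][1]]
    backpointers = []

    for t in range(1, n):
        new_score = [0.0, 0.0]
        pointers = [0, 0]
        for state in (0, 1):
            # Score of arriving in `state` from each possible previous state.
            candidates = [
                score[prev] + (log_stay if prev == state else log_switch)
                for prev in (0, 1)
            ]
            best_prev = 0 if candidates[0] >= candidates[1] else 1
            pointers[state] = best_prev
            new_score[state] = candidates[best_prev] + emissions[t][state]
        backpointers.append(pointers)
        score = new_score

    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

With a high `p_stay`, a single column that weakly favors the constrained state does not flip the path, whereas a run of such columns does. This smoothing is the general reason for preferring an HMM to a per-column threshold when calling elements.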
Pages: I54-I62
Page count: 9