Detection of nonneutral substitution rates on mammalian phylogenies

Cited by: 1512
Authors
Pollard, Katherine S. [1 ]
Hubisz, Melissa J. [2 ]
Rosenbloom, Kate R. [3 ]
Siepel, Adam [2 ]
Affiliations
[1] Univ Calif San Francisco, Gladstone Inst, San Francisco, CA 94158 USA
[2] Cornell Univ, Dept Biol Stat & Computat Biol, Ithaca, NY 14850 USA
[3] Univ Calif Santa Cruz, Sch Engn, Ctr Biomol Sci & Engn, Santa Cruz, CA 95064 USA
Funding
US National Science Foundation;
Keywords
FACTOR-BINDING-SITES; NONCODING SEQUENCES; REGIONS; GENOME; SELECTION; CONSTRAINT; EVOLUTION; DISCOVERY; ALIGNMENT; DIVERGENCE;
DOI
10.1101/gr.097857.109
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Methods for detecting nucleotide substitution rates that are faster or slower than expected under neutral drift are widely used to identify candidate functional elements in genomic sequences. However, most existing methods consider either reductions (conservation) or increases (acceleration) in rate but not both, or assume that selection acts uniformly across the branches of a phylogeny. Here we examine the more general problem of detecting departures from the neutral rate of substitution in either direction, possibly in a clade-specific manner. We consider four statistical, phylogenetic tests for addressing this problem: a likelihood ratio test, a score test, a test based on exact distributions of numbers of substitutions, and the genomic evolutionary rate profiling (GERP) test. All four tests have been implemented in a freely available program called phyloP. Based on extensive simulation experiments, these tests are remarkably similar in statistical power. With 36 mammalian species, they all appear to be capable of fairly good sensitivity with low false-positive rates in detecting strong selection at individual nucleotides, moderate selection in 3-bp elements, and weaker or clade-specific selection in longer elements. By applying phyloP to mammalian multiple alignments from the ENCODE project, we shed light on patterns of conservation/acceleration in known and predicted functional elements, approximate fractions of sites subject to constraint, and differences in clade-specific selection in the primate and glires clades. We also describe new "Conservation" tracks in the UCSC Genome Browser that display both phyloP and phastCons scores for genome-wide alignments of 44 vertebrate species.
Pages: 110-121
Page count: 12