Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes

Cited by: 2820
Authors
Siepel, A [1]
Bejerano, G
Pedersen, JS
Hinrichs, AS
Hou, MM
Rosenbloom, K
Clawson, H
Spieth, J
Hillier, LW
Richards, S
Weinstock, GM
Wilson, RK
Gibbs, RA
Kent, WJ
Miller, W
Haussler, D
Affiliations
[1] Univ Calif Santa Cruz, Ctr Biomol Sci & Engn, Santa Cruz, CA 95064 USA
[2] Univ Calif Santa Cruz, Howard Hughes Med Inst, Santa Cruz, CA 95064 USA
[3] Penn State Univ, Ctr Comparat Genom & Bioinformat, University Pk, PA 16802 USA
[4] Washington Univ, Sch Med, Genome Sequencing Ctr, St Louis, MO 63108 USA
[5] Baylor Coll Med, Human Genome Sequencing Ctr, Dept Mol & Human Genet, Houston, TX 77030 USA
DOI
10.1101/gr.3715005
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
We have conducted a comprehensive search for conserved elements in vertebrate genomes, using genome-wide multiple alignments of five vertebrate species (human, mouse, rat, chicken, and Fugu rubripes). Parallel searches have been performed with multiple alignments of four insect species (three species of Drosophila and Anopheles gambiae), two species of Caenorhabditis, and seven species of Saccharomyces. Conserved elements were identified with a computer program called phastCons, which is based on a two-state phylogenetic hidden Markov model (phylo-HMM). PhastCons works by fitting a phylo-HMM to the data by maximum likelihood, subject to constraints designed to calibrate the model across species groups, and then predicting conserved elements based on this model. The predicted elements cover roughly 3%-8% of the human genome (depending on the details of the calibration procedure) and substantially higher fractions of the more compact Drosophila melanogaster (37%-53%), Caenorhabditis elegans (18%-37%), and Saccharomyces cerevisiae (47%-68%) genomes. From yeasts to vertebrates, in order of increasing genome size and general biological complexity, increasing fractions of conserved bases are found to lie outside of the exons of known protein-coding genes. In all groups, the most highly conserved elements (HCEs), by log-odds score, are hundreds or thousands of bases long. These elements share certain properties with ultraconserved elements, but they tend to be longer and less perfectly conserved, and they overlap genes of somewhat different functional categories. In vertebrates, HCEs are associated with the 3' UTRs of regulatory genes, stable gene deserts, and megabase-sized regions rich in moderately conserved noncoding sequences. Noncoding HCEs also show strong statistical evidence of an enrichment for RNA secondary structure.
Pages: 1034-1050
Page count: 17
Related papers
92 in total
  • [1] A phylogenetic analysis reveals an unusual sequence conservation within introns involved in RNA editing
    Aruscavage, PJ
    Bass, BL
    [J]. RNA, 2000, 6 (02) : 257 - 269
  • [2] Gene Ontology: tool for the unification of biology
    Ashburner, M
    Ball, CA
    Blake, JA
    Botstein, D
    Butler, H
    Cherry, JM
    Davis, AP
    Dolinski, K
    Dwight, SS
    Eppig, JT
    Harris, MA
    Hill, DP
    Issel-Tarver, L
    Kasarskis, A
    Lewis, S
    Matese, JC
    Richardson, JE
    Ringwald, M
    Rubin, GM
    Sherlock, G
    [J]. NATURE GENETICS, 2000, 25 (01) : 25 - 29
  • [3] Ultraconserved elements in the human genome
    Bejerano, G
    Pheasant, M
    Makunin, I
    Stephen, S
    Kent, WJ
    Mattick, JS
    Haussler, D
    [J]. SCIENCE, 2004, 304 (5675) : 1321 - 1325
  • [4] Into the heart of darkness: large-scale clustering of human non-coding DNA
    Bejerano, Gill
    Haussler, David
    Blanchette, Mathieu
    [J]. BIOINFORMATICS, 2004, 20 : 40 - 48
  • [5] Analysis of conserved noncoding DNA in Drosophila reveals similar constraints in intergenic and intronic sequences
    Bergman, CM
    Kreitman, M
    [J]. GENOME RESEARCH, 2001, 11 (08) : 1335 - 1345
  • [6] Bergman, CM, 2002, GENOME BIOLOGY, 3, DOI 10.1186/gb-2002-3-12-research0086
  • [7] Mechanisms of alternative pre-messenger RNA splicing
    Black, DL
    [J]. ANNUAL REVIEW OF BIOCHEMISTRY, 2003, 72 : 291 - 336
  • [8] Aligning multiple genomic sequences with the threaded blockset aligner
    Blanchette, M
    Kent, WJ
    Riemer, C
    Elnitski, L
    Smit, AFA
    Roskin, KM
    Baertsch, R
    Rosenbloom, K
    Clawson, H
    Green, ED
    Haussler, D
    Miller, W
    [J]. GENOME RESEARCH, 2004, 14 (04) : 708 - 715
  • [9] Phylogenetic shadowing of primate sequences to find functional regions of the human genome
    Boffelli, D
    McAuliffe, J
    Ovcharenko, D
    Lewis, KD
    Ovcharenko, I
    Pachter, L
    Rubin, EM
    [J]. SCIENCE, 2003, 299 (5611) : 1391 - 1394
  • [10] MAVID: Constrained ancestral alignment of multiple sequences
    Bray, N
    Pachter, L
    [J]. GENOME RESEARCH, 2004, 14 (04) : 693 - 699