A model of the statistical power of comparative genome sequence analysis

Times cited: 96
Authors
Eddy, SR [1 ]
Affiliations
[1] Washington Univ, Sch Med, Howard Hughes Med Inst, St Louis, MO 63110 USA
[2] Washington Univ, Sch Med, Dept Genet, St Louis, MO 63110 USA
DOI
10.1371/journal.pbio.0030010
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Comparative genome sequence analysis is powerful, but sequencing genomes is expensive. It is desirable to be able to predict how many genomes are needed for comparative genomics, and at what evolutionary distances. Here I describe a simple mathematical model for the common problem of identifying conserved sequences. The model leads to some useful rules of thumb. For a given evolutionary distance, the number of comparative genomes needed for a constant level of statistical stringency in identifying conserved regions scales inversely with the size of the conserved feature to be detected. At short evolutionary distances, the number of comparative genomes required also scales inversely with distance. These scaling behaviors provide some intuition for future comparative genome sequencing needs, such as the proposed use of "phylogenetic shadowing" methods using closely related comparative genomes, and the feasibility of high-resolution detection of small conserved features.
Pages: 95-102
Page count: 8
Related papers
30 in total
  • [1] Accuracy and power of Bayes prediction of amino acid sites under positive selection
    Anisimova, M
    Bielawski, JP
    Yang, ZH
    [J]. MOLECULAR BIOLOGY AND EVOLUTION, 2002, 19 (06) : 950 - 958
  • [2] BERGMAN CM, 2002, GENOME BIOL, V3, DOI 10.1186/GB-2002-3-12-RESEARCH0086
  • [3] Comparative genomics at the vertebrate extremes
    Boffelli, D
    Nobrega, MA
    Rubin, EM
    [J]. NATURE REVIEWS GENETICS, 2004, 5 (06) : 456 - 465
  • [4] Phylogenetic shadowing of primate sequences to find functional regions of the human genome
    Boffelli, D
    McAuliffe, J
    Ovcharenko, D
    Lewis, KD
    Ovcharenko, I
    Pachter, L
    Rubin, EM
    [J]. SCIENCE, 2003, 299 (5611) : 1391 - 1394
  • [5] Finding functional features in Saccharomyces genomes by phylogenetic footprinting
    Cliften, P
    Sudarsanam, P
    Desikan, A
    Fulton, L
    Fulton, B
    Majors, J
    Waterston, R
    Cohen, BA
    Johnston, M
    [J]. SCIENCE, 2003, 301 (5629) : 71 - 76
  • [6] Surveying Saccharomyces genomes to identify functional elements by comparative DNA sequence analysis
    Cliften, PF
    Hillier, LW
    Fulton, L
    Graves, T
    Miner, T
    Gish, WR
    Waterston, RH
    Johnston, M
    [J]. GENOME RESEARCH, 2001, 11 (07) : 1175 - 1186
  • [7] Characterization of evolutionary rates and constraints in three mammalian genomes
    Cooper, GM
    Brudno, M
    Stone, EA
    Dubchak, I
    Batzoglou, S
    Sidow, A
    [J]. GENOME RESEARCH, 2004, 14 (04) : 539 - 548
  • [8] Genomic regulatory regions: insights from comparative sequence analysis
    Cooper, GM
    Sidow, A
    [J]. CURRENT OPINION IN GENETICS & DEVELOPMENT, 2003, 13 (06) : 604 - 610
  • [9] Quantitative estimates of sequence divergence for comparative analyses of mammalian genomes
    Cooper, GM
    Brudno, M
    Green, ED
    Batzoglou, S
    Sidow, A
    [J]. GENOME RESEARCH, 2003, 13 (05) : 813 - 820
  • [10] Turnover of binding sites for transcription factors involved in early Drosophila development
    Costas, J
    Casares, F
    Vieira, J
    [J]. GENE, 2003, 310 : 215 - 220