Subtree power analysis and species selection for comparative genomics

Cited by: 13
Authors
McAuliffe, JD
Jordan, MI
Pachter, L
Affiliations
[1] Univ Calif Berkeley, Dept Math, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
[3] Univ Calif Berkeley, Div Comp Sci, Berkeley, CA 94720 USA
Keywords
hypothesis testing; likelihood ratio; sequence analysis
DOI
10.1073/pnas.0502790102
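Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject classification codes
07; 0710; 09
Abstract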
Sequence comparison across multiple organisms aids in the detection of regions under selection. However, resource limitations require a prioritization of genomes to be sequenced. This prioritization should be grounded in two considerations: the lineal scope encompassing the biological phenomena of interest, and the optimal species within that scope for detecting functional elements. We introduce a statistical framework for optimal species subset selection, based on maximizing power to detect conserved sites. Analysis of a phylogenetic star topology shows theoretically that the optimal species subset is not in general the most evolutionarily diverged subset. We then demonstrate this finding empirically in a study of vertebrate species. Our results suggest that marsupials are prime sequencing candidates.
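The claim that the most diverged species are not necessarily the most informative can be illustrated with a toy calculation. The sketch below is not the paper's model: it assumes a two-state symmetric substitution process on a star tree with equal branch lengths, a conserved rate of 0.2 relative to a neutral rate of 1, a single site, and an exact randomized Neyman-Pearson test at level 0.05. All function names and parameter values are hypothetical and chosen only for illustration.

```python
from math import comb, exp

def flip_prob(rate, t):
    """Probability that a leaf differs from the root (two-state symmetric model)."""
    return 0.5 * (1.0 - exp(-2.0 * rate * t))

def pattern_dist(n, p):
    """Distribution of k = number of leaves in state 1, with a uniform root state."""
    q = 1.0 - p
    return [comb(n, k) * 0.5 * (p**k * q**(n - k) + p**(n - k) * q**k)
            for k in range(n + 1)]

def np_power(n, t, conserved_rate=0.2, neutral_rate=1.0, alpha=0.05):
    """Power of the most powerful level-alpha test of neutral vs. conserved.

    Assumes t > 0 so that every outcome has positive null probability.
    """
    null = pattern_dist(n, flip_prob(neutral_rate, t))
    alt = pattern_dist(n, flip_prob(conserved_rate, t))
    # Reject on outcomes with the largest likelihood ratio first.
    order = sorted(range(n + 1), key=lambda k: alt[k] / null[k], reverse=True)
    size_left, power = alpha, 0.0
    for k in order:
        if null[k] <= size_left:
            power += alt[k]
            size_left -= null[k]
        else:
            # Randomize on the boundary outcome to use the remaining size exactly.
            power += alt[k] * size_left / null[k]
            break
    return power

if __name__ == "__main__":
    for t in (0.001, 0.1, 0.5, 2.0, 20.0):
        print(f"branch length {t:>6}: power {np_power(8, t):.3f}")
```

In this toy setting the power stays close to the test level for very short branches (almost no substitutions under either hypothesis) and for very long branches (saturation under both), and is larger at intermediate branch lengths, which is the qualitative behavior the abstract describes.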
Pages: 7900-7905
Page count: 6
Related papers
21 in total
[1]  
Abramowitz M., 1974, HDB MATH FUNCTIONS
[2]   Comparative genomics at the vertebrate extremes [J].
Boffelli, D ;
Nobrega, MA ;
Rubin, EM .
NATURE REVIEWS GENETICS, 2004, 5 (06) :456-465
[3]   Phylogenetic shadowing of primate sequences to find functional regions of the human genome [J].
Boffelli, D ;
McAuliffe, J ;
Ovcharenko, D ;
Lewis, KD ;
Ovcharenko, I ;
Pachter, L ;
Rubin, EM .
SCIENCE, 2003, 299 (5611) :1391-1394
[4]   MAVID: Constrained ancestral alignment of multiple sequences [J].
Bray, N ;
Pachter, L .
GENOME RESEARCH, 2004, 14 (04) :693-699
[5]   Analysis of multiple genomic sequence alignments: A web resource, online tools, and lessons learned from analysis of mammalian SCL loci [J].
Chapman, MA ;
Donaldson, IJ ;
Gilbert, J ;
Grafham, D ;
Rogers, J ;
Green, AR ;
Göttgens, B .
GENOME RESEARCH, 2004, 14 (02) :313-318
[6]   Quantitative estimates of sequence divergence for comparative analyses of mammalian genomes [J].
Cooper, GM ;
Brudno, M ;
Green, ED ;
Batzoglou, S ;
Sidow, A .
GENOME RESEARCH, 2003, 13 (05) :813-820
[7]   Evolutionary discrimination of mammalian conserved non-genic sequences (CNGs) [J].
Dermitzakis, ET ;
Reymond, A ;
Scamuffa, N ;
Ucla, C ;
Kirkness, E ;
Rossier, C ;
Antonarakis, SE .
SCIENCE, 2003, 302 (5647) :1033-1035
[8]   A hidden Markov model approach to variation among sites in rate of evolution [J].
Felsenstein, J ;
Churchill, GA .
MOLECULAR BIOLOGY AND EVOLUTION, 1996, 13 (01) :93-104
[9]   Evolutionary trees from DNA sequences: A maximum likelihood approach [J].
Felsenstein, J .
JOURNAL OF MOLECULAR EVOLUTION, 1981, 17 (06) :368-376
[10]   Comparative genome analysis delimits a chromosomal domain and identifies key regulatory elements in the α globin cluster [J].
Flint, J ;
Tufarelli, C ;
Peden, J ;
Clark, K ;
Daniels, RJ ;
Hardison, R ;
Miller, W ;
Philipsen, S ;
Tan-Un, KC ;
McMorrow, T ;
Frampton, J ;
Alter, BP ;
Frischauf, AM ;
Higgs, DR .
HUMAN MOLECULAR GENETICS, 2001, 10 (04) :371-382