Statistical power of phylo-HMM for evolutionarily conserved element detection

Cited by: 10
Authors
Fan, Xiaodan [1]
Zhu, Jun [2]
Schadt, Eric E. [2]
Liu, Jun S. [1]
Affiliations
[1] Harvard Univ, Dept Stat, Boston, MA 02115 USA
[2] Merck & Co Inc, Seattle, WA USA
Source
BMC BIOINFORMATICS | 2007 / Vol. 8
Keywords
MULTIPLE SEQUENCE ALIGNMENT; RATE VARIATION MODELS; HIDDEN MARKOV MODEL; MAXIMUM-LIKELIHOOD; REGIONS; CODON;
DOI
10.1186/1471-2105-8-374
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: An important goal of comparative genomics is the identification of functional elements through conservation analysis. Phylo-HMM was recently introduced to detect conserved elements based on multiple genome alignments, but the method has not been rigorously evaluated. Results: We report here a simulation study to investigate the power of phylo-HMM. We show that the power of the phylo-HMM approach depends on many factors, the most important being the number of species-specific genomes used and the evolutionary distances between pairs of species. This finding is consistent with results reported by other groups for simpler comparative genomics models. The conservation ratio and the expected length of the conserved elements are also major factors. In contrast, the tree topology and the nucleotide substitution model have a relatively minor influence. Conclusion: Our results provide general guidelines on how to select the number of genomes and their evolutionary distances in comparative genomics studies, and indicate the level of power to expect under different parameter settings.
Pages: 13