Scaling up accurate phylogenetic reconstruction from gene-order data

Cited by: 40
Authors
Tang, Jijun [1]
Moret, Bernard M. E. [1]
Affiliations
[1] Univ New Mexico, Dept Comp Sci, Albuquerque, NM 87131 USA
Keywords
DOI
10.1093/bioinformatics/btg1042
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract
Motivation: Phylogenetic reconstruction from gene-order data has attracted increasing attention from both biologists and computer scientists over the last few years. Methods used in reconstruction include distance-based methods (such as neighbor-joining), parsimony methods using sequence-based encodings, Bayesian approaches, and direct optimization. The last of these, pioneered by Sankoff and extended by us with the software suite GRAPPA, is the most accurate approach, but cannot handle more than about 15 genomes of limited size (e.g. organelles).

Results: We report here on our successful efforts to scale up direct optimization through a two-step approach: the first step decomposes the dataset into smaller pieces and runs the direct optimization (GRAPPA) on the smaller pieces, while the second step builds a tree from the results obtained on the smaller pieces. We used the sophisticated disk-covering method (DCM) pioneered by Warnow and her group, suitably modified to take into account the computational limitations of GRAPPA. We find that DCM-GRAPPA scales gracefully to at least 1000 genomes of a few hundred genes each and retains surprisingly high accuracy throughout the range: in our experiments, the topological error rate rarely exceeded a few percent. Thus, reconstruction based on gene-order data can now be accomplished with high accuracy on datasets of significant size.
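The decomposition step described in the abstract can be illustrated with a small sketch. This is not the authors' DCM-GRAPPA code. It assumes linear signed gene orders, substitutes the breakpoint distance as a simple gene-order distance, and takes the maximal cliques of a threshold graph as the overlapping pieces. The full disk-covering method also triangulates the threshold graph and merges the subtrees afterwards, both of which are omitted here. All function names are hypothetical.

```python
from itertools import combinations


def breakpoint_distance(a, b):
    """Count adjacencies of linear signed gene order `a` that are absent from `b`."""
    def framed(order):
        # Frame the order with sentinel genes 0 and n+1 (an assumption of this sketch).
        return [0] + list(order) + [len(order) + 1]

    adj_b = set()
    fb = framed(b)
    for x, y in zip(fb, fb[1:]):
        adj_b.add((x, y))
        adj_b.add((-y, -x))  # the same adjacency read on the reverse strand

    fa = framed(a)
    return sum((x, y) not in adj_b for x, y in zip(fa, fa[1:]))


def threshold_graph(genomes, threshold):
    """Connect two genomes when their breakpoint distance is at most `threshold`."""
    graph = {name: set() for name in genomes}
    for u, v in combinations(genomes, 2):
        if breakpoint_distance(genomes[u], genomes[v]) <= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return sorted(cliques)


# Toy data (hypothetical): B is one inversion away from A, C one inversion from B.
genomes = {
    "A": [1, 2, 3, 4],
    "B": [1, -3, -2, 4],
    "C": [1, -3, -4, 2],
}
pieces = maximal_cliques(threshold_graph(genomes, threshold=2))
print(pieces)  # [['A', 'B'], ['B', 'C']]
```

In this toy case the two pieces overlap in genome B. Overlap of this kind is what lets a later merge step combine the trees built on each piece. In the approach the abstract describes, each piece would be small enough to hand to GRAPPA.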
Pages: i305-i312
Number of pages: 8