The Parsimony Ratchet, a new method for rapid parsimony analysis

Cited by: 1488
Authors
Nixon, KC [1]
Affiliations
[1] Cornell Univ, Dept Plant Pathol, LH Bailey Hortorium, Ithaca, NY 14853 USA
Source
CLADISTICS-THE INTERNATIONAL JOURNAL OF THE WILLI HENNIG SOCIETY | 1999 / Vol. 15 / Issue 04
DOI
10.1111/j.1096-0031.1999.tb00277.x
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
The Parsimony Ratchet is presented as a new method for analysis of large data sets. The method can be easily implemented with existing phylogenetic software by generating batch command files. Such an approach has been implemented in the programs DADA (Nixon, 1998) and Winclada (Nixon, 1999). The Parsimony Ratchet has also been implemented in the most recent versions of NONA (Goloboff, 1998). These implementations of the ratchet use the following steps: (1) Generate a starting tree (e.g., a "Wagner" tree followed by some level of branch swapping or not). (2) Randomly select a subset of characters, each of which is given additional weight (e.g., add 1 to the weight of each selected character). (3) Perform branch swapping (e.g., "branch-breaking" or TBR) on the current tree using the reweighted matrix, keeping only one (or a few) trees. (4) Set all weights for the characters to the "original" weights (typically, equal weights). (5) Perform branch swapping (e.g., branch-breaking or TBR) on the current tree (from step 3), holding one (or a few) trees. (6) Return to step 2. Steps 2-6 are considered to be one iteration, and typically, 50-200 or more iterations are performed. The number of characters to be sampled for reweighting in step 2 is determined by the user; I have found that between 5 and 25% of the characters provide good results in most cases. The performance of the ratchet for large data sets is outstanding, and the results of analyses of the 500-taxon seed plant rbcL data set (Chase et al., 1993) are presented here. A separate analysis of a three-gene data set for 567 taxa will be presented elsewhere (Soltis et al., in preparation) demonstrating the same extraordinary power. With the 500-taxon data set, shortest trees are typically found within 22 h (four runs of 200 iterations) on a 200-MHz Pentium Pro.
These analyses indicate efficiency increases of 20x-80x over "traditional methods" such as varying taxon order randomly and holding few trees, followed by more complete analyses of the best trees found, and thousands of times faster than nonstrategic searches with PAUP. Because the ratchet samples many tree islands with fewer trees from each island, it provides much more accurate estimates of the "true" consensus than collecting many trees from few islands. With the ratchet, Goloboff's NONA, and existing computer hardware, data sets that were previously intractable or required months or years of analysis with PAUP* can now be adequately analyzed in a few hours or days. © 1999 The Willi Hennig Society.
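The six steps listed in the abstract can be illustrated with a small, self-contained Python sketch. This is not the DADA, Winclada, or NONA implementation; it is a toy under several assumptions: trees are rooted nested tuples of taxon indices, the starting tree is a simple comb built in input order (a stand-in for a Wagner tree), branch swapping is a first-improvement rooted subtree-prune-and-regraft search (a stand-in for branch-breaking or TBR), and tree length is computed with Fitch optimization for unordered characters. All function names (`fitch_length`, `swap`, `ratchet`) and parameter names are hypothetical.

```python
import random

def fitch_length(tree, matrix, weights):
    """Weighted Fitch length of a rooted binary tree (nested tuples of taxon indices)."""
    def walk(node, c):
        if isinstance(node, int):                      # leaf: observed state, no steps
            return {matrix[node][c]}, 0
        (a, sa), (b, sb) = walk(node[0], c), walk(node[1], c)
        return (a & b, sa + sb) if a & b else (a | b, sa + sb + 1)
    return sum(w * walk(tree, c)[1] for c, w in enumerate(weights))

def subtrees(t):
    yield t
    if not isinstance(t, int):
        yield from subtrees(t[0])
        yield from subtrees(t[1])

def prune(t, s):
    """Remove subtree s from t and collapse its parent node."""
    if isinstance(t, int):
        return t
    left, right = t
    if left == s:
        return right
    if right == s:
        return left
    return (prune(left, s), prune(right, s))

def regrafts(t, s):
    """Yield every tree formed by attaching s to an edge of t (including above the root)."""
    yield (t, s)
    if not isinstance(t, int):
        left, right = t
        for x in regrafts(left, s):
            yield (x, right)
        for x in regrafts(right, s):
            yield (left, x)

def swap(tree, matrix, weights):
    """First-improvement hill climb over rooted SPR neighbours, holding a single tree."""
    best, best_len = tree, fitch_length(tree, matrix, weights)
    improved = True
    while improved:
        improved = False
        for s in subtrees(best):
            if s == best:
                continue
            for cand in regrafts(prune(best, s), s):
                cand_len = fitch_length(cand, matrix, weights)
                if cand_len < best_len:
                    best, best_len, improved = cand, cand_len, True
                    break
            if improved:
                break
    return best

def ratchet(matrix, iterations=50, fraction=0.15, seed=0):
    rng = random.Random(seed)
    n_taxa, n_chars = len(matrix), len(matrix[0])
    original = [1] * n_chars                           # equal "original" weights
    tree = 0                                           # step 1: comb tree, then swap
    for taxon in range(1, n_taxa):
        tree = (tree, taxon)
    tree = swap(tree, matrix, original)
    best, best_len = tree, fitch_length(tree, matrix, original)
    k = max(1, round(fraction * n_chars))              # number of characters to upweight
    for _ in range(iterations):
        weights = original[:]                          # step 2: add 1 to a random subset
        for c in rng.sample(range(n_chars), k):
            weights[c] += 1
        tree = swap(tree, matrix, weights)             # step 3: swap on reweighted matrix
        tree = swap(tree, matrix, original)            # steps 4-5: restore weights, swap
        length = fitch_length(tree, matrix, original)
        if length < best_len:                          # remember the shortest tree seen
            best, best_len = tree, length
    return best, best_len                              # step 6 is the loop itself

if __name__ == "__main__":
    toy = ["00000", "10000", "11000", "11100", "00011", "00010"]  # 6 taxa x 5 characters
    print(ratchet(toy, iterations=10, fraction=0.2))
```

The search in step 3 starts from the current tree rather than from the best tree found so far, which follows the abstract's "return to step 2" wording. The `fraction` default of 0.15 is an arbitrary choice inside the 5-25% range the abstract reports as working well.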
Pages: 407-414
Number of pages: 8
Related papers
9 in total
[1]   PHYLOGENETICS OF SEED PLANTS - AN ANALYSIS OF NUCLEOTIDE-SEQUENCES FROM THE PLASTID GENE RBCL [J].
CHASE, MW ;
SOLTIS, DE ;
OLMSTEAD, RG ;
MORGAN, D ;
LES, DH ;
MISHLER, BD ;
DUVALL, MR ;
PRICE, RA ;
HILLS, HG ;
QIU, YL ;
KRON, KA ;
RETTIG, JH ;
CONTI, E ;
PALMER, JD ;
MANHART, JR ;
SYTSMA, KJ ;
MICHAELS, HJ ;
KRESS, WJ ;
KAROL, KG ;
CLARK, WD ;
HEDREN, M ;
GAUT, BS ;
JANSEN, RK ;
KIM, KJ ;
WIMPEE, CF ;
SMITH, JF ;
FURNIER, GR ;
STRAUSS, SH ;
XIANG, QY ;
PLUNKETT, GM ;
SOLTIS, PS ;
SWENSEN, SM ;
WILLIAMS, SE ;
GADEK, PA ;
QUINN, CJ ;
EGUIARTE, LE ;
GOLENBERG, E ;
LEARN, GH ;
GRAHAM, SW ;
BARRETT, SCH ;
DAYANANDAN, S ;
ALBERT, VA .
ANNALS OF THE MISSOURI BOTANICAL GARDEN, 1993, 80 (03) :528-580
[2]  
FARRIS SJ, 1988, HENNIG86 SOFTWARE MA
[3]  
GOLOBOFF P, 1998, NONA COMPUTER PROGRA
[4]   A report on "One Day Symposium on Numerical Cladistics" [J].
Horovitz, I .
CLADISTICS-THE INTERNATIONAL JOURNAL OF THE WILLI HENNIG SOCIETY, 1999, 15 (02) :177-182
[5]   THE DISCOVERY AND IMPORTANCE OF MULTIPLE ISLANDS OF MOST-PARSIMONIOUS TREES [J].
MADDISON, DR .
SYSTEMATIC ZOOLOGY, 1991, 40 (03) :315-328
[6]  
Nixon K.C., 1999, WINCLADA BETA VER 09
[7]  
NIXON KC, 1998, DADA VER 1 9 SOFTWAR
[8]   Analyzing large data sets: rbcL 500 revisited [J].
Rice, KA ;
Donoghue, MJ ;
Olmstead, RG .
SYSTEMATIC BIOLOGY, 1997, 46 (03) :554-563
[9]  
SWOFFORD DL, 1990, PAUP PHYLOGENETIC AN