Phylogenetic supermatrix analysis of GenBank sequences from 2228 papilionoid legumes

Cited by: 132
Authors
McMahon, Michelle M. [1]
Sanderson, Michael J. [1]
Affiliations
[1] Univ Calif Davis, Sect Evolut & Ecol, Davis, CA 95616 USA
Funding
US National Science Foundation;
Keywords
DOI
10.1080/10635150600999150
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
A comprehensive phylogeny of papilionoid legumes was inferred from sequences of 2228 taxa in GenBank release 147. A semiautomated analysis pipeline was constructed to download, parse, assemble, align, combine, and build trees from a pool of 11,881 sequences. Initial steps included all-against-all BLAST similarity searches coupled with assembly, using a novel strategy for building length-homogeneous primary sequence clusters. This was followed by a combination of global and local alignment protocols to build larger secondary clusters of locally aligned sequences, thus taking into account the dramatic differences in length of the heterogeneous coding and noncoding sequence data present in GenBank. Next, clusters were checked for the presence of duplicate genes and other potentially misleading sequences and examined for combinability with other clusters on the basis of taxon overlap. Finally, two supermatrices were constructed: a "sparse" matrix based on the primary clusters alone (1794 taxa × 53,977 characters), and a somewhat more "dense" matrix based on the secondary clusters (2228 taxa × 33,168 characters). Both matrices were very sparse, with 95% of their cells containing gaps or question marks. These were subjected to extensive heuristic parsimony analyses using deterministic and stochastic heuristics, including bootstrap analyses. A "reduced consensus" bootstrap analysis was also performed to detect cryptic signal in a subtree of the data set corresponding to a "backbone" phylogeny proposed in previous studies. Overall, the dense supermatrix appeared to provide much more satisfying results, indicated by better resolution of the bootstrap tree, excellent agreement with the backbone papilionoid tree in the reduced bootstrap consensus analysis, few problematic large polytomies in the strict consensus, and less fragmentation of conventionally recognized genera. Nevertheless, at lower taxonomic levels several problems were identified and diagnosed.
A large number of methodological issues in supermatrix construction at this scale are discussed, including detection of annotation errors in GenBank sequences; the shortage of effective algorithms and software for local multiple sequence alignment; the difficulty of overcoming effects of fragmentation of data into nearly disjoint blocks in sparse supermatrices; and the lack of informative tools to assess confidence limits in very large trees.
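The abstract describes combining aligned sequence clusters into a taxon-by-character supermatrix in which most cells are gaps or question marks. The following is a minimal Python sketch of that general idea, not the authors' pipeline: the function names `build_supermatrix` and `missing_fraction` are hypothetical, and the sequences are invented toy data rather than GenBank records.

```python
from typing import Dict, List


def build_supermatrix(clusters: List[Dict[str, str]], missing: str = "?") -> Dict[str, str]:
    """Concatenate aligned clusters into one taxon-by-character matrix.

    Each cluster maps taxon name -> aligned sequence. A taxon absent from
    a cluster is padded with the `missing` symbol for that cluster's width.
    """
    taxa = sorted({taxon for cluster in clusters for taxon in cluster})
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for cluster in clusters:
        lengths = {len(seq) for seq in cluster.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a cluster must be aligned to equal length")
        width = lengths.pop()
        for taxon in taxa:
            rows[taxon].append(cluster.get(taxon, missing * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


def missing_fraction(matrix: Dict[str, str], missing_chars: str = "?-") -> float:
    """Fraction of cells that are gaps or question marks."""
    total = sum(len(seq) for seq in matrix.values())
    absent = sum(seq.count(ch) for seq in matrix.values() for ch in missing_chars)
    return absent / total if total else 0.0


# Toy clusters (invented sequences, for illustration only).
clusters = [
    {"Glycine_max": "ACGT", "Pisum_sativum": "AC-T"},
    {"Pisum_sativum": "GGA", "Lotus_japonicus": "GCA"},
]
matrix = build_supermatrix(clusters)
for taxon, row in matrix.items():
    print(f"{taxon:16s} {row}")
print(f"missing fraction: {missing_fraction(matrix):.3f}")
```

Even in this three-taxon toy case, limited taxon overlap between clusters leaves a sizeable share of cells empty, which is the small-scale analogue of the sparsity reported for both matrices.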
Pages: 818-836
Page count: 19