A comparison of algorithms for the identification of specimens using DNA barcodes: examples from gymnosperms

Cited by: 132
Authors
Little, Damon P. [1 ]
Stevenson, Dennis Wm. [1 ]
Affiliations
[1] New York Bot Garden, Lewis B & Dorothy Cullman Program Mol Systemat St, Bronx, NY 10458 USA
DOI
10.1111/j.1096-0031.2006.00126.x
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
To use DNA sequences for specimen identification (e.g., barcoding, fingerprinting), an algorithm to compare query sequences with a reference database is needed. Precision and accuracy of query sequence identification were estimated for hierarchical clustering (parsimony and neighbor joining), similarity methods (BLAST, BLAT and megaBLAST), combined clustering/similarity methods (BLAST/parsimony and BLAST/neighbor joining), diagnostic methods (DNA-BAR and DOME ID), and a new method (ATIM). We offer two novel alignment-free algorithmic solutions (DOME ID and ATIM) to identify query sequences for the purposes of DNA barcoding. Publicly available gymnosperm nrITS 2 and plastid matK sequences were used as test data sets. On the test data sets, almost all of the methods were able to accurately identify sequences to genus; however, no method was able to accurately identify query sequences to species at a frequency that would be considered useful for routine specimen identification (42-71% unambiguously correct). Clustering methods performed the worst (perhaps due to alignment issues). Similarity methods, ATIM, DNA-BAR, and DOME ID all performed at approximately the same level. Given the relative precision of the algorithms (median = 67% unambiguous), the low accuracy of species-level identification observed could be ascribed to the lack of correspondence between patterns of allelic similarity and species delimitations. Application of DNA barcoding to sequences of CITES-listed cycads (Cycadopsida) provides an example of the potential application of DNA barcoding to enforcement of conservation laws. © The Willi Hennig Society 2006.
Pages: 1-21
Number of pages: 21