Rapid DNA barcoding analysis of large datasets using the composition vector method

Cited by: 25
Authors
Chu, Ka Hou [1 ,2 ]
Xu, Minli [2 ,3 ]
Li, Chi Pang [1 ]
Affiliations
[1] Chinese Univ Hong Kong, Dept Biol, Hong Kong, Hong Kong, Peoples R China
[2] Chinese Univ Hong Kong, Mol Biotechnol Programme, Hong Kong, Hong Kong, Peoples R China
[3] Univ N Carolina, Dept Bioinformat & Genom, Charlotte, NC 28223 USA
Source
BMC BIOINFORMATICS | 2009 / Vol. 10
Keywords
MULTIPLE SEQUENCE ALIGNMENT; MOLECULAR BARCODES; PHYLOGENY;
DOI
10.1186/1471-2105-10-S14-S8
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Sequence alignment is the rate-limiting step in constructing profile trees for DNA barcoding purposes. We recently demonstrated the feasibility of using unaligned rRNA sequences as barcodes based on a composition vector (CV) approach without sequence alignment (Bioinformatics 22: 1690). Here, we further explored the grouping effectiveness of the CV method in large DNA barcode datasets (COI, 18S and 16S rRNA) from a variety of organisms, including birds, fishes, nematodes and crustaceans.
Results: Our results indicate that the grouping of taxa at the genus/species levels based on the CV/NJ approach is invariably consistent with the trees generated by traditional approaches, although in some cases the clustering among higher groups might differ. Furthermore, the CV method is always much faster than the K2P method routinely used in constructing profile trees for DNA barcoding. For instance, the alignment of 754 COI sequences (average length 649 bp) from fishes took more than ten hours to complete, while the whole tree construction process using the CV/NJ method required no more than five minutes on the same computer.
Conclusion: The CV method performs well in grouping effectiveness of DNA barcode sequences, as compared to K2P analysis of aligned sequences. It was also able to reduce the time required for analysis by over 15-fold, making it a far superior method for analyzing large datasets. We conclude that the CV method is a fast and reliable method for analyzing large datasets for DNA barcoding purposes.
Pages: 9
Related papers
32 in total
[1]   Ribosomal RNA as molecular barcodes: a simple correlation analysis without sequence alignment [J].
Chu, K. H. ;
Li, C. P. ;
Qi, J. .
BIOINFORMATICS, 2006, 22 (14) :1690-1701
[2]   Origin and phylogeny of chloroplasts revealed by a simple correlation analysis of complete genomes [J].
Chu, KH ;
Qi, J ;
Yu, ZG ;
Anh, V .
MOLECULAR BIOLOGY AND EVOLUTION, 2004, 21 (01) :200-206
[3]   What is dynamic programming? [J].
Eddy, SR .
NATURE BIOTECHNOLOGY, 2004, 22 (07) :909-910
[4]   MUSCLE: multiple sequence alignment with high accuracy and high throughput [J].
Edgar, RC .
NUCLEIC ACIDS RESEARCH, 2004, 32 (05) :1792-1797
[5]  
FELSENSTEIN J, 1989, CLADISTICS, V5, P166
[6]   Molecular barcodes for soil nematode identification [J].
Floyd, R ;
Abebe, E ;
Papert, A ;
Blaxter, M .
MOLECULAR ECOLOGY, 2002, 11 (04) :839-850
[7]   Four years of DNA barcoding: Current advances and prospects [J].
Frezal, Lise ;
Leblois, Raphael .
INFECTION GENETICS AND EVOLUTION, 2008, 8 (05) :727-736
[8]   Stretch coding and block coding: Two new strategies to represent questionably aligned DNA sequences [J].
Geiger, DL .
JOURNAL OF MOLECULAR EVOLUTION, 2002, 54 (02) :191-199
[9]   Critical factors for assembling a high volume of DNA barcodes [J].
Hajibabaei, M ;
DeWaard, JR ;
Ivanova, NV ;
Ratnasingham, S ;
Dooh, RT ;
Kirk, SL ;
Mackie, PM ;
Hebert, PDN .
PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY B-BIOLOGICAL SCIENCES, 2005, 360 (1462) :1959-1967
[10]   Identification of birds through DNA barcodes [J].
Hebert, PDN ;
Stoeckle, MY ;
Zemlak, TS ;
Francis, CM .
PLOS BIOLOGY, 2004, 2 (10) :1657-1663