A comparative evaluation of sequence classification programs

Cited by: 70
Authors
Bazinet, Adam L. [1]
Cummings, Michael P. [1]
Affiliations
[1] Univ Maryland, Lab Mol Evolut, Ctr Bioinformat & Computat Biol, College Pk, MD 20874 USA
Source
BMC BIOINFORMATICS | 2012 / Vol. 13
Keywords
PHYLOGENETIC CLASSIFICATION; TAXONOMIC CLASSIFICATION; DNA-SEQUENCES; ASSIGNMENT; PLACEMENT; ALGORITHM; DATABASE; READS; TOOLS;
DOI
10.1186/1471-2105-13-92
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: A fundamental problem in modern genomics is to taxonomically or functionally classify DNA sequence fragments derived from environmental sampling (i.e., metagenomics). Several different methods have been proposed for doing this effectively and efficiently, and many have been implemented in software. In addition to varying their basic algorithmic approach to classification, some methods screen sequence reads for 'barcoding genes' like 16S rRNA, or various types of protein-coding genes. Due to the sheer number and complexity of methods, it can be difficult for a researcher to choose one that is well-suited for a particular analysis.

Results: We divided the very large number of programs that have been released in recent years for solving the sequence classification problem into three main categories based on the general algorithm they use to compare a query sequence against a database of sequences. We also evaluated the performance of the leading programs in each category on data sets whose taxonomic and functional composition is known.

Conclusions: We found significant variability in classification accuracy, precision, and resource consumption of sequence classification programs when used to analyze various metagenomics data sets. However, we observe some general trends and patterns that will be useful to researchers who use sequence classification programs.
Pages: 13