V-REVCOMP: automated high-throughput detection of reverse complementary 16S rRNA gene sequences in large environmental and taxonomic datasets

Cited by: 10
Authors
Hartmann, Martin [1]
Howes, Charles G. [1]
Veldre, Vilmar [2]
Schneider, Salome [3]
Vaishampayan, Parag A. [4]
Yannarell, Anthony C. [5]
Quince, Christopher [6]
Johansson, Per [7]
Bjorkroth, K. Johanna [7]
Abarenkov, Kessy [2]
Hallam, Steven J. [1, 8]
Mohn, William W. [1]
Nilsson, R. Henrik [2, 9]
Affiliations
[1] Univ British Columbia, Dept Microbiol & Immunol, Vancouver, BC V6T 1Z3, Canada
[2] Univ Tartu, Inst Ecol & Earth Sci, Dept Bot, EE-50090 Tartu, Estonia
[3] Agroscope Reckenholz Tanikon Res Stn ART, Zurich, Switzerland
[4] CALTECH, Jet Prop Lab, Biotechnol & Planetary Protect Grp, Pasadena, CA USA
[5] Univ Illinois, Dept Nat Resources & Environm Sci, Urbana, IL 61801 USA
[6] Univ Glasgow, Dept Civil Engn, Glasgow G12 8QQ, Lanark, Scotland
[7] Univ Helsinki, Dept Food & Environm Hyg, Helsinki, Finland
[8] Univ British Columbia, Grad Program Bioinformat, Vancouver, BC V6T 1Z3, Canada
[9] Univ Gothenburg, Dept Plant & Environm Sci, Gothenburg, Sweden
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
software; SSU rRNA gene; 16S sequence; reverse complementary; Hidden Markov Models; hmmer; DATABASE; TOOLS;
DOI
10.1111/j.1574-6968.2011.02274.x
Chinese Library Classification number
Q93 [Microbiology];
Discipline classification code
071005 ; 100705 ;
Abstract
Reverse complementary DNA sequences - sequences that are inadvertently given backwards with all purines and pyrimidines transposed - can affect sequence analysis detrimentally unless taken into account. We present an open-source, high-throughput software tool - v-revcomp (http://www.cmde.science.ubc.ca/mohn/software.html) - to detect and reorient reverse complementary entries of the small-subunit rRNA (16S) gene from sequencing datasets, particularly from environmental sources. The software supports sequence lengths ranging from full length down to the short reads that are characteristic of next-generation sequencing technologies. We evaluated the reliability of v-revcomp by screening all 406 781 16S sequences deposited in release 102 of the curated SILVA database and demonstrated that the tool has a detection accuracy of virtually 100%. We subsequently used v-revcomp to analyse 1 171 646 16S sequences deposited in the International Nucleotide Sequence Databases and found that about 1% of these user-submitted sequences were reverse complementary. In addition, a nontrivial proportion of the entries were otherwise anomalous, including reverse complementary chimeras, sequences associated with wrong taxa, nonribosomal genes, sequences of poor quality or otherwise erroneous sequences without a reasonable match to any other entry in the database. Thus, v-revcomp is highly efficient in detecting and reorienting reverse complementary 16S sequences of almost any length and can be used to detect various sequence anomalies.
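The notion of a reverse complementary entry described in the abstract can be illustrated with a minimal Python sketch. It is not the v-revcomp implementation: the keywords indicate that the tool relies on Hidden Markov Models via hmmer, which is not reproduced here. The names `reverse_complement` and `orient`, the caller-supplied `score` function, and the toy motif in the usage lines are all assumptions made for illustration.

```python
from typing import Callable, Tuple

# Complement table for the four bases, U, and the asymmetric IUPAC ambiguity
# codes. Symmetric codes (S, W) and gap characters are left unchanged.
_COMPLEMENT = str.maketrans(
    "ACGTURYKMBDHVNacgturykmbdhvn",
    "TGCAAYRMKVHDBNtgcaayrmkvhdbn",
)


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def orient(seq: str, score: Callable[[str], float]) -> Tuple[str, bool]:
    """Return (sequence in the better-scoring orientation, was_flipped).

    `score` is a hypothetical stand-in for any orientation-sensitive
    scorer; a real tool would use something far more robust than the
    motif count shown in the usage example.
    """
    flipped = reverse_complement(seq)
    if score(flipped) > score(seq):
        return flipped, True
    return seq, False


# Usage: a toy scorer that counts an arbitrary, purely illustrative motif.
toy_score = lambda s: s.count("AACG")
oriented, was_flipped = orient("TTCGTTGG", toy_score)
```

Passing the scorer as a parameter keeps the reorientation logic independent of how orientation is actually judged, so the toy motif count could be swapped for a profile-based score without touching `orient`.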
Pages: 140-145
Page count: 6