Swarm v2: highly-scalable and high-resolution amplicon clustering

Cited by: 422
Authors
Mahe, Frederic [1]
Rognes, Torbjorn [2, 3]
Quince, Christopher [4]
de Vargas, Colomban [5, 6]
Dunthorn, Micah [1]
Affiliations
[1] Tech Univ Kaiserslautern, Dept Ecol, Kaiserslautern, Germany
[2] Univ Oslo, Dept Informat, N-0316 Oslo, Norway
[3] Natl Hosp Norway, Oslo Univ Hosp, Dept Microbiol, Oslo, Norway
[4] Univ Warwick, Warwick Med Sch, Warwick, England
[5] CNRS, Stn Biol Roscoff, EPEP Evolut Protistes & Ecosyst Pelag, UMR 7144, Roscoff, France
[6] Univ Paris 06, Sorbonne Univ, Stn Biol Roscoff UMR7144, Roscoff, France
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
Environmental diversity; Barcoding; Molecular operational taxonomic units; Operational taxonomic units; Ciliate environmental diversity; Sequencing data; Rare biosphere; Communities; Wrinkles; Accurate; Regions
DOI
10.7717/peerj.1420
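Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]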
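Discipline classification codes
070301 [Inorganic chemistry]; 070403 [Astrophysics]; 070507 [Natural resources and territorial spatial planning]; 090105 [Crop production systems and ecological engineering]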
Abstract
We previously presented Swarm v1, a novel open-source amplicon clustering program that produces fine-scale molecular operational taxonomic units (OTUs) free of arbitrary global clustering thresholds and of input-order dependency. Swarm v1 worked in two phases: iterative single-linkage clustering with a local clustering threshold (d), followed by a phase that used the internal abundance structure of each cluster to break chained OTUs. Here we present Swarm v2, which has two important new features: (1) a new algorithm for d = 1 whose computation time scales linearly with the amount of data; and (2) a new fastidious option that reduces under-grouping by grafting low-abundance OTUs (e.g., singletons and doubletons) onto larger ones. Swarm v2 also integrates the clustering and breaking phases directly, dereplicates sequencing reads with d = 0, outputs OTU representatives in FASTA format, and plots individual OTUs as two-dimensional networks.
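The d = 0 dereplication and d = 1 clustering described above can be illustrated with a minimal Python sketch. It assumes that linear scaling at d = 1 comes from enumerating every single-edit "microvariant" of an amplicon and looking each one up in a hash table instead of running pairwise alignments. It also approximates the integrated breaking phase with a hypothetical rule: a neighbour joins an OTU only if its abundance does not exceed that of the amplicon it was reached from. The names `dereplicate`, `microvariants` and `cluster_d1` are illustrative, not Swarm's actual code or command-line interface, and the fastidious option is not modelled.

```python
from collections import Counter, deque

ALPHABET = "ACGT"  # assumption: unambiguous DNA nucleotides only


def dereplicate(reads):
    """d = 0: collapse identical reads into (sequence, abundance) pairs,
    sorted by decreasing abundance, then alphabetically for a stable order."""
    counts = Counter(reads)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def microvariants(seq):
    """Yield every sequence one substitution, insertion or deletion away from seq.
    Duplicates (e.g. deletions inside homopolymers) can occur; callers use a set."""
    for i in range(len(seq)):
        yield seq[:i] + seq[i + 1:]  # deletion at position i
        for base in ALPHABET:
            if base != seq[i]:
                yield seq[:i] + base + seq[i + 1:]  # substitution at position i
    for i in range(len(seq) + 1):
        for base in ALPHABET:
            yield seq[:i] + base + seq[i:]  # insertion before position i


def cluster_d1(amplicons):
    """Iterative single-linkage clustering with d = 1 on (sequence, abundance) pairs.
    Neighbours are found by hash lookups of microvariants, not pairwise alignment."""
    abundance = dict(amplicons)
    unassigned = set(abundance)
    clusters = []
    # Seeds are taken in order of decreasing abundance.
    for seed, _ in sorted(amplicons, key=lambda item: (-item[1], item[0])):
        if seed not in unassigned:
            continue
        unassigned.discard(seed)
        members = [seed]
        queue = deque([seed])
        while queue:  # breadth-first growth of the OTU
            current = queue.popleft()
            # Sorted only to make the output order deterministic.
            for variant in sorted(set(microvariants(current))):
                # Hypothetical breaking rule: only step to equal or lower abundance.
                if variant in unassigned and abundance[variant] <= abundance[current]:
                    unassigned.discard(variant)
                    members.append(variant)
                    queue.append(variant)
        clusters.append(members)
    return clusters
```

For example, `cluster_d1(dereplicate(["AAAA"] * 10 + ["AAAT"] + ["AATT"] * 4))` returns `[["AAAA", "AAAT"], ["AATT"]]`: `AATT` is one edit away from `AAAT` but more abundant than it, so the sketch's breaking rule leaves it to seed its own OTU. Because the number of microvariants per amplicon depends only on sequence length, each amplicon costs a bounded number of hash lookups, so total work grows roughly linearly with the number of unique amplicons.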
Pages: 12