FastGroup: A program to dereplicate libraries of 16S rDNA sequences

Cited by: 38
Authors
Seguritan, Victor [1]
Rohwer, Forest [2]
Affiliations
[1] San Diego State Univ, Dept Computat Sci, San Diego, CA 92182 USA
[2] San Diego State Univ, Dept Biol, San Diego, CA 92182 USA
Keywords
Query Sequence; Neighbor Join; Ambiguous Bases; Percent Sequence Identity; Microbial Biogeography
DOI
10.1186/1471-2105-2-9
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Ribosomal 16S DNA sequences are an essential tool for identifying and classifying microbes. High-throughput DNA sequencing now makes it economically possible to produce very large datasets of 16S rDNA sequences in short time periods, necessitating new computer tools for analyses. Here we describe FastGroup, a Java program designed to dereplicate libraries of 16S rDNA sequences. By dereplication we mean to: 1) compare all the sequences in a data set to each other, 2) group similar sequences together, and 3) output a representative sequence from each group. In this way, duplicate sequences are removed from a library.

Results: FastGroup was tested using a library of single-pass, bacterial 16S rDNA sequences cloned from coral-associated bacteria. We found that the optimal strategy for dereplicating these sequences was to: 1) trim ambiguous bases from the 5' end of the sequences and all sequence 3' of the conserved Bact517 site, 2) match the sequences from the 3' end, and 3) group sequences ≥97% identical to each other.

Conclusions: The FastGroup program simplifies the dereplication of 16S rDNA sequence libraries and prepares the raw sequences for subsequent analyses.
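The strategy described in the abstract can be illustrated with a minimal Python sketch. This is not FastGroup's own code (FastGroup is a Java program), and several details are assumptions made only for illustration: the 5' trim cuts after the last ambiguous base found within a hypothetical `scan` window, the conserved site is passed in as a plain string `site` rather than hard-coded, the comparison is ungapped with the 3' ends aligned, and grouping is greedy with the first member of each group acting as its representative.

```python
from typing import List

# IUPAC nucleotide codes other than A, C, G, T
AMBIGUOUS = set("NRYKMSWBDHV")


def trim_5prime(seq: str, scan: int = 50) -> str:
    """Drop everything up to the last ambiguous base in the first `scan` bases.

    The `scan` window is an assumption of this sketch, not a FastGroup setting.
    """
    cut = 0
    for i, base in enumerate(seq[:scan]):
        if base in AMBIGUOUS:
            cut = i + 1
    return seq[cut:]


def trim_3prime(seq: str, site: str) -> str:
    """Keep the sequence through the first occurrence of `site`; drop the rest."""
    idx = seq.find(site)
    return seq if idx < 0 else seq[: idx + len(site)]


def identity_from_3prime(a: str, b: str) -> float:
    """Ungapped fraction of matching bases with the 3' ends aligned."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    # zip stops at the shorter sequence, so exactly n positions are compared
    matches = sum(x == y for x, y in zip(reversed(a), reversed(b)))
    return matches / n


def dereplicate(seqs: List[str], site: str, threshold: float = 0.97) -> List[List[str]]:
    """Greedily group trimmed sequences; group[0] is the representative."""
    groups: List[List[str]] = []
    for raw in seqs:
        seq = trim_3prime(trim_5prime(raw.upper()), site)
        for group in groups:
            if identity_from_3prime(seq, group[0]) >= threshold:
                group.append(seq)
                break
        else:
            # No existing representative is similar enough: start a new group
            groups.append([seq])
    return groups
```

Calling `dereplicate(sequences, site)` returns a list of groups, and `[g[0] for g in groups]` gives one representative per group. The value passed as `site` should be whatever motif marks the conserved Bact517 region in the reads at hand; the sketch does not supply one.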
Pages: 8