At least 1 in 20 16S rRNA sequence records currently held in public repositories is estimated to contain substantial anomalies

Cited by: 621
Authors
Ashelford, KE
Chuzhanova, NA
Fry, JC
Jones, AJ
Weightman, AJ
Affiliations
[1] Univ Cardiff Wales, Cardiff Sch Biosci, Cardiff CF10 3TL, S Glam, Wales
[2] Univ Cardiff Wales, Cardiff Sch Comp Sci, Cardiff CF24 3AA, S Glam, Wales
[3] Univ Cardiff Wales, Cardiff Sch Med, Biostat & Bioinformat Unit, Cardiff CF14 4XN, S Glam, Wales
[4] Univ Cardiff Wales, Cardiff Sch Med, Inst Med Genet, Cardiff CF14 4XN, S Glam, Wales
Funding
UK Biotechnology and Biological Sciences Research Council (BBSRC)
Keywords
DOI
10.1128/AEM.71.12.7724-7736.2005
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Subject classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
A new method for detecting chimeras and other anomalies within 16S rRNA sequence records is presented. Using this method, we screened 1,399 sequences from 19 phyla, as defined by the Ribosomal Database Project, release 9, update 22, and found 5.0% to harbor substantial errors. Of these, 64.3% were obvious chimeras, 14.3% were unidentified sequencing errors, and 21.4% were highly degenerate. In all, 11 phyla contained obvious chimeras, accounting for 0.8 to 11% of the records for these phyla. Many chimeras (43.1%) were formed from parental sequences belonging to different phyla. While most comprised two fragments, 13.7% were composed of at least three fragments, often from three different sources. A separate analysis of the Bacteroidetes phylum (2,739 sequences) also revealed 5.8% records to be anomalous, of which 65.4% were apparently chimeric. Overall, we conclude that, as a conservative estimate, 1 in every 20 public database records is likely to be corrupt. Our results support concerns recently expressed over the quality of the public repositories. With 16S rRNA sequence data increasingly playing a dominant role in bacterial systematics and environmental biodiversity studies, it is vital that steps be taken to improve screening of sequences prior to submission. To this end, we have implemented our method as a program with a simple-to-use graphic user interface that is capable of running on a range of computer platforms. The program is called Pintail, is released under the terms of the GNU General Public License open source license, and is freely available from our website at http://www.cardiff.ac.uk/biosi/research/biosoft/.
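The percentages in the abstract can be turned into approximate record counts as a quick sanity check. The sketch below is only back-of-envelope arithmetic: the counts are inferred by rounding the stated percentages and are not figures reported in the record, and the helper name `implied_count` is hypothetical.

```python
# Back-of-envelope check of the abstract's percentages.
# All counts below are inferred by rounding; none is quoted from the paper.

def implied_count(total: int, percent: float) -> int:
    """Nearest whole number of records corresponding to `percent` of `total`."""
    return round(total * percent / 100)

# Main screen: 1,399 sequences, 5.0% with substantial errors.
screened = 1399
anomalous = implied_count(screened, 5.0)

# Breakdown of the anomalous records, as stated in the abstract.
breakdown = {
    "obvious chimeras": 64.3,
    "unidentified sequencing errors": 14.3,
    "highly degenerate": 21.4,
}
counts = {label: implied_count(anomalous, pct) for label, pct in breakdown.items()}

# Separate Bacteroidetes screen: 2,739 sequences, 5.8% anomalous,
# of which 65.4% were apparently chimeric.
bacteroidetes_anomalous = implied_count(2739, 5.8)
bacteroidetes_chimeric = implied_count(bacteroidetes_anomalous, 65.4)

if __name__ == "__main__":
    print(f"anomalous records (implied): {anomalous} of {screened}")
    for label, n in counts.items():
        print(f"  {label}: {n}")
    print(f"Bacteroidetes anomalous (implied): {bacteroidetes_anomalous}")
    print(f"Bacteroidetes chimeric (implied): {bacteroidetes_chimeric}")
```

Under these rounding assumptions the three categories sum to the implied total of anomalous records, and 5.0% corresponds to the headline figure of 1 in 20.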
Pages: 7724-7736
Page count: 13
Related papers
21 in total
  • [1] Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
    Altschul, SF
    Madden, TL
    Schaffer, AA
    Zhang, JH
    Zhang, Z
    Miller, W
    Lipman, DJ
    [J]. NUCLEIC ACIDS RESEARCH, 1997, 25 (17) : 3389 - 3402
  • [2] GenBank
    Benson, DA
    Karsch-Mizrachi, I
    Lipman, DJ
    Ostell, J
    Rapp, BA
    Wheeler, DL
    [J]. NUCLEIC ACIDS RESEARCH, 2000, 28 (01) : 15 - 18
  • [3] The Ribosomal Database Project (RDP-II): previewing a new autoaligner that allows regular updates and the new prokaryotic taxonomy
    Cole, JR
    Chai, B
    Marsh, TL
    Farris, RJ
    Wang, Q
    Kulam, SA
    Chandra, S
    McGarrell, DM
    Schmidt, TM
    Garrity, GM
    Tiedje, JM
    [J]. NUCLEIC ACIDS RESEARCH, 2003, 31 (01) : 442 - 443
  • [4] Fox JL, 2005, ASM NEWS, V71, P6
  • [5] GARRITY GM, 2002, BERGEYS MANUAL SYSTE, P49
  • [6] Evaluating putative chimeric sequences from PCR-amplified products
    Gonzalez, JM
    Zimmermann, J
    Saiz-Jimenez, C
    [J]. BIOINFORMATICS, 2005, 21 (03) : 333 - 337
  • [7] Bellerophon: a program to detect chimeric sequences in multiple sequence alignments
    Huber, T
    Faulkner, G
    Hugenholtz, P
    [J]. BIOINFORMATICS, 2004, 20 (14) : 2317 - 2319
  • [8] Chimeric 16S rDNA sequences of diverse origin are accumulating in the public databases
    Hugenholtz, P
    Huber, T
    [J]. INTERNATIONAL JOURNAL OF SYSTEMATIC AND EVOLUTIONARY MICROBIOLOGY, 2003, 53 : 289 - 293
  • [9] High overall diversity and dominance of microdiverse relationships in salt marsh sulphate-reducing bacteria
    Klepac-Ceraj, V
    Bahr, M
    Crump, BC
    Teske, AP
    Hobbie, JE
    Polz, MF
    [J]. ENVIRONMENTAL MICROBIOLOGY, 2004, 6 (07) : 686 - 698
  • [10] A new computational method for detection of chimeric 16S rRNA artifacts generated by PCR amplification from mixed bacterial populations
    Komatsoulis, GA
    Waterman, MS
    [J]. APPLIED AND ENVIRONMENTAL MICROBIOLOGY, 1997, 63 (06) : 2338 - 2346