Diverse and Widespread Contamination Evident in the Unmapped Depths of High Throughput Sequencing Data

Cited by: 140
Author
Lusk, Richard W. [1]
Affiliation
[1] Univ Michigan, Dept Ecol & Evolutionary Biol, Ann Arbor, MI 48109 USA
Source
PLOS ONE | 2014 / Volume 9 / Issue 10
Keywords
DNA-SEQUENCES; PCR; CELLS
DOI
10.1371/journal.pone.0110808
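Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract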
Trace quantities of contaminating DNA are widespread in the laboratory environment, but their presence has received little attention in the context of high throughput sequencing. This issue is highlighted by recent works that have rested controversial claims upon sequencing data that appear to support the presence of unexpected exogenous species. I used reads that preferentially aligned to alternate genomes to infer the distribution of potential contaminant species in a set of independent sequencing experiments. I confirmed that dilute samples are more exposed to contaminating DNA, and, focusing on four single-cell sequencing experiments, found that these contaminants appear to originate from a wide diversity of clades. Although negative control libraries prepared from 'blank' samples recovered the highest-frequency contaminants, low-frequency contaminants, which appeared to make heterogeneous contributions to samples prepared in parallel within a single experiment, were not well controlled for. I used these results to show that, despite heavy replication and plausible controls, contamination can explain all of the observations used to support a recent claim that complete genes pass from food to human blood. Contamination must be considered a potential source of signals of exogenous species in sequencing data, even if these signals are replicated in independent experiments, vary across conditions, or indicate a species which seems a priori unlikely to contaminate. Negative control libraries processed in parallel are essential to control for contaminant DNAs, but their limited ability to recover low-frequency contaminants must be recognized.
Pages: 8