vipR: variant identification in pooled DNA using R

Cited by: 31

Authors:

Altmann, Andre [1]

Weber, Peter [2]

Quast, Carina [3]

Rex-Haffner, Monika [3]

Binder, Elisabeth B. [3]

Mueller-Myhsok, Bertram [1]

Affiliations:

[1] Max Planck Inst Psychiat, Dept Stat Genet, Munich, Germany

[2] Max Planck Inst Psychiat, Dept Mol Neurogenet, Munich, Germany

[3] Max Planck Inst Psychiat, Dept Mol Genet Affect Disorder, Munich, Germany

Source:

BIOINFORMATICS | 2011 / Vol. 27 / Issue 13

Keywords:

HERITABILITY;

DOI:

10.1093/bioinformatics/btr205

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Discipline classification code:

071010 ; 081704 ;

Abstract:

Motivation: High-throughput sequencing (HTS) technologies are the method of choice for screening the human genome for rare sequence variants causing susceptibility to complex diseases. Unfortunately, preparation of samples for a large number of individuals is still very cost- and labor-intensive. Thus, recently, screens for rare sequence variants were carried out in samples of pooled DNA, in which equimolar amounts of DNA from multiple individuals are mixed prior to sequencing with HTS. The resulting sequence data, however, pose a bioinformatics challenge: the discrimination of sequencing errors from real sequence variants present at a low frequency in the DNA pool.

Results: Our method vipR uses data from multiple DNA pools in order to compensate for differences in sequencing error rates along the sequenced region. More precisely, instead of aiming at discriminating sequence variants from sequencing errors, vipR identifies sequence positions that exhibit significantly different minor allele frequencies in at least two DNA pools using the Skellam distribution. The performance of vipR was compared with three other models on data from a targeted resequencing study of the TMEM132D locus in 600 individuals distributed over four DNA pools. Performance of the methods was computed on SNPs that were also genotyped individually using a MALDI-TOF technique. On a set of 82 sequence variants, vipR achieved an average sensitivity of 0.80 at an average specificity of 0.92, thus outperforming the reference methods by at least 0.17 in specificity at comparable sensitivity.
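The idea of comparing minor allele counts between two pools with the Skellam distribution (the distribution of the difference of two independent Poisson counts) can be illustrated with a minimal sketch. This is not vipR's implementation, which is written in R. The null model used here (both pools share one pooled minor allele rate, and counts are Poisson with mean coverage times that rate) and the function names `skellam_pmf` and `pool_difference_pvalue` are assumptions made for illustration only.

```python
import math


def poisson_logpmf(n, mu):
    """Log of the Poisson probability P(X = n) for mean mu > 0."""
    return n * math.log(mu) - mu - math.lgamma(n + 1)


def skellam_pmf(k, mu1, mu2):
    """P(X1 - X2 = k) for independent X1 ~ Poisson(mu1), X2 ~ Poisson(mu2).

    Computed by direct summation over X2 = n and X1 = n + k,
    truncated far out in the Poisson tails.
    """
    start = max(0, -k)
    width = int(max(mu1, mu2) + 12 * math.sqrt(max(mu1, mu2))) + 50
    return sum(
        math.exp(poisson_logpmf(n + k, mu1) + poisson_logpmf(n, mu2))
        for n in range(start, start + width)
    )


def pool_difference_pvalue(minor1, depth1, minor2, depth2):
    """Two-sided p-value for a difference in minor allele counts.

    Assumed null model (not taken from the paper): both pools share
    the pooled minor allele rate, so the count difference follows a
    Skellam distribution with means depth1 * rate and depth2 * rate.
    """
    rate = (minor1 + minor2) / (depth1 + depth2)
    if rate == 0:
        return 1.0  # no minor alleles observed in either pool
    mu1, mu2 = depth1 * rate, depth2 * rate
    diff = minor1 - minor2
    span = int(mu1 + mu2 + 12 * math.sqrt(mu1 + mu2)) + 50
    lower = sum(skellam_pmf(k, mu1, mu2) for k in range(-span, diff + 1))
    upper = sum(skellam_pmf(k, mu1, mu2) for k in range(diff, span + 1))
    return min(1.0, 2 * min(lower, upper))


if __name__ == "__main__":
    # Hypothetical counts at one position, coverage 1000 in each pool.
    print(pool_difference_pvalue(40, 1000, 5, 1000))   # pools differ strongly
    print(pool_difference_pvalue(10, 1000, 10, 1000))  # pools agree
```

A position where both pools show a similar minor allele count, as expected from a position-specific sequencing error, yields a large p-value, whereas a position where one pool carries clearly more minor alleles yields a small one. The counts in the usage lines are made up and do not come from the TMEM132D data.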


Pages: i77-i84

Page count: 8

References: 23 in total

