SNVMix: predicting single nucleotide variants from next-generation sequencing of tumors

Cited by: 151
Authors
Goya, Rodrigo [1, 2]
Sun, Mark G. F. [1]
Morin, Ryan D. [2]
Leung, Gillian [1]
Ha, Gavin [1]
Wiegand, Kimberley C. [3, 4]
Senz, Janine [3, 4]
Crisan, Anamaria [1]
Marra, Marco A. [2]
Hirst, Martin [2]
Huntsman, David [3, 4]
Murphy, Kevin P. [5]
Aparicio, Sam [1]
Shah, Sohrab P. [1, 3, 4]
Affiliations
[1] British Columbia Canc Res Ctr, Breast Canc Res Program, Dept Mol Oncol, Vancouver, BC V5Z 1L3, Canada
[2] British Columbia Canc Agcy, Genome Sci Ctr, Vancouver, BC V5Z 4E6, Canada
[3] British Columbia Canc Agcy, Ctr Translat & Appl Genom, Vancouver, BC V5Z 4E6, Canada
[4] Prov Hlth Serv Author Labs, Vancouver, BC, Canada
[5] Univ British Columbia, Dept Comp Sci, Vancouver, BC V6T 1W5, Canada
Keywords
MYELOID-LEUKEMIA GENOME; RNA-SEQ; ALIGNMENT
DOI
10.1093/bioinformatics/btq040
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Next-generation sequencing (NGS) has enabled whole genome and transcriptome single nucleotide variant (SNV) discovery in cancer. NGS produces millions of short sequence reads that, once aligned to a reference genome sequence, can be interpreted for the presence of SNVs. Although tools exist for SNV discovery from NGS data, none are specifically suited to work with data from tumors, where altered ploidy and tumor cellularity impact the statistical expectations of SNV discovery.

Results: To address this problem, we developed three implementations of a probabilistic Binomial mixture model, called SNVMix, designed to infer SNVs from NGS data from tumors. The first models allelic counts as observations and infers SNVs and model parameters using an expectation maximization (EM) algorithm; it is therefore capable of adjusting to the deviation of allelic frequencies inherent in genomically unstable tumor genomes. The second models nucleotide and mapping qualities of the reads by probabilistically weighting the contribution of a read/nucleotide to the inference of an SNV based on the confidence we have in the base call and the read alignment. The third combines filtering out low-quality data with probabilistic weighting of the qualities. We quantitatively evaluated these approaches on 16 ovarian cancer RNASeq datasets with matched genotyping arrays and on a human breast cancer genome sequenced to >40x (haploid) coverage with ground truth data, and show systematically that the SNVMix models outperform competing approaches.
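
The first implementation described in the abstract (a Binomial mixture over allelic counts, fit by EM) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the three-genotype layout (homozygous reference, heterozygous, homozygous variant), the starting values, and the toy counts are all assumptions made for illustration, and the sketch uses plain maximum-likelihood EM, so the published model may differ (for example in its use of priors or base and mapping qualities).

```python
import math


def log_binom_pmf(k, n, p):
    # Log of Binomial(k | n, p), using lgamma for the binomial coefficient.
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))


def em_binomial_mixture(counts, mu, pi, n_iter=50, eps=1e-6):
    """Fit a Binomial mixture by EM (illustrative sketch).

    counts: list of (reference_reads, depth) pairs, one per position.
    mu:     initial reference-allele probability per genotype component.
    pi:     initial mixture weight per genotype component.
    Returns the fitted mu, pi, and per-position responsibilities.
    """
    mu, pi = list(mu), list(pi)
    n_comp = len(mu)
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each genotype at each position.
        resp = []
        for a, n in counts:
            logs = [math.log(pi[k]) + log_binom_pmf(a, n, mu[k])
                    for k in range(n_comp)]
            top = max(logs)
            weights = [math.exp(v - top) for v in logs]
            total = sum(weights)
            resp.append([w / total for w in weights])
        # M-step: re-estimate mu and pi from the responsibilities.
        for k in range(n_comp):
            num = sum(r[k] * a for r, (a, n) in zip(resp, counts))
            den = sum(r[k] * n for r, (a, n) in zip(resp, counts))
            if den > 0:
                # Clamp away from 0 and 1 so the logs stay finite.
                mu[k] = min(max(num / den, eps), 1.0 - eps)
            pi[k] = max(sum(r[k] for r in resp) / len(counts), eps)
        norm = sum(pi)
        pi = [p / norm for p in pi]
    return mu, pi, resp


def snv_probabilities(resp):
    # P(SNV) = 1 - P(homozygous reference), assuming component 0 is the
    # homozygous-reference genotype.
    return [1.0 - r[0] for r in resp]


# Hypothetical toy data: (reference reads, total depth) at eight positions.
toy_counts = [(30, 30), (29, 30), (28, 30), (15, 30),
              (14, 30), (16, 30), (0, 30), (1, 30)]
fitted_mu, fitted_pi, fitted_resp = em_binomial_mixture(
    toy_counts, mu=[0.99, 0.5, 0.01], pi=[1 / 3, 1 / 3, 1 / 3])
p_snv = snv_probabilities(fitted_resp)
```

Because the component probabilities are re-estimated from the data rather than fixed at 1, 0.5 and 0, a sketch like this can shift the expected allelic fractions away from the diploid values, which is the property the abstract attributes to the EM-based model.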
Pages: 730-736
Number of pages: 7