Feature-based classifiers for somatic mutation detection in tumour-normal paired sequencing data

Cited by: 105
Authors
Ding, Jiarui [1, 2]
Bashashati, Ali [1]
Roth, Andrew [1]
Oloumi, Arusha [1]
Tse, Kane [3]
Zeng, Thomas [3]
Haffari, Gholamreza [1]
Hirst, Martin [3]
Marra, Marco A. [3]
Condon, Anne [2]
Aparicio, Samuel [1, 4]
Shah, Sohrab P. [1, 2, 4]
Affiliations
[1] BC Canc Agcy, Dept Mol Oncol, Vancouver, BC, Canada
[2] Univ British Columbia, Dept Comp Sci, Vancouver, BC V6T 1W5, Canada
[3] Canadas Michael Smith Genome Sci Ctr, Vancouver, BC, Canada
[4] Univ British Columbia, Dept Pathol, Vancouver, BC V6T 1W5, Canada
Keywords
FREQUENT MUTATION; GENOME; IDENTIFICATION; TOOLKIT
DOI
10.1093/bioinformatics/btr629
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Results: We present a comparison of four standard supervised machine learning algorithms for somatic SNV prediction in tumour/normal NGS experiments. To evaluate these approaches (random forest, Bayesian additive regression tree, support vector machine and logistic regression), we constructed 106 features representing 3369 candidate somatic SNVs from 48 breast cancer genomes, originally predicted with naive methods and subsequently revalidated to establish ground truth labels. We trained the classifiers on these data (1015 true somatic mutations and 2354 non-somatic positions) and rigorously evaluated them using a cross-validation framework and hold-out test NGS data from both exome capture and whole genome shotgun platforms. All learning algorithms employing predictive discriminative approaches with feature selection improved predictive accuracy over standard approaches by statistically significant margins. In addition, unsupervised clustering of the ground truth 'false positive' predictions revealed several distinct classes, and we present evidence suggesting non-overlapping sources of technical artefacts, illuminating important directions for future study.
Pages: 167-175
Page count: 9
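The cross-validation evaluation described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it uses only one of the four classifier families named (logistic regression, here a plain stochastic-gradient version), a small synthetic feature matrix instead of the 106 real features, and a class ratio of roughly 30:70 chosen to resemble the 1015:2354 split. All function names, parameters, and data below are hypothetical.

```python
import math
import random


def make_synthetic(n_pos, n_neg, n_features, shift=2.0, seed=0):
    """Hypothetical stand-in for the feature matrix: Gaussian features,
    with the positive (somatic) class shifted by `shift` in every feature."""
    rng = random.Random(seed)
    X, y = [], []
    for label, count in ((1, n_pos), (0, n_neg)):
        for _ in range(count):
            X.append([rng.gauss(label * shift, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


def stratified_folds(y, k, seed=0):
    """Split sample indices into k folds while preserving the class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def train_logistic(X, y, epochs=30, lr=0.05):
    """Fit logistic regression with plain stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = sigmoid(dot(w, xi) + b) - yi  # gradient of the log-loss
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


def predict(model, x):
    w, b = model
    return 1 if dot(w, x) + b >= 0 else 0


def cross_validate(X, y, k=5):
    """Return the held-out accuracy for each of the k stratified folds."""
    accuracies = []
    for held_out in stratified_folds(y, k):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = train_logistic([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies


X, y = make_synthetic(n_pos=30, n_neg=70, n_features=4)
fold_accuracies = cross_validate(X, y, k=5)
mean_accuracy = sum(fold_accuracies) / len(fold_accuracies)
print("per-fold accuracy:", fold_accuracies)
print("mean accuracy:", mean_accuracy)
```

Stratifying the folds keeps the somatic/non-somatic ratio stable across splits, which matters when the classes are imbalanced as in the abstract. Comparing several classifiers would mean swapping `train_logistic` and `predict` for other models while reusing the same folds.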