Prediction of inherited genomic susceptibility to 20 common cancer types by a supervised machine-learning method

Cited by: 22
Authors
Kim, Byung-Ju [1, 2, 3]
Kim, Sung-Hou [1, 2, 3, 4]
Affiliations
[1] Univ Calif Berkeley, Dept Chem, Berkeley, CA 94720 USA
[2] Yonsei Univ, Dept Integrat Omics Biomed Sci, Grad Sch, Seoul, South Korea
[3] Lawrence Berkeley Natl Lab, Mol Biophys & Integrated Bioimaging Div, Berkeley, CA 94720 USA
[4] Univ Calif Berkeley, Ctr Computat Biol, Berkeley, CA 94720 USA
Keywords
genomic/environmental factors; k nearest neighbor method; SNP syntax; multiple assortment model; cancer risk
DOI
10.1073/pnas.1717960115
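Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes
070301 [Inorganic chemistry]; 070403 [Astrophysics]; 070507 [Natural resources and territorial spatial planning]; 090105 [Crop production systems and ecological engineering];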
Abstract
Prevention and early intervention are the most effective ways of avoiding or minimizing psychological, physical, and financial suffering from cancer. However, such proactive action requires the ability to predict an individual's susceptibility to cancer with a measure of probability. Of the triad of cancer-causing factors (inherited genomic susceptibility, environmental factors, and lifestyle factors), the inherited genomic component may be derivable from the recent public availability of a large body of whole-genome variation data. However, genome-wide association studies have so far shown limited success in predicting the inherited susceptibility to common cancers. We present here a multiple classification approach for predicting individuals' inherited genomic susceptibility to acquire the most likely phenotype among a panel of 20 major common cancer types plus 1 "healthy" type by application of a supervised machine-learning method under competing conditions among the cohorts of the 21 types. This approach suggests that, depending on the phenotypes of 5,919 individuals of the "white" ethnic population in this study, (i) the portion of the cohort of a cancer type who acquired the observed type due mostly to inherited genomic susceptibility factors ranges from about 33 to 88% (or its corollary: the portion due mostly to environmental and lifestyle factors ranges from 12 to 67%), and (ii) on an individual level, the method also predicts individuals' inherited genomic susceptibility to acquire the other types, ranked with associated probabilities. These probabilities may provide practical information for individuals, health professionals, and health policy-makers related to prevention and/or early intervention of cancer.
Pages: 1322-1327
Page count: 6
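The abstract describes ranking an individual's most likely phenotypes with associated probabilities, and the keywords name a k nearest neighbor method. The sketch below illustrates that general idea only and is not the authors' implementation. The genotype encoding (minor-allele counts 0/1/2), the Hamming distance, the value of k, the toy cohort, and the labels `type_A`, `type_B`, and `healthy` are all assumptions made for illustration; this record does not specify the paper's features, distance measure, or probability estimate.

```python
from collections import Counter


def hamming(a, b):
    """Count positions at which two equal-length genotype vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def rank_phenotypes(query, cohort, k=5):
    """Rank phenotype labels by their share of votes among the k nearest neighbors.

    cohort is a list of (genotype_vector, label) pairs. The vote share is a
    crude stand-in for a probability, not the paper's estimate.
    """
    neighbors = sorted(cohort, key=lambda item: hamming(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return [(label, count / len(neighbors)) for label, count in votes.most_common()]


# Hypothetical toy cohort: each vector holds minor-allele counts at six SNPs.
cohort = [
    ((0, 0, 1, 2, 0, 1), "healthy"),
    ((0, 1, 1, 2, 0, 1), "healthy"),
    ((2, 2, 0, 0, 1, 0), "type_A"),
    ((2, 1, 0, 0, 1, 0), "type_A"),
    ((2, 2, 0, 1, 1, 0), "type_A"),
    ((1, 0, 2, 1, 2, 2), "type_B"),
]
query = (2, 2, 0, 0, 1, 1)

ranked = rank_phenotypes(query, cohort, k=5)
for label, share in ranked:
    print(f"{label}: {share:.2f}")
```

With this toy data, three of the five nearest neighbors carry `type_A` and two carry `healthy`, so the ranking lists `type_A` first. A real analysis would need many more individuals and variants, and a calibrated probability estimate.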