Predicting pathogenicity of missense variants with weakly supervised regression

Cited by: 6
Authors
Cao, Yue [1]
Sun, Yuanfei [1]
Karimi, Mostafa [1]
Chen, Haoran [1]
Moronfoye, Oluwaseyi [1]
Shen, Yang [1]
Affiliations
[1] Texas A&M Univ, Dept Elect & Comp Engn, College Stn, TX 77843 USA
Keywords
clinical significance; genetic variation; genome medicine; machine learning; model interpretability; molecular mechanism; weak supervision; MOLECULAR-MECHANISMS; EVOLUTIONARY ACTION; MUTATIONS; DATABASE; IDENTIFICATION; ASSOCIATION; PHENOTYPE; GENOTYPE; EQUATION; BRCA1
DOI
10.1002/humu.23826
Chinese Library Classification
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
Quickly growing genetic variation data of unknown clinical significance demand computational methods that can reliably predict clinical phenotypes and deeply unravel molecular mechanisms. On the platform enabled by the Critical Assessment of Genome Interpretation (CAGI), we develop a novel "weakly supervised" regression (WSR) model that not only predicts precise clinical significance (probability of pathogenicity) from inexact training annotations (class of pathogenicity) but also infers underlying molecular mechanisms in a variant-specific manner. Compared to multiclass logistic regression, a representative multiclass classifier, our kernelized WSR improves the performance for the ENIGMA Challenge set from 0.72 to 0.97 in binary area under the receiver operating characteristic curve (AUC) and from 0.64 to 0.80 in ordinal multiclass AUC. WSR model interpretation and protein structural interpretation reach consensus in corroborating the most probable molecular mechanisms by which some pathogenic BRCA1 variants confer clinical significance, namely metal-binding disruption for p.C44F and p.C47Y, protein-binding disruption for p.M18T, and structure destabilization for p.S1715N.
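The abstract's central idea, learning a probability of pathogenicity from class-level annotations, can be illustrated with a minimal sketch. It assumes each class label stands for an interval of probabilities and penalizes a prediction only when it falls outside that interval. The interval bounds in `CLASS_INTERVALS`, the squared penalty, and the function names are illustrative assumptions, not details taken from the abstract or the authors' kernelized WSR model. The `binary_auc` helper computes the standard pairwise-ranking form of the binary AUC that the abstract reports.

```python
# Hypothetical mapping from a five-level pathogenicity class to a
# probability interval. These bounds are an assumption for illustration
# and are not stated in the abstract.
CLASS_INTERVALS = {
    1: (0.0, 0.001),
    2: (0.001, 0.05),
    3: (0.05, 0.95),
    4: (0.95, 0.99),
    5: (0.99, 1.0),
}


def interval_loss(prob, label):
    """Squared distance from prob to the label's interval; zero inside it."""
    lo, hi = CLASS_INTERVALS[label]
    if prob < lo:
        return (lo - prob) ** 2
    if prob > hi:
        return (prob - hi) ** 2
    return 0.0


def binary_auc(scores, positives):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, p in zip(scores, positives) if p]
    neg = [s for s, p in zip(scores, positives) if not p]
    wins = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in pos
        for b in neg
    )
    return wins / (len(pos) * len(neg))
```

Under these assumptions, a prediction of 0.5 for a class 3 variant incurs no loss, while a prediction of 0.9 for a class 5 variant is penalized for falling below 0.99. A regression model trained with such a loss outputs a continuous probability even though it only ever sees class labels.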
Pages: 1579-1592
Page count: 14