Prioritization Of Nonsynonymous Single Nucleotide Variants For Exome Sequencing Studies Via Integrative Learning On Multiple Genomic Data

Times cited: 6
Authors
Wu, Mengmeng [1,2,3]
Wu, Jiaxin [1,2]
Chen, Ting [3,4]
Jiang, Rui [1,2,5]
Affiliations
[1] Tsinghua Univ, Bioinformat Div, MOE Key Lab Bioinformat, Beijing 100084, Peoples R China
[2] Tsinghua Univ, Dept Automat, Ctr Synthet & Syst Biol, TNLIST, Beijing 100084, Peoples R China
[3] Tsinghua Univ, Dept Comp Sci, Beijing 100084, Peoples R China
[4] Univ So Calif, Mol & Computat Biol Program, Los Angeles, CA 90089 USA
[5] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
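Source
SCIENTIFIC REPORTS | 2015 / Vol. 5
Funding
National Natural Science Foundation of China; National High Technology Research and Development Program of China (863 Program);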
Keywords
INTELLECTUAL DISABILITY; MUTATIONS; DISEASE; TOOL; PATHOGENICITY; CHROMATIN; PROTEINS; SERVER; GENES; SNVS;
DOI
10.1038/srep14955
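Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes
07; 0710; 09;
Abstract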
The rapid advancement of next-generation sequencing technology has greatly accelerated progress in understanding human inherited diseases through innovations such as exome sequencing. Nevertheless, identifying causative variants from sequencing data remains a great challenge. Traditional statistical genetics approaches such as linkage analysis and association studies have limited power in analyzing exome sequencing data, while relying on simple filtering strategies and the predicted functional implications of mutations to pinpoint pathogenic variants is prone to producing false positives. To overcome these limitations, we propose a supervised learning approach, termed snvForest, that prioritizes candidate nonsynonymous single nucleotide variants for a specific type of disease by integrating 11 functional scores at the variant level and 8 association scores at the gene level. We conduct a series of large-scale in silico validation experiments, demonstrating the effectiveness of snvForest across 2,511 diseases with different modes of inheritance and the superiority of our approach over two state-of-the-art methods. We further apply snvForest to three real exome sequencing data sets of epileptic encephalopathies and intellectual disability, showing the ability of our approach to identify causative de novo mutations for these complex diseases. The online service and standalone software of snvForest are available at http://bioinfo.au.tsinghua.edu.cn/jianglab/snvforest.
Pages: 15
References
59 in total
[21]   An integrated encyclopedia of DNA elements in the human genome [J].
Dunham, Ian ;
Kundaje, Anshul ;
Aldred, Shelley F. ;
Collins, Patrick J. ;
Davis, Carrie A. ;
Doyle, Francis ;
Epstein, Charles B. ;
Frietze, Seth ;
Harrow, Jennifer ;
Kaul, Rajinder ;
Khatun, Jainab ;
Lajoie, Bryan R. ;
Landt, Stephen G. ;
Lee, Bum-Kyu ;
Pauli, Florencia ;
Rosenbloom, Kate R. ;
Sabo, Peter ;
Safi, Alexias ;
Sanyal, Amartya ;
Shoresh, Noam ;
Simon, Jeremy M. ;
Song, Lingyun ;
Trinklein, Nathan D. ;
Altshuler, Robert C. ;
Birney, Ewan ;
Brown, James B. ;
Cheng, Chao ;
Djebali, Sarah ;
Dong, Xianjun ;
Dunham, Ian ;
Ernst, Jason ;
Furey, Terrence S. ;
Gerstein, Mark ;
Giardine, Belinda ;
Greven, Melissa ;
Hardison, Ross C. ;
Harris, Robert S. ;
Herrero, Javier ;
Hoffman, Michael M. ;
Iyer, Sowmya ;
Kellis, Manolis ;
Khatun, Jainab ;
Kheradpour, Pouya ;
Kundaje, Anshul ;
Lassmann, Timo ;
Li, Qunhua ;
Lin, Xinying ;
Marinov, Georgi K. ;
Merkel, Angelika ;
Mortazavi, Ali .
NATURE, 2012, 489 (7414) :57-74
[22]   Correlating Information Contents of Gene Ontology Terms to Infer Semantic Similarity of Gene Products [J].
Gan, Mingxin .
COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE, 2014, 2014
[23]   Identifying novel constrained elements by exploiting biased substitution patterns [J].
Garber, Manuel ;
Guttman, Mitchell ;
Clamp, Michele ;
Zody, Michael C. ;
Friedman, Nir ;
Xie, Xiaohui .
BIOINFORMATICS, 2009, 25 (12) :I54-I62
[24]   Improving the Assessment of the Outcome of Nonsynonymous SNVs with a Consensus Deleteriousness Score, Condel [J].
Gonzalez-Perez, Abel ;
Lopez-Bigas, Nuria .
AMERICAN JOURNAL OF HUMAN GENETICS, 2011, 88 (04) :440-449
[25]   Hamosh A, 2005, NUCLEIC ACIDS RES, V33, pD514
[26]   Javed A, 2014, NAT METHODS, V11, P935, DOI 10.1038/nmeth.3046
[27]   Sequence-based prioritization of nonsynonymous single-nucleotide polymorphisms for the study of disease mutations [J].
Jiang, Rui ;
Yang, Hua ;
Zhou, Linqi ;
Kuo, C.-C. Jay ;
Sun, Fengzhu ;
Chen, Ting .
AMERICAN JOURNAL OF HUMAN GENETICS, 2007, 81 (02) :346-360
[28]   Constructing a gene semantic similarity network for the inference of disease genes [J].
Jiang, Rui ;
Gan, Mingxin ;
He, Peng .
BMC SYSTEMS BIOLOGY, 2011, 5
[29]   KEGG: Kyoto Encyclopedia of Genes and Genomes [J].
Kanehisa, M ;
Goto, S .
NUCLEIC ACIDS RESEARCH, 2000, 28 (01) :27-30
[30]   Exome sequencing and the genetic basis of complex traits [J].
Kiezun, Adam ;
Garimella, Kiran ;
Do, Ron ;
Stitziel, Nathan O. ;
Neale, Benjamin M. ;
McLaren, Paul J. ;
Gupta, Namrata ;
Sklar, Pamela ;
Sullivan, Patrick F. ;
Moran, Jennifer L. ;
Hultman, Christina M. ;
Lichtenstein, Paul ;
Magnusson, Patrik ;
Lehner, Thomas ;
Shugart, Yin Yao ;
Price, Alkes L. ;
de Bakker, Paul I. W. ;
Purcell, Shaun M. ;
Sunyaev, Shamil R. .
NATURE GENETICS, 2012, 44 (06) :623-630