Prioritization Of Nonsynonymous Single Nucleotide Variants For Exome Sequencing Studies Via Integrative Learning On Multiple Genomic Data

Cited by: 6
Authors
Wu, Mengmeng [1, 2, 3]
Wu, Jiaxin [1, 2]
Chen, Ting [3, 4]
Jiang, Rui [1, 2, 5]
Affiliations
[1] Tsinghua Univ, Bioinformat Div, MOE Key Lab Bioinformat, Beijing 100084, Peoples R China
[2] Tsinghua Univ, Dept Automat, Ctr Synthet & Syst Biol, TNLIST, Beijing 100084, Peoples R China
[3] Tsinghua Univ, Dept Comp Sci, Beijing 100084, Peoples R China
[4] Univ So Calif, Mol & Computat Biol Program, Los Angeles, CA 90089 USA
[5] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
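Source
SCIENTIFIC REPORTS | 2015 / Vol. 5
Funding
National Natural Science Foundation of China; National High Technology Research and Development Program of China (863 Program);
Keywords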
INTELLECTUAL DISABILITY; MUTATIONS; DISEASE; TOOL; PATHOGENICITY; CHROMATIN; PROTEINS; SERVER; GENES; SNVS;
DOI
10.1038/srep14955
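Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07; 0710; 09;
Abstract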
The rapid advancement of next-generation sequencing technology has greatly accelerated progress in understanding human inherited diseases through innovations such as exome sequencing. Nevertheless, identifying causative variants from sequencing data remains a great challenge. Traditional statistical genetics approaches such as linkage analysis and association studies have limited power in analyzing exome sequencing data, while relying on simple filtration strategies and predicted functional implications of mutations to pinpoint pathogenic variants is prone to producing false positives. To overcome these limitations, we herein propose a supervised learning approach, termed snvForest, to prioritize candidate nonsynonymous single nucleotide variants for a specific type of disease by integrating 11 functional scores at the variant level and 8 association scores at the gene level. We conduct a series of large-scale in silico validation experiments, demonstrating the effectiveness of snvForest across 2,511 diseases of different inheritance styles and the superiority of our approach over two state-of-the-art methods. We further apply snvForest to three real exome sequencing data sets of epileptic encephalopathies and intellectual disability to show the ability of our approach to identify causative de novo mutations for these complex diseases. The online service and standalone software of snvForest are available at http://bioinfo.au.tsinghua.edu.cn/jianglab/snvforest.
Pages: 15