Predicting Mendelian Disease-Causing Non-Synonymous Single Nucleotide Variants in Exome Sequencing Studies

Cited by: 105
Authors
Li, Miao-Xin [1 ,2 ,3 ]
Kwan, Johnny S. H. [1 ,4 ]
Bao, Su-Ying [5 ]
Yang, Wanling [6 ]
Ho, Shu-Leong [4 ]
Song, Yong-Qiang [5 ]
Sham, Pak C. [1 ,2 ,3 ,7 ]
Affiliations
[1] Univ Hong Kong, Dept Psychiat, Pokfulam, Hong Kong, Peoples R China
[2] Univ Hong Kong, Ctr Reprod Dev & Growth, Pokfulam, Hong Kong, Peoples R China
[3] Univ Hong Kong, Ctr Genom Sci, Pokfulam, Hong Kong, Peoples R China
[4] Univ Hong Kong, Dept Med, Pokfulam, Hong Kong, Peoples R China
[5] Univ Hong Kong, Dept Biochem, Pokfulam, Hong Kong, Peoples R China
[6] Univ Hong Kong, Dept Paediat & Adolescent Med, Pokfulam, Hong Kong, Peoples R China
[7] Univ Hong Kong, State Key Lab Cognit & Brain Sci, Pokfulam, Hong Kong, Peoples R China
Source
PLOS GENETICS | 2013 / Vol. 9 / Issue 01
Keywords
CONSANGUINEOUS MARRIAGES; DELETERIOUS MUTATIONS; FUNCTIONAL ANNOTATION; GENOME ANALYSIS; GENES; DOMINANT; MODIFIER; GALAXY; LEVEL; SCORE;
DOI
10.1371/journal.pgen.1003143
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
Exome sequencing is becoming a standard tool for mapping Mendelian disease-causing (or pathogenic) non-synonymous single nucleotide variants (nsSNVs). Minor allele frequency (MAF) filtering and functional prediction methods are commonly used to identify candidate pathogenic mutations in these studies. Combining multiple functional prediction methods may increase prediction accuracy. Here, we propose using a logit model to combine multiple prediction methods and compute an unbiased probability of a rare variant being pathogenic. Also, for the first time, we assess the predictive power of seven prediction methods (including SIFT, PolyPhen2, CONDEL, and logit) in distinguishing pathogenic nsSNVs from other rare variants, which reflects the situation after MAF filtering is done in exome-sequencing studies. We found that a logit model combining all or some of the original prediction methods outperforms the other methods examined, but is unable to discriminate between autosomal dominant and autosomal recessive disease mutations. Finally, based on the predictions of the logit model, we estimate that around 5% of an individual's rare nsSNVs are pathogenic and that an individual carries at least ~22 pathogenic derived alleles, which, if made homozygous by consanguineous marriages, may lead to recessive diseases.
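
The idea of combining several prediction scores through a logit model can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the three score columns are synthetic stand-ins for tools such as SIFT or PolyPhen2, the labels are simulated, and the plain gradient-descent fit is only one of many ways to estimate a logistic regression. The function names (fit_logit, predict_proba, make_synthetic) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logit(X, y, lr=0.5, epochs=500):
    # Fit weights w and intercept b by batch gradient descent on the
    # mean logistic loss. X is a list of score vectors, y a list of 0/1 labels.
    n_features = len(X[0])
    n = len(X)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = p - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    # Combined probability that a variant with score vector x is pathogenic.
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def make_synthetic(n, seed=0):
    # ASSUMPTION: simulated data, not real variant annotations. Each variant
    # gets three scores in [0, 1]; pathogenic variants tend to score higher.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.5 else 0
        center = 0.7 if label else 0.3
        scores = [min(1.0, max(0.0, rng.gauss(center, 0.2))) for _ in range(3)]
        X.append(scores)
        y.append(label)
    return X, y


X, y = make_synthetic(400)
w, b = fit_logit(X, y)
p_high = predict_proba(w, b, [0.9, 0.9, 0.9])  # all tools call it damaging
p_low = predict_proba(w, b, [0.1, 0.1, 0.1])   # all tools call it benign
print("weights:", w, "intercept:", b)
print("P(pathogenic | high scores):", p_high)
print("P(pathogenic | low scores):", p_low)
```

In this sketch each individual score contributes a weighted term to a single log-odds value, so the combined output is a probability rather than a set of separate per-tool calls. Note that a probability fitted this way reflects the case/control mix of the training data; how the paper obtains an unbiased probability is not described in this record.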
Pages: 11