Evaluating methods for the analysis of rare variants in sequence data

Cited by: 14
Authors
Alexander Luedtke
Scott Powers
Ashley Petersen
Alexandra Sitarik
Airat Bekmetjev
Nathan L Tintle
Affiliations
[1] Brown University, Division of Applied Mathematics
[2] University of North Carolina, Department of Statistics and Operations Research
[3] St. Olaf College, Departments of Mathematics, Computer Science, and Statistics
[4] Wittenberg University, Department of Mathematics
[5] Department of Mathematics, Computer Science and Statistics
[6] Dordt College
Keywords
Minor Allele Frequency; Population Stratification; Causal SNPs; Nonsynonymous SNPs; Simulated Phenotype
DOI
10.1186/1753-6561-5-S9-S119
Abstract
Many statistical methods for rare variants have been proposed to analyze the coming wave of next-generation sequencing data. To date, few studies have compared these methods directly on real sequence data, and practical advice on appropriate analytic strategies for rare variant analysis is badly needed. As part of Genetic Analysis Workshop 17, we compare four recently proposed rare variant methods (combined multivariate and collapsing, weighted sum, proportion regression, and the cumulative minor allele test) on simulated phenotypes and next-generation sequencing data. Overall, we find that all of the analyzed methods have serious practical limitations in identifying causal genes. Specifically, no method achieves a true discovery rate (the percentage of truly causal genes among all genes identified as significantly associated with the phenotype) above 5%. Further exploration shows that all methods suffer from inflated false-positive rates (the chance that a noncausal gene will be identified as associated with the phenotype) because of population stratification and gametic phase disequilibrium between noncausal and causal SNPs. In addition, the observed true-positive rate (the chance that a truly causal gene will be identified as significantly associated with the phenotype) was very low (<19%) for each of the four methods. Together, larger-than-anticipated false-positive rates, low true-positive rates, and the fact that only about 1% of all genes are causal give all four methods poor discriminatory ability. Gametic phase disequilibrium and population stratification are important areas for further research in the analysis of rare variant data.
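The closing argument of the abstract is Bayes-rule arithmetic. When only a small fraction of genes is causal, even a modest false-positive rate produces far more false hits than true ones, so the true discovery rate collapses. The Python sketch below makes that calculation explicit under stated assumptions. The function name `true_discovery_rate` is hypothetical. The 1% causal fraction and 19% true-positive rate are the figures quoted in the abstract. The false-positive rates (a nominal 0.05 plus two inflated values) are illustrative assumptions, not results from the study.

```python
import math


def true_discovery_rate(prop_causal: float, tpr: float, fpr: float) -> float:
    """Expected share of significant genes that are truly causal.

    prop_causal: fraction of all genes that are causal (about 0.01 in the abstract)
    tpr: chance a causal gene is declared significant (true-positive rate)
    fpr: chance a noncausal gene is declared significant (false-positive rate)
    """
    for name, value in (("prop_causal", prop_causal), ("tpr", tpr), ("fpr", fpr)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    expected_true = prop_causal * tpr           # causal genes flagged as significant
    expected_false = (1.0 - prop_causal) * fpr  # noncausal genes flagged as significant
    flagged = expected_true + expected_false
    if flagged == 0.0:
        return math.nan  # nothing is declared significant, so the TDR is undefined
    return expected_true / flagged


if __name__ == "__main__":
    # 1% causal genes and a 19% TPR come from the abstract; the FPR values are
    # hypothetical (a nominal alpha of 0.05 and two inflated rates).
    # Prints TDR=0.037, 0.019, and 0.010 for the three FPR values.
    for fpr in (0.05, 0.10, 0.20):
        tdr = true_discovery_rate(prop_causal=0.01, tpr=0.19, fpr=fpr)
        print(f"FPR={fpr:.2f} -> TDR={tdr:.3f}")
```

Under these assumptions, even a false-positive rate held at the nominal 5% gives an expected true discovery rate below 4%, and the rate drops further as stratification or gametic phase disequilibrium inflates false positives.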