Inflated type I error rates when using aggregation methods to analyze rare variants in the 1000 Genomes Project exon sequencing data in unrelated individuals: summary results from Group 7 at Genetic Analysis Workshop 17

Cited by: 14
Authors
Tintle, Nathan [1]
Aschard, Hugues [2]
Hu, Inchi [3]
Nock, Nora [4]
Wang, Haitian [3]
Pugh, Elizabeth [5]
Affiliations
[1] Dordt Coll, Dept Math Stat & Comp Sci, Sioux Ctr, IA 51250 USA
[2] Harvard Univ, Sch Publ Hlth, Program Mol & Genet Epidemiol, Boston, MA 02115 USA
[3] Hong Kong Univ Sci & Technol, Dept Informat Syst Business Stat & Operat Managem, Kowloon, Hong Kong, Peoples R China
[4] Case Western Reserve Univ, Dept Epidemiol & Biostat, Div Genet & Mol Epidemiol, Cleveland, OH 44106 USA
[5] Johns Hopkins Univ, Sch Med, Ctr Inherited Dis Res, Baltimore, MD USA
Funding
US National Institutes of Health (NIH);
Keywords
population structure; correlated markers; next-generation sequencing; WIDE ASSOCIATION;
DOI
10.1002/gepi.20650
Chinese Library Classification (CLC) number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
As part of Genetic Analysis Workshop 17 (GAW17), our group considered the application of novel and standard approaches to the analysis of genotype-phenotype association in next-generation sequencing data. Our group identified a major issue in the analysis of the GAW17 next-generation sequencing data: type I error and false-positive report probability rates higher than those expected based on empirical type I error levels (as high as 90%). Two main causes emerged: population stratification and long-range correlation (gametic phase disequilibrium) between rare variants. Population stratification was expected because of the diverse sample. Correlation between rare variants was attributable to both random causes (e.g., nearly 10,000 of 25,000 markers were private variants, and the sample size was small [n = 697]) and nonrandom causes (more correlation was observed than was expected by random chance). Principal components analysis was used to control for population structure and helped to minimize type I errors, but this was at the expense of identifying fewer causal variants. A novel multiple regression approach showed promise to handle correlation between markers. Further work is needed, first, to identify best practices for the control of type I errors in the analysis of sequencing data and then to explore and compare the many promising new aggregating approaches for identifying markers associated with disease phenotypes. Genet. Epidemiol. 35:S56-S60, 2011. © 2011 Wiley Periodicals, Inc.
Pages: S56-S60
Page count: 5