Data quality control in genetic case-control association studies

Times cited: 943
Authors
Anderson, Carl A. [1,2]
Pettersson, Fredrik H. [1]
Clarke, Geraldine M. [1]
Cardon, Lon R. [3]
Morris, Andrew P. [1]
Zondervan, Krina T. [1]
Affiliations
[1] Univ Oxford, Wellcome Trust Ctr Human Genet, Genet & Genom Epidemiol Unit, Oxford, England
[2] Wellcome Trust Sanger Inst, Cambridge, England
[3] GlaxoSmithKline, King of Prussia, PA, USA
Funding
Wellcome Trust (UK)
Keywords
GENOME-WIDE ASSOCIATION; RISK LOCI; STRATIFICATION; TOOL
DOI
10.1038/nprot.2010.116
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification code
070307 [Chemical Biology]
Abstract
This protocol details the steps for data quality assessment and control that are typically carried out during case-control association studies. The steps described involve the identification and removal of DNA samples and markers that introduce bias. These critical steps are paramount to the success of a case-control study and are necessary before statistically testing for association. We describe how to use PLINK, a tool for handling SNP data, to perform assessments of failure rate per individual and per SNP and to assess the degree of relatedness between individuals. We also detail other quality-control procedures, including the use of SMARTPCA software for the identification of ancestral outliers. These platforms were selected because they are user-friendly, widely used and computationally efficient. Steps needed to detect and establish a disease association using case-control data are not discussed here. Issues concerning study design and marker selection in case-control studies have been discussed in our earlier protocols. This protocol, which is routinely used in our labs, should take approximately 8 h to complete.
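The per-individual and per-SNP failure-rate assessments described in the abstract reduce to counting failed genotype calls along each axis of a genotype matrix. The sketch below is not the protocol's PLINK workflow (in PLINK, `--missing` reports these rates and `--mind`/`--geno` apply the corresponding filters). It assumes a hypothetical in-memory layout with minor-allele counts 0/1/2 and `None` for a failed call, and its default thresholds are illustrative placeholders, not values taken from this record.

```python
"""Sketch: per-individual and per-SNP genotype failure (missingness) rates.

Assumed layout (hypothetical, not a PLINK file format): one list per
individual, each entry the minor-allele count 0, 1 or 2, or None when the
genotype call failed.
"""
from typing import List, Optional, Sequence, Tuple

Genotype = Optional[int]
Matrix = Sequence[Sequence[Genotype]]


def individual_missing_rates(genos: Matrix) -> List[float]:
    """Fraction of failed calls across SNPs, for each individual (row)."""
    return [sum(g is None for g in row) / len(row) for row in genos]


def snp_missing_rates(genos: Matrix) -> List[float]:
    """Fraction of failed calls across individuals, for each SNP (column)."""
    if not genos:
        return []
    n_ind = len(genos)
    n_snp = len(genos[0])
    return [sum(row[j] is None for row in genos) / n_ind for j in range(n_snp)]


def qc_filter(
    genos: Matrix,
    max_ind_missing: float = 0.03,  # illustrative threshold, not prescribed
    max_snp_missing: float = 0.05,  # illustrative threshold, not prescribed
) -> Tuple[List[int], List[int], List[List[Genotype]]]:
    """Remove individuals, then SNPs, whose failure rate exceeds a threshold.

    Returns the kept row indices, the kept column indices and the filtered
    matrix. Per-SNP rates are computed on the retained individuals only.
    """
    keep_ind = [i for i, r in enumerate(individual_missing_rates(genos))
                if r <= max_ind_missing]
    kept_rows = [list(genos[i]) for i in keep_ind]
    keep_snp = [j for j, r in enumerate(snp_missing_rates(kept_rows))
                if r <= max_snp_missing]
    filtered = [[row[j] for j in keep_snp] for row in kept_rows]
    return keep_ind, keep_snp, filtered


if __name__ == "__main__":
    toy = [[0, 1, 2, 1], [None, None, 1, 0], [1, None, 0, 2], [2, None, 1, 1]]
    kept_ind, kept_snp, _ = qc_filter(toy, max_ind_missing=0.25, max_snp_missing=0.25)
    print(kept_ind, kept_snp)  # prints "[0, 2, 3] [0, 2, 3]"
```

In the toy run, the individual with two of four calls missing is removed first; the SNP that then fails in two of the three remaining individuals is dropped. Filtering individuals before SNPs is a design choice of this sketch, so that poorly genotyped samples do not inflate per-SNP failure rates.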
Pages: 1564-1573
Number of pages: 10