A robust clustering algorithm for identifying problematic samples in genome-wide association studies

Cited by: 52
Authors
Bellenguez, Celine [1]
Strange, Amy [1]
Freeman, Colin [1]
Donnelly, Peter [1, 2]
Spencer, Chris C. A. [1]
Affiliations
[1] University of Oxford, Wellcome Trust Centre for Human Genetics, Oxford OX3 7BN, England
[2] University of Oxford, Department of Statistics, Oxford OX1 3TG, England
Funding
Wellcome Trust (UK)
DOI
10.1093/bioinformatics/btr599
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
High-throughput genotyping arrays provide an efficient way to survey single nucleotide polymorphisms (SNPs) across the genome in large numbers of individuals. Downstream analysis of the data, for example in genome-wide association studies (GWAS), often involves statistical models of genotype frequencies across individuals. The complexities of the sample collection process and the potential for errors in the experimental assay can lead to biases and artefacts in an individual's inferred genotypes. Rather than attempting to model these complications, it has become a standard practice to remove individuals whose genome-wide data differ from the sample at large. Here we describe a simple, but robust, statistical algorithm to identify samples with atypical summaries of genome-wide variation. Its use as a semi-automated quality control tool is demonstrated using several summary statistics, selected to identify different potential problems, and it is applied to two different genotyping platforms and sample collections.
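To make the idea of flagging "samples with atypical summaries of genome-wide variation" concrete, here is a minimal sketch in Python. It is not the clustering algorithm described in the paper, whose details are not given in this record. It swaps in a much simpler robust rule (distance from the median, scaled by the median absolute deviation) applied to one per-sample summary statistic. The function name `flag_atypical_samples`, the threshold of 3.5, and the heterozygosity values are all assumptions made for illustration.

```python
from statistics import median


def flag_atypical_samples(values, threshold=3.5):
    """Return indices of samples whose summary statistic is far from the bulk.

    Illustrative stand-in only: a robust z-score built from the median and
    the median absolute deviation (MAD), so that a few extreme samples do
    not distort the estimate of the typical spread.
    """
    centre = median(values)
    mad = median(abs(v - centre) for v in values)
    if mad == 0:
        return []  # no measurable spread, so nothing can be called atypical
    # 1.4826 rescales the MAD to match the standard deviation under normality
    scale = 1.4826 * mad
    return [i for i, v in enumerate(values) if abs(v - centre) / scale > threshold]


# Hypothetical per-sample heterozygosity rates; the sixth value is invented
# to stand apart from the rest.
heterozygosity = [0.312, 0.315, 0.310, 0.314, 0.313, 0.352, 0.311, 0.316]
print(flag_atypical_samples(heterozygosity))  # prints [5]
```

In a semi-automated workflow of the kind the abstract describes, flagged samples would typically be reviewed rather than dropped blindly, and the same check would be repeated for each summary statistic of interest.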
Pages: 134-135
Page count: 2