Comparing genotyping algorithms for Illumina's Infinium whole-genome SNP BeadChips

Cited by: 42
Authors
Ritchie, Matthew E. [1 ,3 ]
Liu, Ruijie [1 ]
Carvalho, Benilton S. [4 ]
Irizarry, Rafael A. [2 ]
Affiliations
[1] Walter & Eliza Hall Inst Med Res, Bioinformat Div, Parkville, Vic 3052, Australia
[2] Johns Hopkins Bloomberg Sch Publ Hlth, Dept Biostat, Baltimore, MD 21205 USA
[3] Univ Melbourne, Dept Med Biol, Parkville, Vic 3010, Australia
[4] Univ Cambridge, Dept Oncol, CRUK Cambridge Res Inst, Li Ka Shing Ctr, Cambridge CB2 0RE, England
Source
BMC BIOINFORMATICS | 2011 / Vol. 12
Funding
National Health and Medical Research Council of Australia;
Keywords
SOFTWARE;
DOI
10.1186/1471-2105-12-68
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Illumina's Infinium SNP BeadChips are extensively used in both small and large-scale genetic studies. A fundamental step in any analysis is the processing of raw allele A and allele B intensities from each SNP into genotype calls (AA, AB, BB). Various algorithms which make use of different statistical models are available for this task. We compare four methods (GenCall, Illuminus, GenoSNP and CRLMM) on data where the true genotypes are known in advance and data from a recently published genome-wide association study.
Results: In general, differences in accuracy are relatively small between the methods evaluated, although CRLMM and GenoSNP were found to consistently outperform GenCall. The performance of Illuminus is heavily dependent on sample size, with lower no call rates and improved accuracy as the number of samples available increases. For X chromosome SNPs, methods with sex-dependent models (Illuminus, CRLMM) perform better than methods which ignore gender information (GenCall, GenoSNP). We observe that CRLMM and GenoSNP are more accurate at calling SNPs with low minor allele frequency than GenCall or Illuminus. The sample quality metrics from each of the four methods were found to have a high level of agreement at flagging samples with unusual signal characteristics.
Conclusions: CRLMM, GenoSNP and GenCall can be applied with confidence in studies of any size, as their performance was shown to be invariant to the number of samples available. Illuminus on the other hand requires a larger number of samples to achieve comparable levels of accuracy and its use in smaller studies (50 or fewer individuals) is not recommended.
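As a rough illustration of the task the abstract describes, the sketch below turns a pair of allele A and allele B intensities into a call (AA, AB, BB, or no call) and then scores a set of calls against known genotypes. It is a deliberately naive threshold rule and is not GenCall, Illuminus, GenoSNP or CRLMM; the function names, the angle thresholds and the minimum-signal cutoff are all assumptions chosen for the example.

```python
import math


def call_genotype(a, b, low=0.25, high=0.75, min_signal=200.0):
    """Naive genotype call from allele A and B intensities.

    The thresholds and the minimum-signal cutoff are illustrative
    assumptions, not values taken from any published algorithm.
    """
    if a + b < min_signal:
        return "NC"  # too little signal: no call
    # Map the intensity pair to an angle in [0, 1]: 0 = pure A, 1 = pure B.
    theta = (2.0 / math.pi) * math.atan2(b, a)
    if theta < low:
        return "AA"
    if theta > high:
        return "BB"
    return "AB"


def summarize(calls, truth):
    """Return (no-call rate, accuracy among called genotypes)."""
    called = [(c, t) for c, t in zip(calls, truth) if c != "NC"]
    no_call_rate = 1.0 - len(called) / len(calls)
    if not called:
        return no_call_rate, float("nan")
    accuracy = sum(c == t for c, t in called) / len(called)
    return no_call_rate, accuracy


# Hypothetical intensities for one SNP across four samples.
intensities = [(1000.0, 50.0), (800.0, 800.0), (50.0, 50.0), (50.0, 1000.0)]
calls = [call_genotype(a, b) for a, b in intensities]
```

The two summary numbers correspond to the quantities the abstract compares across methods: how often a method declines to call, and how often its calls match genotypes known in advance.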
Pages: 12