SNiPer-HD: improved genotype calling accuracy by an expectation-maximization algorithm for high-density SNP arrays

Cited by: 34
Authors
Hua, Jianping
Craig, David W.
Brun, Marcel
Webster, Jennifer
Zismann, Victoria
Tembe, Waibhav
Joshipura, Keta
Huentelman, Matthew J.
Dougherty, Edward R.
Stephan, Dietrich A.
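Affiliations
[1] Translational Genomics Research Institute, Computational Biology Division, Phoenix, AZ, USA
[2] Translational Genomics Research Institute, Neurogenomics Division, Phoenix, AZ, USA
[3] Texas A&M University, Department of Electrical and Computer Engineering, College Station, TX, USA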
Keywords
DOI
10.1093/bioinformatics/btl536
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: The technology to genotype single nucleotide polymorphisms (SNPs) at extremely high densities enables hypothesis-free genome-wide scans for common polymorphisms associated with complex disease. However, we find that some errors introduced by commonly employed genotyping algorithms may inflate the number of false associations between markers and phenotype.
Results: We have developed a novel SNP genotype calling program, SNiPer-High Density (SNiPer-HD), for highly accurate genotype calling across hundreds of thousands of SNPs. The program employs an expectation-maximization (EM) algorithm with parameters based on a training sample set. This choice of algorithm allows highly accurate genotyping for most SNPs. We also introduce a quality control metric for each assayed SNP, so that poorly behaved SNPs can be filtered using a metric that correlates with genotype class separation in the calling algorithm. SNiPer-HD is superior to the standard dynamic modeling algorithm and is complementary and non-redundant to other algorithms, such as BRLMM. Implementing multiple algorithms together may provide highly accurate genotype calls without inflation of false positives due to systematically miscalled SNPs. A reliable and accurate set of SNP genotypes for increasingly dense panels will eliminate some false association signals and false negative signals, allowing rapid identification of disease susceptibility loci for complex traits.
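The abstract does not specify the model behind the EM step or the quality control metric, so the following is only a minimal illustrative sketch of the general idea, not the SNiPer-HD implementation. It assumes a one-dimensional allele contrast per sample for a single SNP, a three-component Gaussian mixture (AA, AB, BB) whose starting parameters stand in for values learned from a training set, and a hypothetical `separation_score` that measures how far apart adjacent genotype clusters sit relative to their spread. All function names, parameters, and numeric values are assumptions made for illustration.

```python
import math

LABELS = ("AA", "AB", "BB")


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def responsibilities(xs, weights, means, variances):
    """E-step: posterior probability of each genotype class for each sample."""
    resp = []
    for x in xs:
        dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(dens) or 1e-300  # guard against total underflow
        resp.append([d / total for d in dens])
    return resp


def em_genotype_calls(xs, init_means, init_vars, init_weights, n_iter=50):
    """Fit a 3-component 1-D Gaussian mixture by EM and call genotypes.

    The init_* arguments play the role of training-derived parameters
    (an assumption; the source does not describe how they are used).
    """
    means, variances, weights = list(init_means), list(init_vars), list(init_weights)
    for _ in range(n_iter):
        resp = responsibilities(xs, weights, means, variances)
        # M-step: re-estimate each cluster from its weighted samples.
        for k in range(len(LABELS)):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(xs)
            if nk < 1e-8:
                continue  # empty cluster: keep its previous mean and variance
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var, 1e-4)  # variance floor avoids collapse
    # Final E-step so the calls reflect the last parameter update.
    resp = responsibilities(xs, weights, means, variances)
    calls = [LABELS[max(range(len(LABELS)), key=lambda k: r[k])] for r in resp]
    return calls, means, variances


def separation_score(means, variances):
    """Hypothetical per-SNP quality metric: smallest gap between adjacent
    cluster means, scaled by the sum of their standard deviations."""
    order = sorted(range(len(means)), key=lambda k: means[k])
    return min(
        abs(means[b] - means[a]) / (math.sqrt(variances[a]) + math.sqrt(variances[b]))
        for a, b in zip(order, order[1:])
    )


if __name__ == "__main__":
    # Toy contrasts, assumed to be (A - B) / (A + B) for nine samples.
    contrasts = [0.85, 0.90, 0.95, -0.05, 0.0, 0.05, -0.85, -0.90, -0.95]
    calls, means, variances = em_genotype_calls(
        contrasts,
        init_means=(0.8, 0.0, -0.8),
        init_vars=(0.05, 0.05, 0.05),
        init_weights=(1 / 3, 1 / 3, 1 / 3),
    )
    print(calls)
    print(separation_score(means, variances))
```

In a sketch like this, a SNP whose score falls below some chosen threshold would be flagged for filtering; the threshold itself would have to be calibrated on real data.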
Pages: 57-63
Page count: 7