Inferring missing genotypes in large SNP panels using fast nearest-neighbor searches over sliding windows

Cited by: 61
Authors
Roberts, Adam
McMillan, Leonard [1]
Wang, Wei
Parker, Joel
Rusyn, Ivan
Threadgill, David
Affiliations
[1] Univ N Carolina, Dept Comp Sci, Chapel Hill, NC 27599 USA
[2] Constella Grp, Durham, NC 27713 USA
[3] Univ N Carolina, Dept Environm Sci & Engn, Chapel Hill, NC 27599 USA
[4] Univ N Carolina, Dept Genet, Chapel Hill, NC 27599 USA
DOI
10.1093/bioinformatics/btm220
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Typical high-throughput genotyping techniques produce numerous missing calls that confound subsequent analyses, such as disease association studies. Common remedies for this problem include removing affected markers and/or samples or, otherwise, imputing the missing data. On small marker sets imputation is frequently based on a vote of the K-nearest-neighbor (KNN) haplotypes, but this technique is neither practical nor justifiable for large datasets.

Results: We describe a data structure that supports efficient KNN queries over arbitrarily sized, sliding haplotype windows, and evaluate its use for genotype imputation. The performance of our method enables exhaustive exploration over all window sizes and known sites in large (150K, 8.3M) SNP panels. We also compare the accuracy and performance of our methods with competing imputation approaches.
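
To make the KNN vote mentioned in the abstract concrete, the following is a minimal sketch of the naive baseline only: a linear scan that imputes one missing call by majority vote among the K haplotypes closest to the target within a window around the site. It is not the data structure the paper describes, which is not detailed in this record. The names `knn_impute`, `hamming`, and `MISSING`, the string encoding of haplotypes ('0'/'1' alleles, '?' for a missing call), and the default window and K values are all assumptions made for illustration.

```python
from collections import Counter

MISSING = "?"  # assumed marker for a missing call (hypothetical encoding)


def hamming(a, b, skip):
    """Count allele mismatches between two equal-length windows.

    The target site (index `skip`) and any position where either
    window has a missing call are ignored.
    """
    return sum(
        1
        for i, (x, y) in enumerate(zip(a, b))
        if i != skip and x != MISSING and y != MISSING and x != y
    )


def knn_impute(haplotypes, row, site, half_window=2, k=3):
    """Impute haplotypes[row][site] by a vote of its k nearest neighbors.

    Neighbors are ranked by mismatch count inside a window of up to
    `half_window` sites on each side of `site`. Returns MISSING when no
    other haplotype has a known allele at `site`.
    """
    lo = max(0, site - half_window)
    hi = min(len(haplotypes[row]), site + half_window + 1)
    target = haplotypes[row][lo:hi]
    skip = site - lo  # position of the target site inside the window

    candidates = []
    for j, other in enumerate(haplotypes):
        if j == row or other[site] == MISSING:
            continue  # a neighbor must carry a known allele to vote
        candidates.append((hamming(target, other[lo:hi], skip), j))

    candidates.sort()  # nearest first; ties broken by row index
    votes = Counter(haplotypes[j][site] for _, j in candidates[:k])
    return votes.most_common(1)[0][0] if votes else MISSING


# Toy panel: row 0 has a missing call at site 2.
panel = ["01?10", "01110", "01110", "10001", "10001"]
imputed = knn_impute(panel, row=0, site=2)
```

Each query in this sketch scans every haplotype and recomputes the window distance from scratch, so the cost grows with the number of haplotypes times the window width per missing call. That cost is the practical obstacle the abstract points to for large panels.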
Pages: I401-I407
Number of pages: 7