Finding haplotype tagging SNPs by use of principal components analysis

Cited by: 86
Authors
Lin, Z [1]
Altman, RB [1]
Affiliations
[1] Stanford Univ, Sch Med, Dept Genet, Stanford, CA 94305 USA
DOI
10.1086/425587
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
The immense volume and rapid growth of human genomic data, especially single nucleotide polymorphisms (SNPs), present special challenges for both biomedical researchers and automatic algorithms. One such challenge is to select an optimal subset of SNPs, commonly referred to as "haplotype tagging SNPs" (htSNPs), to capture most of the haplotype diversity of each haplotype block or gene-specific region. This information-reduction process facilitates cost-effective genotyping and, subsequently, genotype-phenotype association studies. It also has implications for assessing the risk of identifying research subjects on the basis of SNP information deposited in public-domain databases. We have investigated methods for selecting htSNPs by use of principal components analysis (PCA). These methods first identify eigenSNPs and then map them to actual SNPs. We evaluated two mapping strategies, greedy discard and varimax rotation, by assessing the ability of the selected htSNPs to reconstruct genotypes of non-htSNPs. We also compared these methods with two other htSNP finders, one of which is PCA based. We applied these methods to three experimental data sets and found that the PCA-based methods tend to select the smallest set of htSNPs to achieve a 90% reconstruction precision.
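The abstract names the two stages (find eigenSNPs, then map them to actual SNPs) without giving the details, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes a genotype matrix coded 0/1/2 with at least one polymorphic SNP. The function name `select_tag_snps` and the parameter `variance_target` are hypothetical, and `variance_target` is a cutoff on explained variance, which is a different quantity from the 90% reconstruction precision reported above. The mapping step here is a simple largest-absolute-loading rule, used as a stand-in for greedy discard or varimax rotation rather than a reproduction of either.

```python
import numpy as np

def select_tag_snps(genotypes, variance_target=0.90):
    """Pick one SNP per retained principal component (illustrative only).

    genotypes: (n_samples, n_snps) array, assumed coded 0/1/2.
    variance_target: assumed cutoff on cumulative explained variance.
    """
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)  # center each SNP column

    # Eigen-decomposition of the SNP-by-SNP covariance matrix;
    # each eigenvector plays the role of an "eigenSNP".
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    eigvals = np.clip(eigvals[order], 0.0, None)  # drop tiny negative round-off
    eigvecs = eigvecs[:, order]

    # Keep the smallest number of components that reaches the target.
    explained = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(explained, variance_target)) + 1, X.shape[1])

    # Map each retained eigenSNP to the unused SNP with the largest
    # absolute loading (a stand-in for the paper's mapping strategies).
    available = np.ones(X.shape[1], dtype=bool)
    tags = []
    for j in range(k):
        loadings = np.where(available, np.abs(eigvecs[:, j]), -1.0)
        best = int(np.argmax(loadings))
        tags.append(best)
        available[best] = False
    return sorted(tags)

# Toy data: SNPs 0-2 are identical copies of `a`, SNPs 3-4 of `b`.
a = [0, 0, 2, 2]
b = [0, 2, 0, 2]
G = np.array([a, a, a, b, b]).T  # shape (4 samples, 5 SNPs)
tags = select_tag_snps(G)
```

On this toy matrix the two blocks are uncorrelated, so two components carry all the variance and the sketch returns one SNP index from each block. Real data would also need the reconstruction step the abstract describes in order to judge how well the selected htSNPs predict the remaining SNPs.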
Pages: 850-861
Page count: 12