Haplotype block partitioning and tag SNP selection using genotype data and their applications to association studies

Cited by: 122
Authors
Zhang, K
Qin, ZHS
Liu, JS
Chen, T
Waterman, MS
Sun, FZ [1]
Affiliations
[1] Univ So Calif, Dept Sci Biol, Mol & Computat Biol Program, Los Angeles, CA 90089 USA
[2] Univ Alabama Birmingham, Dept Biostat, Sect Stat Genet, Birmingham, AL 35294 USA
[3] Univ Michigan, Dept Biostat, Ctr Stat Genet, Ann Arbor, MI 48109 USA
[4] Harvard Univ, Dept Stat, Cambridge, MA 02138 USA
DOI
10.1101/gr.1837404
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Recent studies have revealed that linkage disequilibrium (LD) patterns vary across the human genome, with some regions of high LD interspersed by regions of low LD. A small fraction of SNPs (tag SNPs) is sufficient to capture most of the haplotype structure of the human genome. In this paper, we develop a method to partition haplotypes into blocks and to identify tag SNPs based on genotype data by combining a dynamic programming algorithm for haplotype block partitioning and tag SNP selection based on haplotype data with a variation of the expectation maximization (EM) algorithm for haplotype inference. We assess the effects of using either haplotype or genotype data in haplotype block identification and tag SNP selection as a function of several factors, including sample size, density or number of SNPs studied, allele frequencies, fraction of missing data, and genotyping error rate, using extensive simulations. We find that a modest number of haplotype or genotype samples will result in consistent block partitions and tag SNP selection. The power of association studies based on tag SNPs using genotype data is similar to that using haplotype data.
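For readers unfamiliar with EM-based haplotype inference, the following is a minimal sketch of the textbook two-SNP case. It is not the authors' algorithm: the abstract only says they use a variation of EM. The function name `em_two_snp`, the 0/1/2 genotype coding (copies of allele 1 at each SNP), and the uniform starting frequencies are all assumptions made for illustration. With two biallelic SNPs, only the double heterozygote is phase-ambiguous, so the E-step reduces to a single weight.

```python
from collections import Counter

# The four possible haplotypes over two biallelic SNPs (alleles coded 0/1).
HAPLOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def em_two_snp(genotypes, n_iter=100, tol=1e-10):
    """Estimate haplotype frequencies from unphased two-SNP genotypes.

    genotypes: non-empty list of (g1, g2), each g in {0, 1, 2} giving the
    number of copies of allele 1 at that SNP (hypothetical coding).
    """
    n = len(genotypes)
    base = Counter()  # haplotype counts from phase-unambiguous individuals
    n_dh = 0          # number of double heterozygotes (phase-ambiguous)
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_dh += 1
            continue
        # With at most one heterozygous site the two haplotypes are determined.
        h1 = (1 if g1 == 2 else 0, 1 if g2 == 2 else 0)
        h2 = (1 if g1 >= 1 else 0, 1 if g2 >= 1 else 0)
        base[h1] += 1
        base[h2] += 1

    freq = {h: 0.25 for h in HAPLOTYPES}  # assumed uniform starting point
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote is 00/11, not 01/10.
        cis = freq[(0, 0)] * freq[(1, 1)]
        trans = freq[(0, 1)] * freq[(1, 0)]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5

        # M-step: expected haplotype counts divided by total chromosomes.
        counts = {h: float(base[h]) for h in HAPLOTYPES}
        counts[(0, 0)] += n_dh * w
        counts[(1, 1)] += n_dh * w
        counts[(0, 1)] += n_dh * (1 - w)
        counts[(1, 0)] += n_dh * (1 - w)
        new = {h: counts[h] / (2 * n) for h in HAPLOTYPES}

        delta = max(abs(new[h] - freq[h]) for h in HAPLOTYPES)
        freq = new
        if delta < tol:
            break
    return freq
```

Calling `em_two_snp([(0, 0), (2, 2), (1, 1)])` returns a dictionary of four frequencies that sum to 1. Extending this to many SNPs, missing data, or genotyping errors, as studied in the paper, requires a more elaborate scheme than this sketch.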
Pages: 908-916
Page count: 9