Reconstructing ancestral haplotypes with a dictionary model

Cited by: 3
Authors
Ayers, Kristin L.
Sabatti, Chiara
Lange, Kenneth
Affiliations
[1] Univ Calif Los Angeles, Sch Med, Dept Human Genet, Los Angeles, CA 90095 USA
[2] Univ Calif Los Angeles, Dept Biomath, Los Angeles, CA 90095 USA
[3] Univ Calif Los Angeles, Dept Stat, Los Angeles, CA 90095 USA
Keywords
linkage disequilibrium; haplotype blocks; minimum description length; forward and backwards algorithms; EM algorithm;
DOI
10.1089/cmb.2006.13.767
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
We propose a dictionary model for haplotypes. According to the model, a haplotype is constructed by randomly concatenating haplotype segments from a given dictionary of segments. A haplotype block is defined as a set of haplotype segments that begin and end with the same pair of markers. In this framework, haplotype blocks can overlap, and the model provides a setting for testing the accuracy of simpler models invoking only nonoverlapping blocks. Each haplotype segment in a dictionary has an assigned probability and alternate spellings that account for genotyping errors and mutation. The model also allows for missing data, unphased genotypes, and prior distributions on parameters. Likelihood evaluations rely on forward and backward recurrences similar to the ones encountered in hidden Markov models. Parameter estimation is carried out with an EM algorithm. The search for the optimal dictionary is particularly difficult because of the variable dimension of the model space. We define a minimum description length criterion to evaluate each dictionary and use a combination of greedy search and careful initialization to select a best dictionary for a given dataset. Application of the model to simulated data gives encouraging results. In a real dataset, we are able to reconstruct a parsimonious dictionary that captures patterns of linkage disequilibrium well.
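The forward recurrence mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several simplifying assumptions that the abstract does not confirm: haplotypes are strings of "0"/"1" alleles, each segment is tied to a fixed starting marker, segment probabilities are conditional on that starting marker, and there are no alternate spellings, missing data, or unphased genotypes. The function name `haplotype_likelihood` and the dictionary layout are hypothetical.

```python
def haplotype_likelihood(haplotype, dictionary):
    """Forward recurrence for a simplified dictionary model.

    haplotype  -- string of alleles, one character per marker
    dictionary -- hypothetical layout: {start_marker: [(alleles, prob), ...]},
                  where every segment is a non-empty allele string and the
                  probabilities for one start marker are assumed to sum to 1
    """
    n = len(haplotype)
    # forward[j] = probability of generating haplotype[:j] with a segment
    # boundary falling exactly at marker j
    forward = [0.0] * (n + 1)
    forward[0] = 1.0
    for start in range(n):
        if forward[start] == 0.0:
            continue  # no concatenation of segments ends at this marker
        for alleles, prob in dictionary.get(start, []):
            end = start + len(alleles)
            # a segment contributes only if it fits and matches exactly
            # (assumption: no genotyping-error or mutation model)
            if end <= n and haplotype[start:end] == alleles:
                forward[end] += forward[start] * prob
    return forward[n]


# Toy dictionary over three markers with overlapping blocks (made-up numbers)
toy_dictionary = {
    0: [("01", 0.6), ("00", 0.1), ("0", 0.3)],
    1: [("1", 0.5), ("10", 0.5)],
    2: [("0", 0.7), ("1", 0.3)],
}

if __name__ == "__main__":
    print(haplotype_likelihood("010", toy_dictionary))
```

In the toy dictionary, the haplotype "010" can be spelled in three ways ("01"+"0", "0"+"10", and "0"+"1"+"0"), and the recurrence sums the probabilities of all three without enumerating them explicitly. A matching backward recurrence would supply the segment-usage expectations that an EM update needs.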
Pages: 767-785
Number of pages: 19