2SNP: Scalable phasing method for trios and unrelated individuals

Cited by: 8
Authors
Brinza, Dumitru [1]
Zelikovsky, Alexander [1]
Affiliation
[1] Georgia State Univ, Dept Comp Sci, Atlanta, GA 30303 USA
Funding
National Science Foundation (USA)
Keywords
SNP; genotype; haplotype; phasing; algorithm;
DOI
10.1109/TCBB.2007.1068
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Emerging microarray technologies allow affordable typing of very long genome sequences. A key challenge in analyzing such a huge amount of data is scalable and accurate computational inference of haplotypes (that is, splitting each genotype into a pair of corresponding haplotypes). In this paper, we first phase genotypes consisting of only two SNPs using genotype frequencies adjusted to the random mating model, and then extend the phasing of two-SNP genotypes to the phasing of complete genotypes using maximum spanning trees. The runtime of the proposed 2SNP algorithm is O(nm(n + log m)), where n and m are the numbers of genotypes and SNPs, respectively, and it can handle genotypes spanning entire chromosomes in a matter of hours. On data sets across 23 chromosomal regions from HapMap [11], 2SNP is several orders of magnitude faster than GERBIL and PHASE while matching them in quality, measured by the number of correctly phased genotypes, single-site errors, and switching errors. For example, the 2SNP software phases an entire chromosome (10^5 SNPs from HapMap) for 30 individuals in 2 hours with an average switching error of 7.7 percent. We have also enhanced the 2SNP algorithm to phase family trio data and compared it with four other well-known phasing methods on simulated data from [15]. 2SNP is much faster than all of them and loses in quality only to PHASE. The 2SNP software is publicly available at http://alla.cs.gsu.edu/~software/2SNP.
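The two-stage idea in the abstract can be illustrated with a minimal Python sketch. This is not the published 2SNP code. The following are all assumptions made for illustration:

* the 0/1/2 genotype encoding;
* estimating 2-SNP haplotype frequencies only from genotypes that are unambiguous at the pair;
* the 0.5 pseudocount;
* the log-odds edge weight;
* the switch-error definition.

All function names are hypothetical. Under random mating, a doubly heterozygous two-SNP genotype is the haplotype pair {00, 11} with probability proportional to f00·f11, and the pair {01, 10} with probability proportional to f01·f10. The sign of the log ratio therefore picks the phase, and its magnitude serves as the edge weight for a maximum spanning tree over the heterozygous sites of each genotype.

```python
import math
from collections import deque
from itertools import combinations

HET = 2            # assumed encoding: 0/1 = homozygous for allele 0/1, 2 = heterozygous
PSEUDOCOUNT = 0.5  # assumption: smooths zero haplotype counts

def pair_haplotype_counts(genotypes, i, j):
    """Count 2-SNP haplotypes 00, 01, 10, 11 at SNPs (i, j) from genotypes
    whose phase at that pair is unambiguous (at most one heterozygous site)."""
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for g in genotypes:
        if g[i] == HET and g[j] == HET:
            continue  # doubly heterozygous: phase unknown, skip
        alleles_i = (0, 1) if g[i] == HET else (g[i], g[i])
        alleles_j = (0, 1) if g[j] == HET else (g[j], g[j])
        for k in (0, 1):
            counts[(alleles_i[k], alleles_j[k])] += 1
    return counts

def cis_log_odds(genotypes, i, j):
    """log(f00*f11) - log(f01*f10): > 0 favours cis (00/11), < 0 trans (01/10).
    The shared random-mating factor 2 cancels in the ratio."""
    c = pair_haplotype_counts(genotypes, i, j)
    f = {h: n + PSEUDOCOUNT for h, n in c.items()}
    return math.log(f[(0, 0)] * f[(1, 1)]) - math.log(f[(0, 1)] * f[(1, 0)])

def max_spanning_tree(nodes, edges):
    """Kruskal with union-find on edges (weight, u, v, label), heaviest first."""
    parent = {v: v for v in nodes}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v
    tree = []
    for _, u, v, label in sorted(edges, reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v, label))
    return tree

def phase_genotype(g, genotypes):
    """Split genotype g into haplotypes (h1, h2) using the population `genotypes`."""
    h1, h2 = list(g), list(g)
    hets = [k for k, x in enumerate(g) if x == HET]
    if not hets:
        return h1, h2
    # Complete graph on heterozygous sites; weight = confidence of the 2-SNP phase.
    edges = []
    for i, j in combinations(hets, 2):
        lo = cis_log_odds(genotypes, i, j)
        edges.append((abs(lo), i, j, lo >= 0))
    adj = {k: [] for k in hets}
    for u, v, cis in max_spanning_tree(hets, edges):
        adj[u].append((v, cis))
        adj[v].append((u, cis))
    # Propagate phase along tree edges, starting from the first heterozygous site.
    root = hets[0]
    h1[root] = 0
    seen, queue = {root}, deque([root])
    while queue:
        u = queue.popleft()
        for v, cis in adj[u]:
            if v not in seen:
                seen.add(v)
                h1[v] = h1[u] if cis else 1 - h1[u]
                queue.append(v)
    for k in hets:
        h2[k] = 1 - h1[k]
    return h1, h2

def switch_error_rate(true_h1, inferred_h1, g):
    """Assumed definition: phase switches between consecutive heterozygous
    sites divided by (number of heterozygous sites - 1)."""
    hets = [k for k, x in enumerate(g) if x == HET]
    if len(hets) < 2:
        return 0.0
    same = [true_h1[k] == inferred_h1[k] for k in hets]
    switches = sum(1 for a, b in zip(same, same[1:]) if a != b)
    return switches / (len(hets) - 1)

pop = [[0, 0, 0], [1, 1, 1], [2, 2, 2], [0, 0, 0], [1, 1, 1]]  # toy data
print(phase_genotype([2, 2, 2], pop))  # ([0, 0, 0], [1, 1, 1])
```

The sketch compares every pair of heterozygous sites and rescans all genotypes for each pair. It therefore makes no attempt to match the O(nm(n + log m)) runtime stated above, and serves only to show how two-SNP phase decisions combine through a maximum spanning tree.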
Pages: 313-318
Page count: 6
References (20 in total)
[1] Brinza, D; Zelikovsky, A. 2SNP: scalable phasing based on 2-SNP haplotypes. BIOINFORMATICS, 2006, 22(3): 371-373.
[2] Brinza, D. P INT WORKSH BIOINF, 2006: 767.
[3] Clark, AG. MOL BIOL EVOL, 1990, 7: 111.
[4] Daly, MJ; Rioux, JD; Schaffner, SF; Hudson, TJ; Lander, ES. High-resolution haplotype structure in the human genome. NATURE GENETICS, 2001, 29(2): 229-232.
[5] Frazer, Kelly A.; Ballinger, Dennis G.; Cox, David R.; Hinds, David A.; Stuve, Laura L.; Gibbs, Richard A.; Belmont, John W.; Boudreau, Andrew; Hardenbol, Paul; Leal, Suzanne M.; Pasternak, Shiran; Wheeler, David A.; Willis, Thomas D.; Yu, Fuli; Yang, Huanming; Zeng, Changqing; Gao, Yang; Hu, Haoran; Hu, Weitao; Li, Chaohua; Lin, Wei; Liu, Siqi; Pan, Hao; Tang, Xiaoli; Wang, Jian; Wang, Wei; Yu, Jun; Zhang, Bo; Zhang, Qingrun; Zhao, Hongbin; Zhao, Hui; Zhou, Jun; Gabriel, Stacey B.; Barry, Rachel; Blumenstiel, Brendan; Camargo, Amy; Defelice, Matthew; Faggart, Maura; Goyette, Mary; Gupta, Supriya; Moore, Jamie; Nguyen, Huy; Onofrio, Robert C.; Parkin, Melissa; Roy, Jessica; Stahl, Erich; Winchester, Ellen; Ziaugra, Liuda; Altshuler, David; Shen, Yan. A second generation human haplotype map of over 3.1 million SNPs. NATURE, 2007, 449(7164): 851-861.
[6] Gabriel, SB; Schaffner, SF; Nguyen, H; Moore, JM; Roy, J; Blumenstiel, B; Higgins, J; DeFelice, M; Lochner, A; Faggart, M; Liu-Cordero, SN; Rotimi, C; Adeyemo, A; Cooper, R; Ward, R; Lander, ES; Daly, MJ; Altshuler, D. The structure of haplotype blocks in the human genome. SCIENCE, 2002, 296(5576): 2225-2229.
[7] Gusfield, D. LECT NOTES COMPUT SC, 2003, 2676: 144.
[8] Halperin, E; Eskin, E. Haplotype reconstruction from genotype data using Imperfect Phylogeny. BIOINFORMATICS, 2004, 20(12): 1842-1849.
[9] Hudson, RR. Oxford Surveys in Evolutionary Biology, 1990, 7: 1.
[10] Hull, J; Rowlands, K; Lockhart, E; Sharland, M; Moore, C; Hanchard, N; Kwiatkowski, DP. Haplotype mapping of the bronchiolitis susceptibility locus near IL8. HUMAN GENETICS, 2004, 114(3): 272-279.