Systematic prediction and validation of breakpoints associated with copy-number variants in the human genome

Cited by: 63
Authors
Korbel, Jan O. [1]
Urban, Alexander Eckehart
Grubert, Fabian
Du, Jiang
Royce, Thomas E.
Starr, Peter
Zhong, Guoneng
Emanuel, Beverly S.
Weissman, Sherman M.
Snyder, Michael
Gerstein, Mark B.
Affiliations
[1] Yale Univ, Sch Med, Dept Mol Biophys & Biochem, New Haven, CT 06520 USA
[2] Yale Univ, Sch Med, Dept Genet, New Haven, CT 06520 USA
[3] European Mol Biol Lab, D-69117 Heidelberg, Germany
[4] Yale Univ, Dept Mol Cellular & Dev Biol, New Haven, CT 06520 USA
[5] Yale Univ, Dept Comp Sci, New Haven, CT 06520 USA
[6] Univ Penn, Sch Med, Dept Pediat, Philadelphia, PA 19104 USA
Keywords
copy number polymorphism; human genome variation; structural variants
DOI
10.1073/pnas.0703834104
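Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09
Abstract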
Copy-number variants (CNVs) are an abundant form of genetic variation in humans. However, approaches for determining exact CNV breakpoint sequences (physical deletion or duplication boundaries) across individuals, crucial for associating genotype to phenotype, have been lacking so far, and the vast majority of CNVs have been reported with approximate genomic coordinates only. Here, we report an approach, called BreakPtr, for fine-mapping CNVs (available from http://breakptr.gersteinlab.org). We statistically integrate both sequence characteristics and data from high-resolution comparative genome hybridization experiments in a discrete-valued, bivariate hidden Markov model. Incorporation of nucleotide-sequence information allows us to take into account the fact that recently duplicated sequences (e.g., segmental duplications) often coincide with breakpoints. In anticipation of an upcoming increase in CNV data, we developed an iterative, "active" approach of initially scoring with a preliminary model, performing targeted validations, retraining the model, and then rescoring, and a flexible parameterization system that intuitively collapses from a full model of 2,503 parameters to a core one of only 10. Using our approach, we accurately mapped >400 breakpoints on chromosome 22 and a region of chromosome 11, refining the boundaries of many previously approximately mapped CNVs. Four predicted breakpoints flanked known disease-associated deletions. We validated an additional four predicted CNV breakpoints by sequencing. Overall, our results suggest a predictive resolution of approximately 300 bp. This level of resolution enables more precise correlations between CNVs and across individuals than previously possible, allowing the study of CNV population frequencies. Further, it enabled us to demonstrate a clear Mendelian pattern of inheritance for one of the CNVs.
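To illustrate how a discrete-valued, bivariate hidden Markov model can place breakpoints, the sketch below decodes a toy probe series with the Viterbi algorithm. It is not BreakPtr itself: the three states, every probability, the three-level signal binning, the assumption that the two observed variables are independent given the state, and the names `viterbi` and `breakpoints` are hypothetical choices made only for this example.

```python
import math

# Hypothetical copy-number states (not BreakPtr's actual state set).
STATES = ("normal", "deletion", "duplication")

# All probabilities below are made-up illustration values.
START = {"normal": 0.90, "deletion": 0.05, "duplication": 0.05}
STAY = 0.90  # probability of remaining in the same state at the next probe

# Variable 1: binned array hybridization signal, emitted per state.
SIGNAL = {
    "normal": {"low": 0.10, "mid": 0.80, "high": 0.10},
    "deletion": {"low": 0.80, "mid": 0.15, "high": 0.05},
    "duplication": {"low": 0.05, "mid": 0.15, "high": 0.80},
}

# Variable 2: probability that a probe lies in a segmental duplication.
SEGDUP = {"normal": 0.05, "deletion": 0.30, "duplication": 0.30}


def log_transition(prev, cur):
    """Log probability of moving from state prev to state cur."""
    if prev == cur:
        return math.log(STAY)
    return math.log((1.0 - STAY) / (len(STATES) - 1))


def log_emission(state, obs):
    """Log probability of a (signal_bin, in_segdup) pair, assuming the
    two variables are independent given the state."""
    signal_bin, in_segdup = obs
    p_seq = SEGDUP[state] if in_segdup else 1.0 - SEGDUP[state]
    return math.log(SIGNAL[state][signal_bin]) + math.log(p_seq)


def viterbi(observations):
    """Return the most likely state path for a list of observations."""
    if not observations:
        return []
    scores = {s: math.log(START[s]) + log_emission(s, observations[0])
              for s in STATES}
    backpointers = []
    for obs in observations[1:]:
        pointers, new_scores = {}, {}
        for cur in STATES:
            best = max(STATES,
                       key=lambda p: scores[p] + log_transition(p, cur))
            pointers[cur] = best
            new_scores[cur] = (scores[best] + log_transition(best, cur)
                               + log_emission(cur, obs))
        backpointers.append(pointers)
        scores = new_scores
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def breakpoints(path):
    """Indices of probes at which the decoded state changes."""
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]


probes = [("mid", False)] * 4 + [("low", False)] * 5 + [("mid", False)] * 4
decoded = viterbi(probes)
print(decoded)
print(breakpoints(decoded))
```

In this toy setting a breakpoint is simply a probe index where the decoded state changes. The iterative scheme described in the abstract would re-estimate such parameters after targeted validations and then rescore, which this sketch does not attempt.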
Pages: 10110-10115
Number of pages: 6