Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs

Cited by: 575
Authors
Korn, Joshua M. [1, 2, 3, 4, 5, 6]
Kuruvilla, Finny G. [1, 2, 5, 6, 7]
McCarroll, Steven A. [1, 2, 5, 6]
Wysoker, Alec [1, 2]
Nemesh, James [1, 2]
Cawley, Simon [8]
Hubbell, Earl [8]
Veitch, Jim [8]
Collins, Patrick J. [8]
Darvishi, Katayoon [9]
Lee, Charles [9]
Nizzari, Marcia M. [1, 2]
Gabriel, Stacey B. [1, 2]
Purcell, Shaun [1, 2, 6]
Daly, Mark J. [1, 2, 6, 10]
Altshuler, David [1, 2, 5, 6, 10]
Affiliations
[1] Harvard Univ, Broad Inst, Cambridge, MA 02142 USA
[2] MIT, Cambridge, MA 02142 USA
[3] Harvard Univ, MIT, Div Hlth Sci & Technol, Cambridge, MA 02139 USA
[4] Harvard Univ, Grad Program Biophys, Cambridge, MA 02138 USA
[5] Massachusetts Gen Hosp, Dept Mol Biol, Boston, MA 02114 USA
[6] Massachusetts Gen Hosp, Ctr Human Genet Res, Boston, MA 02114 USA
[7] Brigham & Womens Hosp, Dept Pathol, Boston, MA 02115 USA
[8] Affymetrix Inc, Santa Clara, CA 95051 USA
[9] Harvard Univ, Sch Med, Dept Pathol, Boston, MA 02115 USA
[10] Harvard Univ, Sch Med, Dept Med, Boston, MA 02115 USA
DOI
10.1038/ng.237
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Accurate and complete measurement of single nucleotide (SNP) and copy number (CNV) variants, both common and rare, will be required to understand the role of genetic variation in disease. We present Birdsuite, a four-stage analytical framework instantiated in software for deriving integrated and mutually consistent copy number and SNP genotypes. The method sequentially assigns copy number across regions of common copy number polymorphisms (CNPs), calls genotypes of SNPs, identifies rare CNVs via a hidden Markov model (HMM), and generates an integrated sequence and copy number genotype at every locus (for example, including genotypes such as A-null, AAB and BBB in addition to AA, AB and BB calls). Such genotypes more accurately depict the underlying sequence of each individual, reducing the rate of apparent mendelian inconsistencies. The Birdsuite software is applied here to data from the Affymetrix SNP 6.0 array. Additionally, we describe a method, implemented in PLINK, to utilize these combined SNP and CNV genotypes for association testing with a phenotype.
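The claim that integrated genotypes reduce apparent mendelian inconsistencies can be illustrated with a small sketch. This is not Birdsuite code: the haplotype-pair representation, the function names, and the restriction to deletions are all assumptions made for illustration. A parent carrying a deletion (A-null) looks like AA to a caller that assumes two copies, so a B-null child of an A-null and a BB parent appears impossible until copy number is taken into account.

```python
from typing import Tuple

# Hypothetical representation: one allele per haplotype, "-" marks a deleted copy.
Genotype = Tuple[str, str]


def label(genotype: Genotype) -> str:
    """Return an integrated label such as 'AB', 'BB' or 'A-null'."""
    alleles = sorted(a for a in genotype if a != "-")
    if len(alleles) == 1:
        return alleles[0] + "-null"
    return "".join(alleles) if alleles else "null"


def diploid_view(genotype: Genotype) -> Genotype:
    """What a caller assuming exactly two copies would report.

    A single remaining allele is read as homozygous (A-null looks like AA).
    Genotypes with two alleles or none are returned unchanged.
    """
    alleles = [a for a in genotype if a != "-"]
    if len(alleles) == 1:
        return (alleles[0], alleles[0])
    return genotype


def mendelian_consistent(father: Genotype, mother: Genotype, child: Genotype) -> bool:
    """True if the child can be formed from one haplotype of each parent."""
    return any(
        sorted((f, m)) == sorted(child) for f in father for m in mother
    )


father, mother, child = ("A", "-"), ("B", "B"), ("B", "-")

# Integrated genotypes: the child inherits the deletion from the father.
integrated_ok = mendelian_consistent(father, mother, child)

# SNP-only view: AA x BB parents with a BB child looks like an error.
snp_only_ok = mendelian_consistent(
    diploid_view(father), diploid_view(mother), diploid_view(child)
)

print(label(father), label(mother), label(child), integrated_ok, snp_only_ok)
```

The sketch covers deletions only. Duplication genotypes such as AAB or BBB would need a representation that allows more than one copy per haplotype.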
Pages: 1253-1260
Page count: 8