Integrated study of copy number states and genotype calls using high-density SNP arrays

Cited by: 85
Authors
Sun, Wei [1, 2]
Wright, Fred A. [1]
Tang, Zhengzheng [1]
Nordgard, Silje H. [2, 3]
Van Loo, Peter [3, 4, 5]
Yu, Tianwei [6]
Kristensen, Vessela N. [3]
Perou, Charles M. [2, 7]
Affiliations
[1] Univ N Carolina, Dept Biostat, Chapel Hill, NC 27515 USA
[2] Univ N Carolina, Dept Genet, Chapel Hill, NC USA
[3] Oslo Univ Hosp, Radiumhosp, Dept Genet, Inst Canc Res, Oslo, Norway
[4] Katholieke Univ Leuven, Dept Mol & Dev Genet, Vlaams Inst Biotechnol, Louvain, Belgium
[5] Katholieke Univ Leuven, Dept Human Genet, Leuven, Belgium
[6] Emory Univ, Dept Biostat & Bioinformat, Atlanta, GA 30322 USA
[7] Univ N Carolina, Lineberger Canc Res Ctr, Chapel Hill, NC 27599 USA
Keywords
HIDDEN MARKOV-MODELS; HUMAN GENOME; STRUCTURAL VARIATION; GENE-EXPRESSION; CANCER-CELLS; SOLID TUMORS; ALGORITHM; ABERRATIONS; POPULATION; POLYMORPHISM;
DOI
10.1093/nar/gkp493
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
We propose a statistical framework, named genoCN, to simultaneously dissect copy number states and genotypes using high-density SNP (single nucleotide polymorphism) arrays. There are at least two types of genomic DNA copy number differences: copy number variations (CNVs) and copy number aberrations (CNAs). While CNVs are naturally occurring and inheritable, CNAs are acquired somatic alterations most often observed in tumor tissues only. CNVs tend to be short and more sparsely located in the genome compared with CNAs. GenoCN consists of two components, genoCNV and genoCNA, designed for CNV and CNA studies, respectively. In contrast to most existing methods, genoCN is more flexible in that the model parameters are estimated from the data instead of being decided a priori. GenoCNA also incorporates two important strategies for CNA studies. First, the effects of tissue contamination are explicitly modeled. Second, if SNP arrays are performed for both tumor and normal tissues of one individual, the genotype calls from normal tissue are used to study CNAs in tumor tissue. We evaluated genoCN by applications to 162 HapMap individuals and a brain tumor (glioblastoma) dataset and showed that our method can successfully identify both types of copy number differences and produce high-quality genotype calls.
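To illustrate why the abstract singles out tissue contamination, the following is a minimal sketch of how normal cells mixed into a tumor sample shift the expected B-allele frequency at a SNP. It assumes a simple linear mixture of diploid normal cells and tumor cells; the function name `expected_baf`, its parameters, and the mixture formula are illustrative assumptions and are not taken from genoCN, whose actual parameterization may differ.

```python
def expected_baf(tumor_b, tumor_cn, normal_b, purity):
    """Expected B-allele frequency of a tumor sample mixed with normal cells.

    Hypothetical helper for illustration only (not the genoCN model).

    tumor_b  -- number of B alleles per tumor cell at this SNP
    tumor_cn -- total copy number per tumor cell at this SNP
    normal_b -- number of B alleles per normal cell (0, 1 or 2; assumed diploid)
    purity   -- fraction of cells in the sample that are tumor cells (0 to 1)
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    if not 0 <= tumor_b <= tumor_cn:
        raise ValueError("tumor_b must be between 0 and tumor_cn")
    if normal_b not in (0, 1, 2):
        raise ValueError("normal_b must be 0, 1 or 2")

    # Average number of B alleles and of all alleles per cell in the mixture.
    b_copies = purity * tumor_b + (1.0 - purity) * normal_b
    total_copies = purity * tumor_cn + (1.0 - purity) * 2
    if total_copies == 0:
        # Pure tumor sample with a homozygous deletion: no signal to measure.
        raise ValueError("no DNA copies at this locus; BAF is undefined")
    return b_copies / total_copies


# Normal genotype AB; the tumor has lost the B allele (one copy, genotype A).
for purity in (1.0, 0.5, 0.0):
    print(purity, round(expected_baf(0, 1, 1, purity), 3))
```

Under these assumptions a pure tumor sample gives a B-allele frequency of 0 for this SNP, whereas a sample with 50% normal cells gives 1/3. A model that ignores contamination could therefore mistake the deletion for a different copy number state.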
Pages: 5365-5377
Page count: 13