QuantiSNP: an Objective Bayes Hidden-Markov Model to detect and accurately map copy number variation using SNP genotyping data

Cited by: 453
Authors
Colella, Stefano
Yau, Christopher
Taylor, Jennifer M.
Mirza, Ghazala
Butler, Helen
Clouston, Penny
Bassett, Anne S.
Seller, Anneke
Holmes, Christopher C.
Ragoussis, Jiannis
Affiliations
[1] Wellcome Trust Ctr Human Genet, Genom Lab, Oxford OX3 7BN, England
[2] Life Sci Interface Doctoral Training Ctr, Oxford OX1 3QD, England
[3] Univ Oxford, Dept Stat, Henry Wellcome Ctr Gene Funct, Oxford OX1 3TG, England
[4] Churchill Hosp, Oxford Med Genet Labs, Oxford OX3 7LJ, England
[5] Univ Toronto, Ctr Addict & Mental Hlth, Toronto, ON M6J 1H4, Canada
[6] MRC, Mammalian Genet Unit, Didcot OX11 0RD, Oxon, England
Funding
UK Medical Research Council; Wellcome Trust
DOI
10.1093/nar/gkm076
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Array-based technologies have been used to detect chromosomal copy number changes (aneuploidies) in the human genome. Recent studies have identified numerous copy number variants (CNVs), some of which are common polymorphisms that may contribute to disease susceptibility. We developed, and experimentally validated, a novel computational framework (QuantiSNP) for detecting regions of copy number variation from BeadArray™ SNP genotyping data using an Objective Bayes Hidden-Markov Model (OB-HMM). Objective Bayes measures are used to set certain hyperparameters in the priors, using a novel re-sampling framework to calibrate the model to a fixed Type I (false positive) error rate. Other parameters are set by maximum marginal likelihood on prior training data of known structure. QuantiSNP provides probabilistic quantification of state classifications and significantly improves the accuracy of segmental aneuploidy identification and mapping relative to existing analytical tools (BeadStudio, Illumina), as demonstrated by validation of breakpoint boundaries. QuantiSNP identified both novel and validated CNVs. QuantiSNP was developed using BeadArray™ SNP data, but it can be adapted to other platforms, and we believe that the OB-HMM framework has widespread applicability in genomic research. In conclusion, QuantiSNP is a novel algorithm for high-resolution CNV/aneuploidy detection with application to clinical genetics, cancer and disease association studies.
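To illustrate the general idea behind HMM-based CNV calling described in the abstract (hidden copy-number states along a chromosome, an emission model for each SNP's signal, and posterior probabilities for each state classification), here is a minimal sketch of the scaled forward-backward algorithm. It is not QuantiSNP's implementation. It assumes a simplified model: only the Log R Ratio (LRR) signal, three states, Gaussian emissions with hypothetical means and spread, a hypothetical initial distribution, and an assumed distance-dependent transition probability 1 - exp(-d/L). The names `posterior_states`, `transition_matrix`, `length_scale` and all numeric values are illustrative.

```python
import math

# Illustrative assumptions only (not QuantiSNP's fitted parameters):
# three hidden copy-number states with Gaussian Log R Ratio (LRR) emissions.
STATES = ("CN1 (deletion)", "CN2 (normal)", "CN3 (duplication)")
MEANS = (-0.45, 0.0, 0.3)  # hypothetical mean LRR per state
SD = 0.2                   # hypothetical shared standard deviation


def gaussian_pdf(x, mean, sd):
    """Normal density, used here as the emission model for one SNP's LRR."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def transition_matrix(distance_bp, length_scale):
    """Assumed distance-dependent transitions: P(leave state) = 1 - exp(-d / L),
    with any change going uniformly to one of the other states."""
    k = len(STATES)
    p_switch = 1.0 - math.exp(-distance_bp / length_scale)
    return [[1.0 - p_switch if i == j else p_switch / (k - 1) for j in range(k)]
            for i in range(k)]


def posterior_states(lrr, positions, length_scale=1e6, prior=(0.05, 0.90, 0.05)):
    """Scaled forward-backward; returns P(state | all SNPs) for every SNP."""
    k, n = len(STATES), len(lrr)
    emit = [[gaussian_pdf(x, m, SD) for m in MEANS] for x in lrr]
    # trans[t] holds transitions from SNP t-1 to SNP t (trans[0] is unused).
    trans = [None] + [transition_matrix(positions[t] - positions[t - 1], length_scale)
                      for t in range(1, n)]

    # Forward pass, rescaling each step to sum to 1 to avoid numerical underflow.
    first = [prior[s] * emit[0][s] for s in range(k)]
    scale = [sum(first)]
    alpha = [[a / scale[0] for a in first]]
    for t in range(1, n):
        row = [emit[t][j] * sum(alpha[t - 1][i] * trans[t][i][j] for i in range(k))
               for j in range(k)]
        scale.append(sum(row))
        alpha.append([r / scale[t] for r in row])

    # Backward pass, reusing the forward scaling constants.
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = [sum(trans[t + 1][i][j] * emit[t + 1][j] * beta[t + 1][j]
                       for j in range(k)) / scale[t + 1]
                   for i in range(k)]

    # Posterior is proportional to alpha * beta; renormalise each SNP.
    posteriors = []
    for t in range(n):
        g = [alpha[t][s] * beta[t][s] for s in range(k)]
        z = sum(g)
        posteriors.append([v / z for v in g])
    return posteriors


if __name__ == "__main__":
    # Toy LRR track: a run of six low values inside otherwise normal signal.
    lrr = [0.02, -0.05, 0.01, -0.48, -0.42, -0.50, -0.44, -0.46, -0.41, 0.03, -0.02, 0.04]
    positions = [i * 10_000 for i in range(len(lrr))]  # SNPs every 10 kb
    for x, p in zip(lrr, posterior_states(lrr, positions)):
        best = max(range(len(STATES)), key=lambda s: p[s])
        print(f"LRR {x:+.2f}  ->  {STATES[best]}  (posterior {p[best]:.3f})")
```

The per-SNP posteriors are what make the calls probabilistic: a short run of low LRR values is only labelled a deletion when its combined evidence outweighs the cost of two state changes. Per the abstract, QuantiSNP sets certain hyperparameters by an objective Bayes re-sampling calibration and other parameters by maximum marginal likelihood; this sketch simply hard-codes them and ignores other signals such as allele frequencies.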
Pages: 2013-2025
Page count: 13