CoNVEX: copy number variation estimation in exome sequencing data using HMM

Cited by: 52
Authors
Amarasinghe, Kaushalya C. [1]
Li, Jason [2]
Halgamuge, Saman K. [1]
Affiliations
[1] Univ Melbourne, Dept Mech Engn, Parkville, Vic 3010, Australia
[2] Peter MacCallum Canc Ctr, Bioinformat Core Facil, Melbourne, Vic 3002, Australia
Source
BMC BIOINFORMATICS | 2013 / Vol. 14
Funding
Australian Research Council;
Keywords
HIDDEN MARKOV-MODELS; CANCER; CAPTURE; IDENTIFICATION; MUTATIONS; BREAST;
DOI
10.1186/1471-2105-14-S2-S2
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Background: Copy number variation (CNV) is one of the main types of genetic variation in cancer. Whole exome sequencing (WES) is a popular alternative to whole genome sequencing (WGS) for studying disease-specific genomic variation. However, detecting CNV in cancer samples from WES data has not been fully explored.
Results: We present a new method, called CoNVEX, to estimate copy number variation in whole exome sequencing data. It uses the ratio of tumour to matched-normal average read depth at each exonic region to predict copy gain or loss. The useful signal in WES data is obscured by noise intrinsic to the data, which limits its reliability as a source for CNV detection. To reduce this noise, the proposed method applies a discrete wavelet transform (DWT). Copy number gains and losses in each targeted region are then identified by a Hidden Markov Model (HMM).
Conclusion: HMMs are frequently used to identify CNV in data produced by various technologies, including array comparative genomic hybridization (aCGH) and WGS. Here, we propose an HMM to detect CNV in cancer exome data. We used modified data from the 1000 Genomes Project to evaluate the performance of the proposed method. Using these data, we have shown that CoNVEX significantly outperforms existing methods in terms of precision. Overall, CoNVEX achieved a sensitivity of more than 92% and a precision of more than 50%.
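The three stages named in the abstract (depth ratio, wavelet denoising, HMM state calling) can be illustrated with a small sketch. This is not the CoNVEX implementation: the log2 transform, the one-level Haar wavelet with soft thresholding, the three states, and every numeric parameter below (`MEANS`, `SIGMA`, `STAY`, the threshold) are assumptions chosen only to make the example run on toy data.

```python
import math

def log2_ratio(tumour_depth, normal_depth, eps=1e-9):
    """Per-exon log2 ratio of tumour to matched-normal average read depth."""
    return [math.log2((t + eps) / (n + eps))
            for t, n in zip(tumour_depth, normal_depth)]

def haar_denoise(signal, threshold):
    """One-level Haar DWT, soft-threshold the detail coefficients, then invert.
    (Assumed wavelet and thresholding rule; the abstract only says 'DWT'.)"""
    s = math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) / s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / s for i in pairs]
    detail = [math.copysign(max(abs(d) - threshold, 0.0), d) for d in detail]
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    if len(signal) % 2:          # an unpaired last sample is passed through
        out.append(signal[-1])
    return out

# Hypothetical HMM parameters: three copy-number states with Gaussian emissions.
STATES = ("loss", "neutral", "gain")
MEANS = {"loss": -1.0, "neutral": 0.0, "gain": 0.58}  # about log2(1/2), log2(2/2), log2(3/2)
SIGMA = 0.3   # assumed emission standard deviation
STAY = 0.9    # assumed probability of remaining in the same state

def log_gauss(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def log_trans(a, b):
    return math.log(STAY if a == b else (1 - STAY) / (len(STATES) - 1))

def viterbi(obs):
    """Most likely state sequence for the observed (denoised) log2 ratios."""
    score = {s: math.log(1 / len(STATES)) + log_gauss(obs[0], MEANS[s], SIGMA)
             for s in STATES}
    back = []
    for x in obs[1:]:
        prev, score, ptr = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + log_trans(p, s))
            ptr[s] = best
            score[s] = prev[best] + log_trans(best, s) + log_gauss(x, MEANS[s], SIGMA)
        back.append(ptr)
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):   # trace the best path backwards
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Toy data: 12 exons, with a one-copy loss in exons 5-8 and a one-copy gain in 9-12.
tumour = [100, 102, 98, 101, 50, 49, 51, 50, 150, 151, 149, 150]
normal = [100] * 12
calls = viterbi(haar_denoise(log2_ratio(tumour, normal), threshold=0.1))
print(calls)
```

On real exome data the emission parameters would need to be estimated rather than fixed by hand, and a multi-level transform from a wavelet library would likely replace the single Haar level used here.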
Pages: 9