CNAseg-a novel framework for identification of copy number changes in cancer from second-generation sequencing data

Cited by: 77
Authors
Ivakhno, Sergii [1, 2]
Royce, Tom [3]
Cox, Anthony J. [2]
Evers, Dirk J. [2]
Cheetham, R. Keira [2]
Tavare, Simon [1]
Affiliations
[1] Li Ka Shing Ctr, Canc Res UK Cambridge Res Inst, Cambridge CB2 0RE, England
[2] Illumina Cambridge, Saffron Walden CB10 1XL, England
[3] Illumina Inc, Corp Headquarters, San Diego, CA 92121 USA
Keywords
REARRANGEMENTS
DOI
10.1093/bioinformatics/btq587
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Copy number abnormalities (CNAs) represent an important type of genetic mutation that can lead to abnormal cell growth and proliferation. New high-throughput sequencing technologies promise comprehensive characterization of CNAs. In contrast to microarrays, where probe design follows a carefully developed protocol, reads represent a random sample from a library and may be prone to representation biases due to GC content and other factors. The discrimination between true and false positive CNAs becomes an important issue.
Results: We present a novel approach, called CNAseg, to identify CNAs from second-generation sequencing data. It uses depth of coverage to estimate copy number states and flowcell-to-flowcell variability in cancer and normal samples to control the false positive rate. We tested the method using the COLO-829 melanoma cell line sequenced to 40-fold coverage. An extensive simulation scheme was developed to recreate different scenarios of copy number changes and depth of coverage by altering a real dataset with spiked-in CNAs. Comparison to alternative approaches using both real and simulated datasets showed that CNAseg achieves superior precision and improved sensitivity estimates.
Pages: 3051-3058
Number of pages: 8