CNAseg-a novel framework for identification of copy number changes in cancer from second-generation sequencing data

Cited by: 77

Authors:

Ivakhno, Sergii ^{[1,2]}

Royce, Tom ^{[3]}

Cox, Anthony J. ^{[2]}

Evers, Dirk J. ^{[2]}

Cheetham, R. Keira ^{[2]}

Tavare, Simon ^{[1]}

Affiliations:

[1] Li Ka Shing Ctr, Canc Res UK Cambridge Res Inst, Cambridge CB2 0RE, England

[2] Illumina Cambridge, Saffron Walden CB10 1XL, England

[3] Illumina Inc, Corp Headquarters, San Diego, CA 92121 USA

Source:

BIOINFORMATICS | 2010 / Vol. 26 / Issue 24

Keywords:

REARRANGEMENTS;

DOI:

10.1093/bioinformatics/btq587

Chinese Library Classification number:

Q5 [Biochemistry];

Discipline classification codes:

071010; 081704;

Abstract:

Motivation: Copy number abnormalities (CNAs) represent an important type of genetic mutation that can lead to abnormal cell growth and proliferation. New high-throughput sequencing technologies promise comprehensive characterization of CNAs. In contrast to microarrays, where probe design follows a carefully developed protocol, reads represent a random sample from a library and may be prone to representation biases due to GC content and other factors. The discrimination between true and false positive CNAs becomes an important issue.

Results: We present a novel approach, called CNAseg, to identify CNAs from second-generation sequencing data. It uses depth of coverage to estimate copy number states and flowcell-to-flowcell variability in cancer and normal samples to control the false positive rate. We tested the method using the COLO-829 melanoma cell line sequenced to 40-fold coverage. An extensive simulation scheme was developed to recreate different scenarios of copy number changes and depth of coverage by altering a real dataset with spiked-in CNAs. Comparison to alternative approaches using both real and simulated datasets showed that CNAseg achieves superior precision and improved sensitivity estimates.


Pages: 3051-3058

Page count: 8
