A faster circular binary segmentation algorithm for the analysis of array CGH data

Citations: 684
Authors
Venkatraman, E. S. [1]
Olshen, Adam B. [1]
Affiliations
[1] Memorial Sloan-Kettering Cancer Center, Department of Epidemiology and Biostatistics, New York, NY 10021, USA
DOI
10.1093/bioinformatics/btl646
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Motivation: Array CGH technologies enable the simultaneous measurement of DNA copy number for thousands of sites on a genome. We developed the circular binary segmentation (CBS) algorithm to divide the genome into regions of equal copy number. The algorithm tests for change-points using a maximal t-statistic with a permutation reference distribution to obtain the corresponding P-value. The number of computations required for the maximal test statistic is O(N^2), where N is the number of markers. This makes the full permutation approach computationally prohibitive for the newer arrays that contain tens of thousands of markers and highlights the need for a faster algorithm.
Results: We present a hybrid approach to obtain the P-value of the test statistic in linear time. We also introduce a rule for stopping early when there is strong evidence for the presence of a change. We show through simulations that the hybrid approach provides a substantial gain in speed with only a negligible loss in accuracy and that the stopping rule further increases speed. We also present the analyses of array CGH data from breast cancer cell lines to show the impact of the new approaches on the analysis of real data.
Availability: An R version of the CBS algorithm has been implemented in the "DNAcopy" package of the Bioconductor project. The proposed hybrid method for the P-value is available in version 1.2.1 or higher and the stopping rule for declaring a change early is available in version 1.5.1 or higher.
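To make the cost argument in the abstract concrete, the following is a minimal Python sketch of the full permutation test that the paper sets out to speed up. It is not the DNAcopy implementation and does not include the hybrid P-value or the early stopping rule. The function names `max_arc_statistic` and `permutation_p_value` are hypothetical, the statistic is a simplified version scaled by the overall sample standard deviation (the paper's exact t-statistic may differ), and the "+1" correction in the P-value is a common convention assumed here rather than taken from the source.

```python
import numpy as np


def max_arc_statistic(x):
    """Largest absolute standardized mean difference between an arc and its complement.

    Simplified stand-in for the maximal t-statistic: every pair i < j defines an arc
    x[i:j]; its mean is compared with the mean of the remaining markers.
    Assumes x is not constant (the standard deviation must be non-zero).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    s = np.concatenate(([0.0], np.cumsum(x)))  # s[k] = sum of the first k values
    total = s[-1]
    sd = x.std(ddof=1)
    i, j = np.triu_indices(n + 1, k=1)  # all O(N^2) pairs with 0 <= i < j <= n
    k = j - i  # arc length
    keep = k < n  # the complement of the arc must be non-empty
    i, j, k = i[keep], j[keep], k[keep]
    arc_sum = s[j] - s[i]
    diff = arc_sum / k - (total - arc_sum) / (n - k)
    z = diff / (sd * np.sqrt(1.0 / k + 1.0 / (n - k)))
    return float(np.max(np.abs(z)))


def permutation_p_value(x, n_perm=1000, seed=0):
    """Permutation P-value: share of shuffled series whose statistic reaches the observed one."""
    rng = np.random.default_rng(seed)
    observed = max_arc_statistic(x)
    exceed = sum(
        max_arc_statistic(rng.permutation(x)) >= observed for _ in range(n_perm)
    )
    return (exceed + 1) / (n_perm + 1)
```

Each call to `max_arc_statistic` examines on the order of N^2 arcs, and the permutation test repeats that once per shuffle, which is why this direct approach becomes impractical for arrays with tens of thousands of markers.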
Pages: 657-663
Page count: 7