Efficient calculation of interval scores for DNA copy number data analysis

Cited by: 122
Authors
Lipson, D [1]
Aumann, Y
Ben-Dor, A
Linial, N
Yakhini, Z
Affiliations
[1] Technion Israel Inst Technol, Dept Comp Sci, IL-32000 Haifa, Israel
[2] Bar Ilan Univ, Dept Comp Sci, IL-52900 Ramat Gan, Israel
[3] Agilent Technol Israel, IL-49527 Petah Tiqwa, Israel
[4] Hebrew Univ Jerusalem, Dept Comp Sci, IL-91904 Jerusalem, Israel
Keywords
CGH; cancer; microarray analysis; optimization; approximation
DOI
10.1089/cmb.2006.13.215
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
DNA amplifications and deletions characterize cancer genomes and are often related to disease evolution. Microarray-based techniques for measuring these DNA copy-number changes use fluorescence ratios at arrayed DNA elements (BACs, cDNA, or oligonucleotides) to provide signals at high resolution in terms of genomic location. These data are then further analyzed to map aberrations and boundaries and to identify biologically significant structures. We develop a statistical framework that enables the casting of several DNA copy number data analysis questions as optimization problems over real-valued vectors of signals. The simplest form of the optimization problem seeks to maximize $\varphi(I) = \sum_{i \in I} v_i / \sqrt{|I|}$ over all subintervals $I$ of the input vector. We present and prove a linear time approximation scheme for this problem, namely, a process with time complexity $O(n \varepsilon^{-2})$ that outputs an interval for which $\varphi(I)$ is at least $\mathrm{Opt}/\alpha(\varepsilon)$, where $\mathrm{Opt}$ is the actual optimum and $\alpha(\varepsilon) \to 1$ as $\varepsilon \to 0$. We further develop practical implementations that improve the performance of the naive quadratic approach by orders of magnitude. We discuss properties of optimal intervals and how they apply to the algorithm performance. We benchmark our algorithms on synthetic as well as publicly available DNA copy number data. We demonstrate the use of these methods for identifying aberrations in single samples as well as common alterations in fixed sets and subsets of breast cancer samples.
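To make the objective concrete, the following is a minimal sketch of the naive quadratic baseline that the abstract mentions, not the authors' linear time approximation scheme. It scores every subinterval with the sum of its signals divided by the square root of its length, using prefix sums so that each interval costs constant time. The function name `max_interval_score` and the half-open `[start, end)` index convention are assumptions made for illustration only.

```python
import math
from itertools import accumulate


def max_interval_score(v):
    """Exhaustively maximize sum(v[start:end]) / sqrt(end - start).

    Hypothetical helper for illustration: returns (score, start, end)
    for the best half-open interval [start, end). Runs in O(n^2) time.
    """
    if not v:
        raise ValueError("v must be non-empty")

    # prefix[k] holds v[0] + ... + v[k-1], so any interval sum is one subtraction.
    prefix = [0.0] + list(accumulate(v))
    n = len(v)

    best = (-math.inf, 0, 0)
    for start in range(n):
        for end in range(start + 1, n + 1):
            score = (prefix[end] - prefix[start]) / math.sqrt(end - start)
            if score > best[0]:
                best = (score, start, end)
    return best


if __name__ == "__main__":
    # Toy signal vector (made up): a raised segment at indices 2..4.
    signals = [0, -1, 4, 4, 4, -1, 0]
    print(max_interval_score(signals))
```

On the toy vector the raised segment at indices 2 to 4 is selected, because extending the interval in either direction adds length without adding enough signal. This sketch only searches for intervals with a high positive sum; applying it to the negated vector is one possible way to look for low-signal intervals, which is an assumption and not something the abstract states.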
Pages: 215-228
Page count: 14