A segmental maximum a posteriori approach to genome-wide copy number profiling

Cited by: 26
Authors
Andersson, Robin [1]
Bruder, Carl E. G. [2]
Piotrowski, Arkadiusz [2]
Menzel, Uwe [3]
Nord, Helena [3]
Sandgren, Johanna [4]
Hvidsten, Torgeir R. [1]
de Stahl, Teresita Diaz [3]
Dumanski, Jan P. [2, 3]
Komorowski, Jan [1, 5]
Affiliations
[1] Uppsala Univ, Linnaeus Ctr Bioinformat, S-75124 Uppsala, Sweden
[2] Univ Alabama Birmingham, Dept Genet, Birmingham, AL 35294 USA
[3] Uppsala Univ, Dept Genet & Pathol, Rudbeck Lab, S-75124 Uppsala, Sweden
[4] Univ Uppsala Hosp, Dept Surg Sci, S-75185 Uppsala, Sweden
[5] Warsaw Univ, Interdisciplinary Ctr Math & Comp Modelling, PL-02106 Warsaw, Poland
DOI
10.1093/bioinformatics/btn003
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Copy number profiling methods aim at assigning DNA copy numbers to chromosomal regions using measurements from microarray-based comparative genomic hybridizations. Among the proposed methods to this end, Hidden Markov Model (HMM)-based approaches seem promising since DNA copy number transitions are naturally captured in the model. Current discrete-index HMM-based approaches do not, however, take into account heterogeneous information regarding the genomic overlap between clones. Moreover, the majority of existing methods are restricted to chromosome-wise analysis.

Results: We introduce a novel Segmental Maximum A Posteriori approach, SMAP, for DNA copy number profiling. Our method is based on discrete-index Hidden Markov Modeling and incorporates genomic distance and overlap between clones. We exploit a priori information through user-controllable parameterization that enables the identification of copy number deviations of various lengths and amplitudes. The model parameters may be inferred at a genome-wide scale to avoid overfitting of model parameters often resulting from chromosome-wise model inference. We report superior performances of SMAP on synthetic data when compared with two recent methods. When applied on our new experimental data, SMAP readily recognizes already known genetic aberrations including both large-scale regions with aberrant DNA copy number and changes affecting only single features on the array. We highlight the differences between the prediction of SMAP and the compared methods and show that SMAP accurately determines copy number changes and benefits from overlap consideration.
Pages: 751-758
Page count: 8