Data reduction for spectral clustering to analyze high throughput flow cytometry data

Cited by: 122
Authors
Zare, Habil [1, 2]
Shooshtari, Parisa [1, 2]
Gupta, Arvind [3]
Brinkman, Ryan R. [1, 4]
Affiliations
[1] British Columbia Canc Agcy, Terry Fox Lab, Vancouver, BC V5Z 1L3, Canada
[2] Univ British Columbia, Dept Comp Sci, Vancouver, BC V6T 1W5, Canada
[3] Univ British Columbia, Fac Sci, Vancouver, BC V5Z 1M9, Canada
[4] Univ British Columbia, Dept Med Genet, Vancouver, BC, Canada
Source
BMC BIOINFORMATICS | 2010 / Vol. 11
Keywords
IDENTIFICATION
DOI
10.1186/1471-2105-11-403
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: Recent biological discoveries have shown that clustering large datasets is essential for better understanding biology in many areas. Spectral clustering in particular has proven to be a powerful tool amenable to many applications. However, it cannot be directly applied to large datasets due to time and memory limitations. To address this issue, we have modified spectral clustering by adding an information-preserving sampling procedure and applying a post-processing stage. We call this entire algorithm SamSPECTRAL.
Results: We tested our algorithm on flow cytometry data as an example of large, multidimensional data containing potentially hundreds of thousands of data points (i.e., "events" in flow cytometry, typically corresponding to cells). Compared to two state-of-the-art model-based flow cytometry clustering methods, SamSPECTRAL demonstrates significant advantages in proper identification of populations with non-elliptical shapes, low-density populations close to dense ones, minor subpopulations of a major population, and rare populations.
Conclusions: This work is the first successful attempt to apply spectral methodology to flow cytometry data. An implementation of our algorithm as an R package is freely available through Bioconductor.
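The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of one way a data-reduction step of this kind could work, not the SamSPECTRAL implementation. It assumes a simple rule: pick an unassigned event at random, make it a representative, and attach every still-unassigned event within a fixed radius to it. The function name `sample_representatives`, the `radius` parameter, and the synthetic two-population data are all hypothetical.

```python
import math
import random


def sample_representatives(points, radius, seed=0):
    """Pick representatives so that every point lies within `radius` of one.

    Hypothetical illustration only; the radius rule is an assumption,
    not taken from the abstract.
    """
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)                 # visit events in random order

    representatives = []               # indices of the chosen sample points
    community_size = []                # how many events each sample stands for
    assignment = [None] * len(points)  # representative id for every event

    for idx in order:
        if assignment[idx] is not None:
            continue                   # already covered by an earlier sample
        rep_id = len(representatives)
        representatives.append(idx)
        community_size.append(0)
        # Attach every still-unassigned event within `radius` (including idx).
        for j, other in enumerate(points):
            if assignment[j] is None and math.dist(points[idx], other) <= radius:
                assignment[j] = rep_id
                community_size[rep_id] += 1
    return representatives, community_size, assignment


# Synthetic stand-in for flow cytometry events: one large and one small population.
data_rng = random.Random(1)
events = [(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(500)]
events += [(data_rng.gauss(5, 0.5), data_rng.gauss(5, 0.5)) for _ in range(50)]

reps, sizes, assignment = sample_representatives(events, radius=1.0)
print(len(events), "events reduced to", len(reps), "representatives")
```

Under this assumed scheme, a spectral clustering step would run only on the representatives, and the recorded community sizes could be used to keep dense regions influential; each remaining event would then inherit the cluster label of its representative.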
Pages: 16