Simcluster: clustering enumeration gene expression data on the simplex space

Cited by: 8
Authors
Vencio, Ricardo Z. N. [1]
Varuzza, Leonardo [2]
Pereira, Carlos A. de B. [2]
Brentani, Helena [3]
Shmulevich, Ilya [1]
Affiliations
[1] Inst Syst Biol, Seattle, WA 98103 USA
[2] Univ Sao Paulo, BIOINFO USP, Sao Paulo, Brazil
[3] Hosp Canc AC Camargo, Sao Paulo, Brazil
DOI
10.1186/1471-2105-8-246
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: Transcript enumeration methods such as SAGE, MPSS, and sequencing-by-synthesis EST "digital northern" are important high-throughput techniques for digital gene expression measurement. Like other counting or voting processes, these measurements constitute compositional data, which exhibit properties particular to the simplex space, where the sum of the components is constrained. These properties are absent from regular Euclidean spaces, on which hybridization-based microarray data are often modeled. Therefore, pattern recognition methods commonly used for microarray data analysis may be uninformative for data generated by transcript enumeration techniques, since they ignore certain fundamental properties of this space.
Results: Here we present a software tool, Simcluster, designed to perform clustering analysis for data on the simplex space. We present Simcluster as a stand-alone command-line C package and as a user-friendly on-line tool. Both versions are available at: http://xerad.systemsbiology.net/simcluster.
Conclusion: Simcluster is designed in accordance with a well-established mathematical framework for compositional data analysis, which provides principled procedures for dealing with the simplex space, and is thus applicable in a number of contexts, including enumeration-based gene expression data.
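To illustrate why the simplex constraint matters, the following minimal sketch compares a plain Euclidean distance on raw tag counts with a distance computed after a centered log-ratio transform, a standard device in compositional data analysis. It is not Simcluster's code and makes no claim about how Simcluster computes distances; the function names and the three toy libraries are hypothetical, and all counts are assumed to be strictly positive (zero counts would need separate handling before taking logarithms).

```python
import math


def closure(counts):
    """Rescale counts to proportions that sum to 1 (a point on the simplex)."""
    total = sum(counts)
    return [c / total for c in counts]


def clr(composition):
    """Centered log-ratio transform: log of each part minus the mean log."""
    logs = [math.log(p) for p in composition]  # assumes every part is > 0
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def aitchison_distance(x, y):
    """Euclidean distance between the clr-transformed compositions."""
    cx, cy = clr(closure(x)), clr(closure(y))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cx, cy)))


def euclidean_distance(x, y):
    """Ordinary Euclidean distance on the raw count vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical tag counts for three genes in three libraries.
library_a = [10, 30, 60]
library_b = [100, 300, 600]  # same proportions as library_a, 10x deeper
library_c = [60, 30, 10]     # different proportions

if __name__ == "__main__":
    print("Euclidean a-b:", euclidean_distance(library_a, library_b))
    print("Aitchison a-b:", aitchison_distance(library_a, library_b))
    print("Aitchison a-c:", aitchison_distance(library_a, library_c))
```

In this toy case, libraries a and b differ only in sequencing depth, so the log-ratio distance between them is essentially zero while the raw Euclidean distance is large. A clustering method built on the raw distance would separate two libraries that carry the same expression profile.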
Pages: 10