clValid: An R package for cluster validation

Cited by: 497
Authors
Brock, Guy [1]
Datta, Susmita [1]
Pihur, Vasyl [1]
Datta, Somnath [1]
Affiliations
[1] Univ Louisville, Sch Publ Hlth & Informat Sci, Dept Bioinformat & Biostat, Louisville, KY 40292 USA
Source
JOURNAL OF STATISTICAL SOFTWARE | 2008 / Vol. 25 / Issue 04
Funding
National Science Foundation (USA);
Keywords
clustering; validation; R package; stability measures; biological annotation;
DOI
10.18637/jss.v025.i04
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
The R package clValid contains functions for validating the results of a clustering analysis. Three main types of cluster validation measures are available: "internal", "stability", and "biological". The user can choose from nine clustering algorithms in existing R packages, including hierarchical, K-means, self-organizing maps (SOM), and model-based clustering. In addition, we provide a function to perform the self-organizing tree algorithm (SOTA) method of clustering. Any combination of validation measures and clustering methods can be requested in a single function call. This allows the user to simultaneously evaluate several clustering algorithms while varying the number of clusters, to help determine the most appropriate method and number of clusters for the dataset of interest. Additionally, the package can automatically make use of the biological information contained in the Gene Ontology (GO) database to calculate the biological validation measures, via the annotation packages available in Bioconductor. The function returns an object of S4 class "clValid", which has summary, plot, print, and additional methods that allow the user to display the optimal validation scores and extract clustering results.
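As a minimal sketch of the single-call workflow described in the abstract, the following R snippet assumes the clValid package is installed from CRAN and uses the `mouse` example dataset that ships with it. The row subset, the two clustering methods, and the cluster range 2:4 are illustrative choices, not recommendations from the paper.

```r
# Sketch only: assumes the clValid package is installed (e.g. from CRAN).
library(clValid)

# 'mouse' is an example expression dataset bundled with clValid.
# The 25-row subset and the six expression columns are illustrative choices.
data(mouse)
express <- mouse[1:25, c("M1", "M2", "M3", "NC1", "NC2", "NC3")]
rownames(express) <- mouse$ID[1:25]

# A single call evaluates several clustering algorithms over a range of
# cluster counts, using more than one type of validation measure.
result <- clValid(
  express,
  nClust     = 2:4,
  clMethods  = c("hierarchical", "kmeans"),
  validation = c("internal", "stability")
)

# Methods on the returned S4 object of class "clValid".
summary(result)                          # validation scores and the optimal ones
op <- optimalScores(result)              # best score, method, and cluster count per measure
hc <- clusters(result, "hierarchical")   # extract the hierarchical clustering result
```

Requesting the "biological" measures would additionally need functional annotation for the genes (for example via a Bioconductor annotation package), which is omitted here to keep the sketch self-contained.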
Pages: 1 / 22
Page count: 22
Related papers
40 items in total
  • [1] FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes
    Al-Shahrour, F
    Díaz-Uriarte, R
    Dopazo, J
    [J]. BIOINFORMATICS, 2004, 20 (04) : 578 - 580
  • [2] [Anonymous], J AM STAT ASS
  • [3] [Anonymous], SELF ORGANIZING MAPS
  • [4] Neural crest and mesoderm lineage-dependent gene expression in orofacial development
    Bhattacherjee, Vasker
    Mukhopadhyay, Partha
    Singh, Saurabh
    Johnson, Charles
    Philipose, John T.
    Warner, Courtney P.
    Greene, Robert M.
    Pisano, M. Michele
    [J]. DIFFERENTIATION, 2007, 75 (05) : 463 - 477
  • [5] A knowledge-driven approach to cluster validity assessment
    Bolshakova, N
    Azuaje, F
    Cunningham, P
    [J]. BIOINFORMATICS, 2005, 21 (10) : 2546 - 2547
  • [6] The transcriptional program of sporulation in budding yeast
    Chu, S
    DeRisi, J
    Eisen, M
    Mulholland, J
    Botstein, D
    Brown, PO
    Herskowitz, I
    [J]. SCIENCE, 1998, 282 (5389) : 699 - 705
  • [7] Comparisons and validation of statistical clustering techniques for microarray gene expression data
    Datta, S
    Datta, S
    [J]. BIOINFORMATICS, 2003, 19 (04) : 459 - 466
  • [8] Methods for evaluating clustering algorithms for gene expression data using a reference set of functional classes
    Datta, Susmita
    Datta, Somnath
    [J]. BMC BIOINFORMATICS, 2006, 7 (1)
  • [9] Fuzzy C-means method for clustering microarray data
    Dembélé, D
    Kastner, P
    [J]. BIOINFORMATICS, 2003, 19 (08) : 973 - 980
  • [10] Exploring the metabolic and genetic control of gene expression on a genomic scale
    DeRisi, JL
    Iyer, VR
    Brown, PO
    [J]. SCIENCE, 1997, 278 (5338) : 680 - 686