Computational cluster validation in post-genomic data analysis

Times cited: 612
Authors
Handl, J [1]
Knowles, J [1]
Kell, DB [1]
Affiliation
[1] Univ Manchester, Sch Chem, Manchester M60 1QD, Lancs, England
Funding
Biotechnology and Biological Sciences Research Council (BBSRC), UK
DOI
10.1093/bioinformatics/bti517
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: The discovery of novel biological knowledge from the ab initio analysis of post-genomic data relies upon the use of unsupervised processing methods, in particular clustering techniques. Much recent research in bioinformatics has therefore been focused on the transfer of clustering methods introduced in other scientific fields and on the development of novel algorithms specifically designed to tackle the challenges posed by post-genomic data. The partitions returned by a clustering algorithm are commonly validated using visual inspection and concordance with prior biological knowledge; whether the clusters actually correspond to the real structure in the data is somewhat less frequently considered. Suitable computational cluster validation techniques are available in the general data-mining literature, but have been given only a fraction of the same attention in bioinformatics.
Results: This review paper aims to familiarize the reader with the battery of techniques available for the validation of clustering results, with a particular focus on their application to post-genomic data analysis. Synthetic and real biological datasets are used to demonstrate the benefits, and also some of the perils, of analytical cluster validation.
Availability: The software used in the experiments is available at http://dbkweb.ch.umist.ac.uk/handl/clustervalidation/
Contact: J.Handl@postgrad.manchester.ac.uk
Supplementary information: Enlarged colour plots are provided in the Supplementary Material, which is available at http://dbkweb.ch.umist.ac.uk/handl/clustervalidation/
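The abstract refers to computational validation of a partition without naming a specific measure. As a minimal sketch, assuming Euclidean distance and points given as coordinate tuples, the Python function below computes the average silhouette width, one widely used internal validity index that scores a partition from the data alone. The function name `silhouette_width` and the toy data are hypothetical and for illustration only; this is not the software released with the paper.

```python
import math
from statistics import mean


def silhouette_width(points, labels):
    """Return the average silhouette width of a partition (illustrative sketch).

    points: sequence of coordinate tuples; Euclidean distance is assumed.
    labels: cluster label for each point, in the same order.
    Scores near 1 suggest compact, well-separated clusters; scores near 0
    or below suggest overlapping or misassigned clusters.
    """
    # Group point indices by cluster label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Convention: a point alone in its cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = mean(math.dist(points[i], points[j]) for j in own)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            mean(math.dist(points[i], points[j]) for j in members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(round(silhouette_width(points, [0, 0, 1, 1]), 2))  # 0.9
print(round(silhouette_width(points, [0, 1, 0, 1]), 2))  # -0.45
```

The labelling that follows the two natural groups scores about 0.9, while the labelling that splits each group scores below zero, which is the kind of signal visual inspection alone may not quantify.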
Pages: 3201-3212
Number of pages: 12
Related papers (75 in total)
  • [1] Ankerst M, 1999, SIGMOD Record, 28(2) (June 1999): 49.
  • [2] [Anonymous], 1996, P AAAI INT C KNOWL D
  • [3] Bandyopadhyay S, Maulik U. Nonparametric genetic clustering: Comparison of validity indices. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 2001, 31(1): 120-125.
  • [4] Ben-Dor A, 2002, Overabundance analysis and class discovery in gene expression data.
  • [5] Ben-Hur A, 2002, Pacific Symposium on Biocomputing, New Jersey.
  • [6] Bezdek JC, Pal NR. Some new indexes of cluster validity. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 1998, 28(3): 301-315.
  • [7] Bilu Y, Linial M. The advantage of functional prediction based on clustering of yeast genes and its correlation with non-sequence based classifications. Journal of Computational Biology, 2002, 9(2): 193-210.
  • [8] Bittner M, Meltzer P, Chen Y, Jiang Y, Seftor E, Hendrix M, Radmacher M, Simon R, Yakhini Z, Ben-Dor A, Sampas N, Dougherty E, Wang E, Marincola F, Gooden C, Lueders J, Glatfelter A, Pollock P, Carpten J, Gillanders E, Leja D, Dietrich K, Beaudry C, Berens M, Alberts D, Sondak V, Hayward N, Trent J. Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature, 2000, 406(6795): 536-540.
  • [9] Bolshakova N, Azuaje F, Cunningham P. An integrated tool for microarray data clustering and cluster validity assessment. Bioinformatics, 2005, 21(4): 451-455.
  • [10] Bolshakova N, Azuaje F. Cluster validation techniques for genome expression data. Signal Processing, 2003, 83(4): 825-833.