Trustworthiness and metrics in visualizing similarity of gene expression (art. no. 48)

Cited by: 84
Authors
Kaski, S
Nikkilä, J
Oja, M
Venna, J
Törönen, P
Castrén, E
Affiliations
[1] Aalto Univ, Neural Networks Res Ctr, FIN-02015 Helsinki, Finland
[2] Univ Kuopio, AI Virtanen Inst, FIN-70211 Kuopio, Finland
[3] Univ Helsinki, Ctr Neurosci, FIN-00014 Helsinki, Finland
DOI
10.1186/1471-2105-4-48
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Conventionally, the first step in analyzing the large, high-dimensional data sets measured by microarrays is visual exploration. Dendrograms of hierarchical clustering, self-organizing maps (SOMs), and multidimensional scaling have been used to visualize similarity relationships among data samples. We address two central properties of these methods: (i) Trustworthiness: if two samples are visualized as similar, are they really similar? (ii) The metric: the measure of similarity determines the result, and we propose using a new learning metrics principle to derive a metric from interrelationships among data sets.

Results: The trustworthiness of hierarchical clustering, multidimensional scaling, and the self-organizing map was compared in visualizing similarity relationships among gene expression profiles. The self-organizing map performed best overall, except that hierarchical clustering was the most trustworthy for the most similar profiles. Trustworthiness can be increased further by treating separately those genes for which the visualization is least trustworthy. We then proceed to improve the metric: the distance measure between expression profiles is adjusted to measure differences relevant to the functional classes of the genes. The genes for which the new metric differs most from the usual correlation metric are listed and visualized with one of the visualization methods, the self-organizing map, computed in the new metric.

Conclusions: The methodological results suggest that the self-organizing map can be recommended to complement the usual hierarchical clustering for visualizing and exploring gene expression data. Discarding the least trustworthy samples and improving the metric improve the visualization further.
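
The abstract asks whether samples shown as similar in a visualization are really similar, but this record does not reproduce the formula the paper uses. The sketch below shows one common rank-based way to quantify that idea: for each sample, every point that appears among its k nearest neighbors in the display but not among its k nearest neighbors in the original data is penalized by how far down the original neighbor ranking it sits. The function names `pairwise_distances` and `trustworthiness`, the use of Euclidean distance, and the normalization constant (valid for k < n/2) are assumptions made for illustration, not details taken from this record.

```python
import numpy as np


def pairwise_distances(points):
    """Euclidean distance matrix for an (n, d) array (assumed metric)."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def trustworthiness(original, projected, k):
    """Rank-based trustworthiness of a projection, in [0, 1] for k < n/2.

    A value of 1.0 means every k-neighborhood in the display contains only
    points that are also among the k nearest neighbors in the original data.
    """
    n = original.shape[0]
    if not 0 < k < n / 2:
        raise ValueError("k must satisfy 0 < k < n/2 for this normalization")

    d_orig = pairwise_distances(original)
    d_proj = pairwise_distances(projected)
    # Exclude each point from its own neighbor list.
    np.fill_diagonal(d_orig, np.inf)
    np.fill_diagonal(d_proj, np.inf)

    rows = np.arange(n)[:, None]
    # rank_orig[i, j] = position of j in i's original neighbor ordering (1 = nearest).
    order_orig = np.argsort(d_orig, axis=1)
    rank_orig = np.empty((n, n), dtype=int)
    rank_orig[rows, order_orig] = np.arange(1, n + 1)[None, :]

    # k nearest neighbors of each point in the display.
    nn_proj = np.argsort(d_proj, axis=1)[:, :k]
    ranks = rank_orig[rows, nn_proj]

    # Only neighbors whose original rank exceeds k contribute a penalty.
    penalty = np.clip(ranks - k, 0, None).sum()
    return 1.0 - 2.0 / (n * k * (2 * n - 3 * k - 1)) * penalty


if __name__ == "__main__":
    line = np.arange(10, dtype=float).reshape(-1, 1)
    # A display that interleaves the two halves of the line.
    scrambled = np.array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9], dtype=float).reshape(-1, 1)
    print(trustworthiness(line, line, k=2))
    print(trustworthiness(line, scrambled, k=2))
```

An identical display scores exactly 1.0, and the interleaved display scores lower because it places originally distant samples side by side. To mirror the correlation metric mentioned in the abstract, the distance function could be swapped for one based on correlation; that substitution is not shown here.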
Pages: 13