Seeing the forest for the trees: using the Gene Ontology to restructure hierarchical clustering

Cited by: 19
Authors
Dotan-Cohen, Dikla [1]
Kasif, Simon [2, 3, 4, 5]
Melkman, Avraham A. [1]
Affiliations
[1] Ben Gurion Univ Negev, Dept Comp Sci, IL-84105 Beer Sheva, Israel
[2] Boston Univ, Dept Biomed Engn, Boston, MA 02215 USA
[3] Boston Univ, Ctr Adv Genom Technol, Boston, MA 02215 USA
[4] Boston Univ, Bioinformat Program, Boston, MA 02215 USA
[5] Harvard MIT Program Hlth Sci & Technol, Childrens Hosp Boston, Boston, MA 02115 USA
Keywords
SEMANTIC SIMILARITY; MICROARRAY; KNOWLEDGE
DOI
10.1093/bioinformatics/btp327
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: There is a growing interest in improving the cluster analysis of expression data by incorporating into it prior knowledge, such as the Gene Ontology (GO) annotations of genes, in order to improve the biological relevance of the clusters that are subjected to subsequent scrutiny. The structure of the GO is another source of background knowledge that can be exploited through the use of semantic similarity.
Results: We propose here a novel algorithm that integrates semantic similarities (derived from the ontology structure) into the procedure of deriving clusters from the dendrogram constructed during expression-based hierarchical clustering. Our approach can handle the multiple annotations, from different levels of the GO hierarchy, which most genes have. Moreover, it treats annotated and unannotated genes in a uniform manner. Consequently, the clusters obtained by our algorithm are characterized by significantly enriched annotations. In both cross-validation tests and when using an external index such as protein-protein interactions, our algorithm performs better than previous approaches. When applied to human cancer expression data, our algorithm identifies, among others, clusters of genes related to immune response and glucose metabolism. These clusters are also supported by protein-protein interaction data.
Pages: 1789-1795
Page count: 7
Related papers
29 in total
  • [1] [Anonymous], 1997, P 10 RES COMP LING I
  • [2] Towards knowledge-based gene expression data mining
    Bellazzi, Riccardo
    Zupan, Blaz
    [J]. JOURNAL OF BIOMEDICAL INFORMATICS, 2007, 40 (06) : 787 - 802
  • [3] The CRASSS plug-in for integrating annotation data with hierarchical clustering results
    Buehler, EC
    Sachs, JR
    Shao, K
    Bagchi, A
    Ungar, LH
    [J]. BIOINFORMATICS, 2004, 20 (17) : 3266 - 3269
  • [4] Cheng Jill, 2004, J Biopharm Stat, V14, P687, DOI 10.1081/BIP-200025659
  • [5] Siglecs and their roles in the immune system
    Crocker, Paul R.
    Paulson, James C.
    Varki, Ajit
    [J]. NATURE REVIEWS IMMUNOLOGY, 2007, 7 (04) : 255 - 266
  • [6] Pathways to the analysis of microarray data
    Curtis, RK
    Oresic, M
    Vidal-Puig, A
    [J]. TRENDS IN BIOTECHNOLOGY, 2005, 23 (08) : 429 - 435
  • [7] GOurmet: A tool for quantitative comparison and visualization of gene expression profiles based on gene ontology (GO) distributions
    Doherty, JM
    Carmichael, LK
    Mills, JC
    [J]. BMC BIOINFORMATICS, 2006, 7 (1)
  • [8] Hierarchical tree snipping: clustering guided by prior knowledge
    Dotan-Cohen, Dikla
    Melkman, Avraham A.
    Kasif, Simon
    [J]. BIOINFORMATICS, 2007, 23 (24) : 3335 - 3342
  • [9] Knowledge guided analysis of microarray data
    Fang, Zhuo
    Yang, Hong
    Li, Yixue
    Luo, Qingming
    Liu, Lei
    [J]. JOURNAL OF BIOMEDICAL INFORMATICS, 2006, 39 (04) : 401 - 411
  • [10] Why do cancers have high aerobic glycolysis?
    Gatenby, RA
    Gillies, RJ
    [J]. NATURE REVIEWS CANCER, 2004, 4 (11) : 891 - 899