Hierarchical tree snipping: clustering guided by prior knowledge

Cited by: 15
Authors
Dotan-Cohen, Dikla [1]
Melkman, Avraham A.
Kasif, Simon
Affiliations
[1] Ben Gurion Univ Negev, Dept Comp Sci, IL-84105 Beer Sheva, Israel
[2] Childrens Hosp, Dept Biomed Engn, Boston, MA 02115 USA
[3] Childrens Hosp, Ctr Adv Genom Technol, Boston, MA 02115 USA
[4] Childrens Hosp, Bioinformat Program, Boston, MA 02115 USA
[5] Childrens Hosp, Harvard MIT Program Hlth Sci & Technol, Boston, MA 02115 USA
DOI
10.1093/bioinformatics/btm526
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Hierarchical clustering is widely used to cluster genes into groups based on their expression similarity. This method first constructs a tree. Next, this tree is partitioned into subtrees by cutting all edges at some level, thereby inducing a clustering. Unfortunately, the resulting clusters often do not exhibit significant functional coherence.
Results: To improve the biological significance of the clustering, we develop a new framework of partitioning by snipping, that is, cutting selected edges at variable levels. The snipped edges are selected to induce clusters that are maximally consistent with partially available background knowledge such as functional classifications. Algorithms for two key applications are presented: functional prediction of genes, and discovery of functionally enriched clusters of co-expressed genes. Simulation results and cross-validation tests indicate that the algorithms perform well even when the actual number of clusters differs considerably from the requested number. Performance is improved compared with a previously proposed algorithm.
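The contrast between a fixed-level cut and variable-level snipping can be illustrated with a small toy sketch. This is not the algorithm from the paper: it assumes a binary tree written as nested tuples, a hypothetical partial label map, and a simple greedy rule (keep a subtree whole whenever its labeled leaves carry at most one distinct label). The names `cut_at_depth`, `snip`, and the gene and label names are invented for illustration only.

```python
from typing import Dict, List, Tuple, Union

# A tree is either a leaf (gene name) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def leaves(tree: Tree) -> List[str]:
    """Return the leaf names under a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def cut_at_depth(tree: Tree, depth: int) -> List[List[str]]:
    """Fixed-level cut: each subtree rooted at the given depth is a cluster."""
    if isinstance(tree, str) or depth == 0:
        return [leaves(tree)]
    left, right = tree
    return cut_at_depth(left, depth - 1) + cut_at_depth(right, depth - 1)


def snip(tree: Tree, labels: Dict[str, str]) -> List[List[str]]:
    """Variable-level snip (toy greedy rule, an assumption of this sketch):
    keep a subtree as one cluster when its labeled leaves agree."""
    members = leaves(tree)
    seen = {labels[g] for g in members if g in labels}
    if isinstance(tree, str) or len(seen) <= 1:
        return [members]
    left, right = tree
    return snip(left, labels) + snip(right, labels)


# Hypothetical tree and partial background knowledge (g2, g4, g7 are unlabeled).
tree: Tree = ((("g1", "g2"), ("g3", "g4")), ("g5", ("g6", "g7")))
labels = {"g1": "ribosome", "g3": "ribosome", "g5": "kinase", "g6": "transport"}

fixed = cut_at_depth(tree, 2)
snipped = snip(tree, labels)
print(fixed)
print(snipped)
```

In this toy input the fixed cut at depth 2 splits the four genes under the left subtree into two clusters, while the snip keeps them together because their known labels agree, and unlabeled genes inherit the cluster of their labeled neighbors. The paper's framework optimizes consistency with the background knowledge for a requested number of clusters, which this greedy rule does not attempt.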
Pages: 3335-3342
Page count: 8