Estimating mutual information using B-spline functions - an improved similarity measure for analysing gene expression data

Cited by: 195
Authors
Daub, CO [1]
Steuer, R
Selbig, J
Kloska, S
Affiliations
[1] Max Planck Inst Mol Plant Physiol, D-14424 Potsdam, Germany
[2] Univ Potsdam, Nonlinear Dynam Grp, Inst Phys, D-14415 Potsdam, Germany
[3] Scien AG, D-12489 Berlin, Germany
[4] Karolinska Inst, Ctr Genom & Bioinformat, S-17177 Stockholm, Sweden
DOI
10.1186/1471-2105-5-118
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Background: The information theoretic concept of mutual information provides a general framework to evaluate dependencies between variables. In the context of the clustering of genes with similar patterns of expression it has been suggested as a general quantity of similarity to extend commonly used linear measures. Since mutual information is defined in terms of discrete variables, its application to continuous data requires the use of binning procedures, which can lead to significant numerical errors for datasets of small or moderate size.
Results: In this work, we propose a method for the numerical estimation of mutual information from continuous data. We investigate the characteristic properties arising from the application of our algorithm and show that our approach outperforms commonly used algorithms: The significance, as a measure of the power of distinction from random correlation, is significantly increased. This concept is subsequently illustrated on two large-scale gene expression datasets and the results are compared to those obtained using other similarity measures. A C++ source code of our algorithm is available for non-commercial use from kloska@scienion.de upon request.
Conclusion: The utilisation of mutual information as similarity measure enables the detection of non-linear correlations in gene expression datasets. Frequently applied linear correlation measures, which are often used on an ad-hoc basis without further justification, are thereby extended.
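Illustrative note: the binning procedure mentioned in the abstract can be sketched as follows. This is a minimal sketch of the naive equal-width binning estimate of mutual information, i.e. the baseline the paper sets out to improve on, and not the authors' B-spline method or their C++ code. The function name `binned_mutual_information` and the parameter `bins` are hypothetical names chosen for illustration.

```python
import math
from collections import Counter


def binned_mutual_information(x, y, bins=4):
    """Naive equal-width binning estimate of I(X;Y), in bits.

    Illustrative baseline only; not the B-spline estimator of the paper.
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")

    def to_bins(values):
        lo, hi = min(values), max(values)
        # Fall back to width 1.0 for constant input to avoid division by zero.
        width = (hi - lo) / bins or 1.0
        # Clamp so that the maximum value falls into the last bin.
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = to_bins(x), to_bins(y)
    n = len(x)
    count_x = Counter(bx)            # marginal bin counts for x
    count_y = Counter(by)            # marginal bin counts for y
    count_xy = Counter(zip(bx, by))  # joint bin counts

    mi = 0.0
    for (i, j), c in count_xy.items():
        # p(i,j) * log2( p(i,j) / (p(i) * p(j)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[i] * count_y[j]))
    return mi
```

For eight equally spaced values compared with themselves and `bins=4`, each bin holds two points and the estimate equals log2(4) = 2 bits. Because every data point is assigned to exactly one bin, small shifts in the data near a bin boundary can change the estimate noticeably, which is the kind of numerical error for small datasets that the abstract refers to.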
Pages: 12
Related papers
33 in total
  • [1] BUTTE AJ, 2000, PAC S BIOCOMPUT, V5, P427
  • [2] D'haeseleer P, 1998, INFORMATION PROCESSING IN CELLS AND TISSUES, P203
  • [3] Genetic network inference: from co-expression clustering to reverse engineering
    D'haeseleer, P
    Liang, SD
    Somogyi, R
    [J]. BIOINFORMATICS, 2000, 16 (08) : 707 - 726
  • [4] De Boor C., 1978, PRACTICAL GUIDE SPLI, DOI 10.1007/978-1-4612-6333-3
  • [5] Cluster analysis and display of genome-wide expression patterns
    Eisen, MB
    Spellman, PT
    Brown, PO
    Botstein, D
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1998, 95 (25) : 14863 - 14868
  • [6] ELLIS DP, 2000, P INT C SPOK LANG PR
  • [7] INDEPENDENT COORDINATES FOR STRANGE ATTRACTORS FROM MUTUAL INFORMATION
    FRASER, AM
    SWINNEY, HL
    [J]. PHYSICAL REVIEW A, 1986, 33 (02) : 1134 - 1140
  • [8] Gorodkin J, 1997, COMPUT APPL BIOSCI, V13, P583
  • [9] GROSSE I, 1996, EVOLUTION STRUKTUREN, P181
  • [10] Microarray standard data set and figures of merit for comparing data processing methods and experiment designs
    He, YDD
    Dai, HY
    Schadt, EE
    Cavet, G
    Edwards, SW
    Stepaniants, SB
    Duenwald, S
    Kleinhanz, R
    Jones, AR
    Shoemaker, DD
    Stoughton, RB
    [J]. BIOINFORMATICS, 2003, 19 (08) : 956 - 965