From correlation to causation networks: a simple approximate learning algorithm and its application to high-dimensional plant gene expression data

Cited by: 262
Authors
Opgen-Rhein, Rainer
Strimmer, Korbinian
Affiliations
[1] Univ Munich, Dept Stat, D-80539 Munich, Germany
[2] Univ Leipzig, IMISE, D-04107 Leipzig, Germany
DOI
10.1186/1752-0509-1-37
Chinese Library Classification (CLC) number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Background: The use of correlation networks is widespread in the analysis of gene expression and proteomics data, even though it is known that correlations not only confound direct and indirect associations but also provide no means to distinguish between cause and effect. "Causal" analysis typically requires the inference of a directed graphical model. However, this is difficult because of the curse of dimensionality.
Results: We propose a simple heuristic for the statistical learning of a high-dimensional "causal" network. The method first converts a correlation network into a partial correlation graph. Subsequently, a partial ordering of the nodes is established by multiple testing of the log-ratio of standardized partial variances. This makes it possible to identify a directed acyclic causal network as a subgraph of the partial correlation network. We illustrate the approach by analyzing a large Arabidopsis thaliana expression data set.
Conclusion: The proposed approach is a heuristic algorithm that is based on a number of approximations, such as substituting lower order partial correlations by full order partial correlations. Nevertheless, for small samples and for sparse networks the algorithm not only yields sensible first order approximations of the causal structure in high-dimensional genomic data but is also computationally highly efficient.
Availability and Requirements: The method is implemented in the "GeneNet" R package (version 1.2.0), available from CRAN and from http://strimmerlab.org/software/genets/. The software includes an R script for reproducing the network analysis of the Arabidopsis thaliana data.
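The two steps named in the abstract (correlation matrix to partial correlations, then ordering nodes by the log-ratio of standardized partial variances) can be illustrated with a minimal Python sketch. This is not the GeneNet implementation. It rests on several assumptions: the correlation matrix is known and invertible, fixed cutoffs (`pcor_cutoff`, `log_ratio_cutoff`, both hypothetical) stand in for the multiple testing step, and an edge is oriented from the node with the larger standardized partial variance to the node with the smaller one. The function names are illustrative only.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I].
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(corr):
    """Return (full-order partial correlations, standardized partial variances)."""
    omega = invert(corr)  # inverse of the correlation matrix
    n = len(corr)
    pcor = [[1.0 if i == j
             else -omega[i][j] / math.sqrt(omega[i][i] * omega[j][j])
             for j in range(n)] for i in range(n)]
    # Standardized partial variance: fraction of a node's variance
    # left unexplained by all other nodes.
    spv = [1.0 / omega[i][i] for i in range(n)]
    return pcor, spv


def directed_edges(corr, pcor_cutoff=0.1, log_ratio_cutoff=0.1):
    """Orient edges of the partial correlation graph (assumed convention:
    larger standardized partial variance -> smaller). Cutoffs are
    illustrative stand-ins for multiple testing."""
    pcor, spv = partial_correlations(corr)
    n = len(corr)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if abs(pcor[i][j]) < pcor_cutoff:
                continue  # no edge in the partial correlation graph
            log_ratio = math.log(spv[i] / spv[j])
            if log_ratio > log_ratio_cutoff:
                edges.append((i, j))
            elif log_ratio < -log_ratio_cutoff:
                edges.append((j, i))
            # otherwise the edge stays undirected and is omitted here
    return edges


# Toy example: nodes 0 and 1 are uncorrelated, both correlate 0.6 with node 2.
corr = [[1.0, 0.0, 0.6],
        [0.0, 1.0, 0.6],
        [0.6, 0.6, 1.0]]
print(directed_edges(corr))  # [(0, 2), (1, 2)]
```

In this toy matrix, node 2 has the smallest standardized partial variance (0.28 versus 0.4375 for the other two), so both directed edges point into it. The edge between nodes 0 and 1 has a log-ratio of zero and is left undirected.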
Pages: 10