Gene Expression Prediction by Soft Integration and the Elastic Net-Best Performance of the DREAM3 Gene Expression Challenge

Cited by: 18
Authors
Gustafsson, Mika [1]
Hornquist, Michael [1]
Affiliations
[1] Linkoping Univ, Dept Sci & Technol, Norrkoping, Sweden
Source
PLOS ONE | 2010 / Volume 5 / Issue 02
Keywords
EFFECTIVE DIMENSIONALITY; REGULATORY ASSOCIATIONS; NETWORK; REGRESSION; INFERENCE; SELECTION; DATABASE; SYSTEMS; LASSO; TOOLS;
DOI
10.1371/journal.pone.0009134
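Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract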
Background: Predicting gene expression is an important endeavour within computational systems biology. It can serve both as a way to explore how drugs affect the system and as a framework for finding which genes are interrelated in a certain process. A practical problem, however, is how to assess and discriminate among the various algorithms that have been developed for this purpose. Therefore, in 2008 the DREAM project issued a challenge for predicting gene expression values, and here we present the algorithm with the best performance.
Methodology/Principal Findings: We develop an algorithm by exploring various regression schemes with different model selection procedures. It turns out that the most effective scheme is based on least squares, with a penalty term of a recently developed form called the "elastic net". Key components of the algorithm are the integration of expression data from experimental conditions other than those presented for the challenge, and the use of transcription factor binding data to guide the inference process towards known interactions. Also important is a cross-validation procedure in which each form of external data is used only to the extent that it increases the expected performance.
Conclusions/Significance: Our algorithm demonstrates both that information relevant to predicting gene levels can be extracted from large-scale expression data and that integrating different data sources improves the inference. We believe the former is an important message to those still hesitant about the potential of computational approaches, while the latter is part of an important way forward for the field of computational systems biology.
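The abstract names two generic ingredients: least squares with an elastic net penalty, and cross-validation to decide how strongly to penalise. The following is a minimal sketch of those two ingredients only, not the authors' implementation: it does not model the soft integration of external expression or transcription factor binding data. The function names (`elastic_net`, `cv_error`), the objective scaling (1/2n)·||y − Xb||² + lam1·||b||₁ + (lam2/2)·||b||₂², the coordinate-descent solver, and the toy data are all assumptions made for illustration. Predictors and response are assumed to be centred, so no intercept is fitted.

```python
import random


def soft_threshold(z, t):
    """Shrink z towards zero by t (the L1 proximal step)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net(X, y, lam1, lam2, n_iter=200):
    """Coordinate descent for
    (1/2n)*||y - Xb||^2 + lam1*||b||_1 + (lam2/2)*||b||_2^2.
    X is a list of rows, y a list of responses (both assumed centred)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - Xb for the starting point b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            denom = col_sq[j] + lam2
            if denom == 0.0:
                continue  # all-zero column and no ridge term: leave b[j] at 0
            # Correlation of column j with the residual that excludes b[j]
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam1) / denom
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new
    return b


def cv_error(X, y, lam1, lam2, k=5):
    """Mean squared prediction error over k interleaved folds."""
    n = len(X)
    total = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        b = elastic_net([X[i] for i in train], [y[i] for i in train], lam1, lam2)
        for i in test:
            pred = sum(bj * xj for bj, xj in zip(b, X[i]))
            total += (y[i] - pred) ** 2
    return total / n


# Toy data (hypothetical): the response depends on predictors 0 and 2 only.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(60)]
y = [2.0 * row[0] - 1.0 * row[2] + rng.gauss(0, 0.1) for row in X]

# Pick the L1 weight with the lowest cross-validated error from a small grid.
grid = [0.01, 0.05, 0.2, 1.0]
best_lam1 = min(grid, key=lambda lam: cv_error(X, y, lam, 0.01))
coefficients = elastic_net(X, y, best_lam1, 0.01)
print(best_lam1, [round(c, 2) for c in coefficients])
```

The same selection idea extends to other tuning quantities: any additional weight, such as one controlling how much an external data source contributes, could be kept only if it lowers the cross-validated error, which is the spirit of the procedure the abstract describes.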
Pages: 8