Predicting protein-protein interactions in unbalanced data using the primary structure of proteins

Cited by: 58
Authors
Yu, Chi-Yuan [2]
Chou, Lih-Ching [2]
Chang, Darby Tien-Hao [1]
Affiliations
[1] Natl Cheng Kung Univ, Dept Elect Engn, Tainan 70101, Taiwan
[2] Natl Taiwan Univ, Grad Inst Biomed Elect & Bioinformat, Taipei 106, Taiwan
Source
BMC BIOINFORMATICS | 2010 / Vol. 11
Keywords
INTERACTION NETWORKS; RESOURCE; COMPLEXES;
DOI
10.1186/1471-2105-11-167
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract
Background: Elucidating protein-protein interactions (PPIs) is essential to constructing protein interaction networks and to understanding the general principles of biological systems. Previous studies have shown that interacting protein pairs can be predicted from their primary structure. Most of these approaches achieve satisfactory performance on datasets comprising equal numbers of interacting and non-interacting protein pairs. In nature, however, this ratio is highly unbalanced, and these techniques have not been comprehensively evaluated with respect to the effect of the large number of non-interacting pairs in realistic datasets. Moreover, since highly unbalanced distributions usually lead to large datasets, more efficient predictors are needed for such challenging tasks.
Results: This study presents a method for PPI prediction based only on sequence information and makes three contributions. First, we propose a probability-based mechanism for transforming protein sequences into feature vectors. Second, the proposed predictor is built on an efficient classification algorithm, since efficiency is essential for handling highly unbalanced datasets. Third, the proposed PPI predictor is assessed on several unbalanced datasets with different positive-to-negative ratios (from 1:1 to 1:15). This analysis provides solid evidence that the degree of dataset imbalance matters for PPI predictors.
Conclusions: Dealing with data imbalance is a key issue in PPI prediction, since there are far fewer interacting protein pairs than non-interacting ones. This article provides a comprehensive study of this issue and develops a practical tool that achieves both good prediction performance and efficiency using only protein sequence information.
Pages: 10
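The abstract's point that the positive-to-negative ratio matters can be illustrated with a short sketch. This is not the authors' method or code: the function names `subsample_negatives` and `expected_precision`, and the 0.9 sensitivity and specificity used in the usage note, are assumptions chosen only for illustration. The first function builds a dataset at a chosen 1:k ratio by sampling negatives; the second shows how precision falls as k grows even when the classifier's per-class error rates stay fixed.

```python
import random


def subsample_negatives(positives, negatives, neg_per_pos, seed=0):
    """Build a dataset with a 1:neg_per_pos positive-to-negative ratio.

    Hypothetical helper: keeps every positive pair and draws the required
    number of negative pairs uniformly at random without replacement.
    """
    n_neg = len(positives) * neg_per_pos
    if n_neg > len(negatives):
        raise ValueError("not enough negative pairs for the requested ratio")
    rng = random.Random(seed)
    return list(positives), rng.sample(list(negatives), n_neg)


def expected_precision(sensitivity, specificity, neg_per_pos):
    """Precision implied by fixed sensitivity/specificity at a 1:k ratio.

    Per positive example: true positives = sensitivity,
    false positives = (1 - specificity) * k.
    """
    true_pos = sensitivity
    false_pos = (1.0 - specificity) * neg_per_pos
    return true_pos / (true_pos + false_pos)
```

With an assumed sensitivity and specificity of 0.9 each, `expected_precision(0.9, 0.9, 1)` is about 0.90, while `expected_precision(0.9, 0.9, 15)` is about 0.375, since 0.9 / (0.9 + 15 × 0.1) = 0.375. A predictor that looks strong on a balanced dataset can therefore return mostly false positives at a 1:15 ratio, which is why evaluation across several ratios is informative.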