A machine learning approach for genome-wide prediction of morbid and druggable human genes based on systems-level data

Cited by: 47
Authors
Costa, Pedro R. [1 ]
Acencio, Marcio L. [1 ]
Lemke, Ney [1 ]
Affiliations
[1] Univ Estadual Paulista, UNESP, Inst Biociencias Botucatu, Dept Fis & Biofis, BR-18618970 Sao Paulo, Brazil
Source
BMC GENOMICS | 2010 / Vol. 11
Funding
São Paulo Research Foundation (FAPESP), Brazil;
Keywords
PLASMINOGEN-ACTIVATOR; DISEASE; ASSOCIATION; IDENTIFICATION; TARGET; PRIORITIZATION; TOOL;
DOI
10.1186/1471-2164-11-S5-S9
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: The genome-wide identification of both morbid genes, i.e., genes whose mutations cause hereditary human diseases, and druggable genes, i.e., genes coding for proteins whose modulation by small molecules elicits phenotypic effects, requires experimental approaches that are time-consuming and laborious. A computational approach that could accurately predict such genes on a genome-wide scale would therefore be invaluable for accelerating the discovery of causal relationships between genes and diseases, as well as the determination of the druggability of gene products.
Results: In this paper we propose a machine learning-based computational approach to predict morbid and druggable genes on a genome-wide scale. For this purpose, we constructed a decision tree-based meta-classifier and trained it on datasets containing, for each morbid and druggable gene, network topological features, tissue expression profile and subcellular localization data as learning attributes. This meta-classifier correctly recovered 65% of known morbid genes with a precision of 66% and correctly recovered 78% of known druggable genes with a precision of 75%. It was then used to assign morbidity and druggability scores to genes not known to be morbid or druggable, and we showed a good match between these scores and literature data. Finally, we generated decision trees by training the J48 algorithm on the morbidity and druggability datasets to discover cellular rules for morbidity and druggability. Among these rules, we found that the number of regulating transcription factors and plasma membrane localization are the most important factors for morbidity and druggability, respectively.
Conclusions: We were able to demonstrate that network topological features, along with tissue expression profile and subcellular localization, can reliably predict human morbid and druggable genes on a genome-wide scale. Moreover, by constructing decision trees based on these data, we could discover cellular rules governing morbidity and druggability.
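The abstract reports performance as the fraction of known genes recovered (recall) together with precision. As a minimal sketch of how these two figures relate, the Python snippet below computes both from a set of known positives and a set of predictions. The function name `recall_and_precision` and the gene identifiers are hypothetical placeholders chosen for illustration; they are not taken from the paper, and the numbers do not reproduce its results.

```python
def recall_and_precision(known, predicted):
    """Return (recall, precision) of a predicted gene set against known positives."""
    known = set(known)
    predicted = set(predicted)
    # Genes that are both known positives and predicted positives
    true_positives = len(known & predicted)
    # Recall: share of known positives that the classifier recovered
    recall = true_positives / len(known) if known else 0.0
    # Precision: share of predictions that are known positives
    precision = true_positives / len(predicted) if predicted else 0.0
    return recall, precision


# Hypothetical toy data (placeholder identifiers, not real gene symbols)
known_morbid = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
predicted_morbid = {"GENE_A", "GENE_B", "GENE_C", "GENE_X", "GENE_Y"}

recall, precision = recall_and_precision(known_morbid, predicted_morbid)
print(f"recall={recall:.2f} precision={precision:.2f}")  # recall=0.75 precision=0.60
```

In this toy case three of the four known genes are recovered (recall 0.75), and three of the five predictions are correct (precision 0.60). The abstract's "recovered 65% ... with a precision of 66%" can be read the same way.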
Pages: 15