Fast integration of heterogeneous data sources for predicting gene function with limited annotation

Cited by: 99
Authors
Mostafavi, Sara [1, 2]
Morris, Quaid [1, 2]
Affiliations
[1] Univ Toronto, Dept Comp Sci, Toronto, ON M5S 1A1, Canada
[2] Univ Toronto, Ctr Cellular & Biomol Res, Toronto, ON M5S 1A1, Canada
Keywords
SELECTION; NETWORK;
DOI
10.1093/bioinformatics/btq262
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: Many algorithms that integrate multiple functional association networks for predicting gene function construct a composite network as a weighted sum of the individual networks and then use the composite network to predict gene function. The weight assigned to an individual network represents the usefulness of that network in predicting a given gene function. However, because many categories of gene function have a small number of annotations, the process of assigning these network weights is prone to overfitting. Results: Here, we address this problem by proposing a novel approach to combining multiple functional association networks. In particular, we present a method where network weights are simultaneously optimized on sets of related function categories. The method is simpler and faster than existing approaches. Further, we show that it produces composite networks with improved function prediction accuracy using five example species (yeast, mouse, fly, Escherichia coli and human).
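The weighted-sum idea in the abstract can be illustrated with a small sketch. It assumes each network is a square affinity matrix over the same genes and that a set of related function categories is given as sets of gene indices. The weighting rule in `pooled_weights` (total edge weight between co-annotated gene pairs, pooled over all categories in the set and normalised) is a toy heuristic chosen for illustration; it is not the optimisation procedure of the paper, and both function names are hypothetical.

```python
def composite_network(networks, weights):
    """Return the weighted sum of equally sized square affinity matrices."""
    n = len(networks[0])
    return [
        [sum(w * net[i][j] for w, net in zip(weights, networks)) for j in range(n)]
        for i in range(n)
    ]


def pooled_weights(networks, categories):
    """Toy weighting (an assumption, not the paper's method).

    Each network is scored by the total edge weight it places between gene
    pairs that share a category, summed over every category in the set, so a
    single weight vector is fitted for the whole set of related categories.
    Scores are normalised to sum to 1.
    """
    scores = []
    for net in networks:
        score = 0.0
        for members in categories:  # pool evidence over all related categories
            for i in members:
                for j in members:
                    if i != j:
                        score += net[i][j]
        scores.append(score)
    total = sum(scores)
    if total == 0:
        # No network links any co-annotated pair: fall back to equal weights.
        return [1.0 / len(networks)] * len(networks)
    return [s / total for s in scores]


# Two hypothetical networks over three genes (indices 0, 1, 2).
net_a = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
net_b = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]
# Two related categories, given as sets of annotated gene indices.
categories = [{0, 1}, {0, 1, 2}]

weights = pooled_weights([net_a, net_b], categories)
combined = composite_network([net_a, net_b], weights)
```

Pooling the score over the whole category set, rather than fitting a separate weight vector per category, mirrors the motivation stated in the abstract: a category with very few annotations contributes too little evidence to fit weights reliably on its own.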
Pages: 1759-1765
Number of pages: 7