Rapid Annotation of Anonymous Sequences from Genome Projects Using Semantic Similarities and a Weighting Scheme in Gene Ontology

Cited by: 26
Authors
Fontana, Paolo [1]
Cestaro, Alessandro [1]
Velasco, Riccardo [1]
Formentin, Elide [2]
Toppo, Stefano [3]
Affiliations
[1] FEM IASMA Res Ctr, San Michele All Adige, TN, Italy
[2] Univ Padua, Dept Biol, I-35100 Padua, Italy
[3] Univ Padua, Dept Biol Chem, I-35100 Padua, Italy
Source
PLOS ONE | 2009 / Vol. 4 / Issue 02
Keywords
PROTEIN FUNCTION PREDICTION; GO; TOOL;
DOI
10.1371/journal.pone.0004619
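Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification code
07; 0710; 09
Abstract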
Background: Large-scale sequencing projects have become routine laboratory practice, which has driven the development of a new generation of tools built on function prediction methods and brought those methods back to the fore. Gene Ontology (GO), with its structured vocabulary and paradigm, has given computational biologists an appropriate means for this task.
Methodology: We present a novel method called ARGOT (Annotation Retrieval of Gene Ontology Terms) that can quickly process thousands of sequences for functional inference. The tool exploits, for the first time, an integrated approach that combines clustering of GO terms, based on their semantic similarities, with a weighting scheme that assesses retrieved hits sharing a certain number of biological features with the sequence to be annotated. These hits may be obtained by different methods; in this work, ARGOT processing is based on BLAST results.
Conclusions: The extensive benchmark involved 10,000 protein sequences, the complete S. cerevisiae genome, and a small subset of proteins for comparison with other available tools. The algorithm outperformed existing methods and proved suitable for function prediction of single proteins owing to its high sensitivity, specificity and coverage.
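
The abstract names two ingredients, a weighting of retrieved hits and a semantic similarity between GO terms, but gives no formulas. The toy sketch below illustrates only the general idea and should not be read as ARGOT's actual algorithm. Everything in it is an assumption made for illustration: the miniature ontology and its term names, the weight -log10(E-value), the propagation of weights to ancestor terms, and the Jaccard overlap of ancestor sets used as a stand-in for semantic similarity.

```python
import math
from collections import defaultdict

# Hypothetical toy ontology (not real GO identifiers): term -> parent terms.
PARENTS = {
    "GO:root": set(),
    "GO:binding": {"GO:root"},
    "GO:catalysis": {"GO:root"},
    "GO:dna_binding": {"GO:binding"},
    "GO:kinase": {"GO:catalysis"},
}

def ancestors(term):
    """Return the term together with all of its ancestors in the toy DAG."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen

def hit_weight(e_value, cap=300.0):
    """Assumed weighting: -log10(E-value), clamped to the range [0, cap]."""
    if e_value <= 0.0:
        return cap
    return max(0.0, min(cap, -math.log10(e_value)))

def score_terms(hits):
    """Sum each hit's weight over its GO terms and all of their ancestors.

    A hit contributes at most once to any given term, even if several of
    its annotations share that ancestor.
    """
    scores = defaultdict(float)
    for e_value, terms in hits:
        covered = set()
        for t in terms:
            covered |= ancestors(t)
        for t in covered:
            scores[t] += hit_weight(e_value)
    return dict(scores)

def jaccard(a, b):
    """Stand-in similarity: overlap of the two terms' ancestor sets."""
    sa, sb = ancestors(a), ancestors(b)
    return len(sa & sb) / len(sa | sb)

# Hypothetical BLAST-like hits: (E-value, GO terms annotated on the hit).
hits = [
    (1e-50, {"GO:dna_binding"}),
    (1e-20, {"GO:binding"}),
    (1e-5, {"GO:kinase"}),
]
scores = score_terms(hits)
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{term}\t{score:.1f}")
```

In this toy setting, general terms accumulate weight from every hit beneath them, while the similarity function indicates which specific terms lie close enough in the graph to be grouped; how ARGOT balances the two is described in the paper itself, not here.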
Pages: 15