PA-GOSUB: a searchable database of model organism protein sequences with their predicted gene ontology molecular function and subcellular localization

Cited by: 21
Authors
Lu, P [1 ]
Szafron, D [1 ]
Greiner, R [1 ]
Wishart, DS [1 ]
Fyshe, A [1 ]
Pearcy, B [1 ]
Poulin, B [1 ]
Eisner, R [1 ]
Ngo, D [1 ]
Lamb, N [1 ]
Affiliations
[1] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2E8, Canada
Keywords
DOI
10.1093/nar/gki120
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Subject classification code
071010 ; 081704 ;
Abstract
PA-GOSUB (Proteome Analyst: Gene Ontology Molecular Function and Subcellular Localization) is a publicly available, web-based, searchable and downloadable database that contains the sequences, predicted GO molecular functions and predicted subcellular localizations of more than 107 000 proteins from 10 model organisms (and growing), covering the major kingdoms and phyla for which annotated proteomes exist (http://www.cs.ualberta.ca/~bioinfo/PA/GOSUB). The PA-GOSUB database expands the coverage of subcellular localization and GO function annotations by a significant factor (already more than fivefold for subcellular localization, compared with Swiss-Prot v42.7), and more model organisms are being added to PA-GOSUB as their sequenced proteomes become available. PA-GOSUB can be used in three main ways. First, a researcher can browse the pre-computed PA-GOSUB annotations on a per-organism and per-protein basis using annotation-based and text-based filters. Second, a user can perform BLAST searches against the PA-GOSUB database and use the annotations of the homologs as simple predictors for the new sequences. Third, the whole of PA-GOSUB can be downloaded in either FASTA or comma-separated values (CSV) format.
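The second usage mode, treating the annotations of BLAST homologs as simple predictors, can be illustrated with a minimal Python sketch. It is not the PA-GOSUB implementation. It assumes the BLAST hits are in the standard 12-column tabular format (query ID, subject ID, ..., E-value, bit score) and that annotations have already been loaded into a dictionary keyed by protein ID. The function name `best_hit_annotations`, the E-value cutoff, the protein IDs and the sample rows are all hypothetical, and the column layout of the actual PA-GOSUB CSV download is not assumed here.

```python
import csv
import io


def best_hit_annotations(blast_tabular, annotations, max_evalue=1e-10):
    """Map each query ID to the annotation of its highest-scoring hit.

    blast_tabular: text in 12-column BLAST tabular format.
    annotations:   dict of subject protein ID -> annotation (hypothetical).
    max_evalue:    hits with a larger E-value are ignored (assumed cutoff).
    """
    best = {}  # query ID -> (bit score, subject ID)
    for row in csv.reader(io.StringIO(blast_tabular), delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue or subject not in annotations:
            continue
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    return {q: annotations[s] for q, (_, s) in best.items()}


# Made-up hits and annotations, for illustration only.
hits = (
    "q1\tP001\t98.0\t200\t4\t0\t1\t200\t1\t200\t1e-100\t380\n"
    "q1\tP002\t60.0\t180\t70\t2\t5\t184\t3\t182\t1e-30\t120\n"
    "q2\tP002\t35.0\t90\t55\t3\t1\t90\t1\t90\t0.5\t30\n"
)
annotations = {"P001": "cytoplasm", "P002": "nucleus"}
predicted = best_hit_annotations(hits, annotations)
```

With the default cutoff, query `q2` receives no prediction because its only hit has a weak E-value. Loosening `max_evalue` trades coverage for reliability.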
Pages: D147-D153
Page count: 7
Related papers
11 in total
[1]   Automated genome sequence analysis and annotation [J].
Andrade, MA ;
Brown, NP ;
Leroy, C ;
Hoersch, S ;
de Daruvar, A ;
Reich, C ;
Franchini, A ;
Tamames, J ;
Valencia, A ;
Ouzounis, C ;
Sander, C .
BIOINFORMATICS, 1999, 15 (05) :391-412
[2]   The InterPro database, an integrated documentation resource for protein families, domains and functional sites [J].
Apweiler, R ;
Attwood, TK ;
Bairoch, A ;
Bateman, A ;
Birney, E ;
Biswas, M ;
Bucher, P ;
Cerutti, T ;
Corpet, F ;
Croning, MDR ;
Durbin, R ;
Falquet, L ;
Fleischmann, W ;
Gouzy, J ;
Hermjakob, H ;
Hulo, N ;
Jonassen, I ;
Kahn, D ;
Kanapin, A ;
Karavidopoulou, Y ;
Lopez, R ;
Marx, B ;
Mulder, NJ ;
Oinn, TM ;
Pagni, M ;
Servant, F ;
Sigrist, CJA ;
Zdobnov, EM .
NUCLEIC ACIDS RESEARCH, 2001, 29 (01) :37-40
[3]   Functional and structural genomics using PEDANT [J].
Frishman, D ;
Albermann, K ;
Hani, J ;
Heumann, K ;
Metanomski, A ;
Zollner, A ;
Mewes, HW .
BIOINFORMATICS, 2001, 17 (01) :44-57
[4]   MAGPIE: Automated genome interpretation [J].
Gaasterland, T ;
Sensen, CW .
TRENDS IN GENETICS, 1996, 12 (02) :76-78
[5]   Genotator: A workbench for sequence annotation [J].
Harris, NL .
GENOME RESEARCH, 1997, 7 (07) :754-762
[6]   The Ensembl genome database project [J].
Hubbard, T ;
Barker, D ;
Birney, E ;
Cameron, G ;
Chen, Y ;
Clark, L ;
Cox, T ;
Cuff, J ;
Curwen, V ;
Down, T ;
Durbin, R ;
Eyras, E ;
Gilbert, J ;
Hammond, M ;
Huminiecki, L ;
Kasprzyk, A ;
Lehvaslaiho, H ;
Lijnzaad, P ;
Melsopp, C ;
Mongin, E ;
Pettett, R ;
Pocock, M ;
Potter, S ;
Rust, A ;
Schmidt, E ;
Searle, S ;
Slater, G ;
Smith, J ;
Spooner, W ;
Stabenau, A ;
Stalker, J ;
Stupka, E ;
Ureta-Vidal, A ;
Vastrik, I ;
Clamp, M .
NUCLEIC ACIDS RESEARCH, 2002, 30 (01) :38-41
[7]  
Kitson David H, 2002, Brief Bioinform, V3, P32, DOI 10.1093/bib/3.1.32
[8]   Predicting subcellular localization of proteins using machine-learned classifiers [J].
Lu, Z ;
Szafron, D ;
Greiner, R ;
Lu, P ;
Wishart, DS ;
Poulin, B ;
Anvik, J ;
Macdonell, C ;
Eisner, R .
BIOINFORMATICS, 2004, 20 (04) :547-556
[9]  
Overton G C, 1998, Pac Symp Biocomput, P291
[10]   Proteome Analyst: custom predictions with explanations in a web-based tool for high-throughput proteome annotations [J].
Szafron, D ;
Lu, P ;
Greiner, R ;
Wishart, DS ;
Poulin, B ;
Eisner, R ;
Lu, Z ;
Anvik, J ;
Macdonell, C ;
Fyshe, A ;
Meeuwis, D .
NUCLEIC ACIDS RESEARCH, 2004, 32 :W365-W371