Proteome Analyst: custom predictions with explanations in a web-based tool for high-throughput proteome annotations

Times cited: 83
Authors
Szafron, D [1]
Lu, P [1]
Greiner, R [1]
Wishart, DS [1]
Poulin, B [1]
Eisner, R [1]
Lu, Z [1]
Anvik, J [1]
Macdonell, C [1]
Fyshe, A [1]
Meeuwis, D [1]
Affiliation
[1] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2E8, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
DOI
10.1093/nar/gkh485
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Proteome Analyst (PA) (http://www.cs.ualberta.ca/~bioinfo/PA/) is a publicly available, high-throughput, web-based system for predicting various properties of each protein in an entire proteome. Using machine-learned classifiers, PA can predict, for example, the GeneQuiz general function and Gene Ontology (GO) molecular function of a protein. In addition, PA is currently the most accurate and most comprehensive system for predicting subcellular localization, the location within a cell where a protein performs its main function. Two other capabilities of PA are notable. First, PA can create a custom classifier to predict a new property, without requiring any programming, based on labeled training data (i.e. a set of examples, each with the correct classification label) provided by a user. PA has been used to create custom classifiers for potassium-ion channel proteins and other general function ontologies. Second, PA provides a sophisticated explanation feature that shows why one prediction is chosen over another. The PA system produces a Naive Bayes classifier, which is amenable to a graphical and interactive approach to explanations for its predictions; transparent predictions increase the user's confidence in, and understanding of, PA.
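To illustrate why a Naive Bayes classifier lends itself to explanation, the sketch below trains a small classifier from labeled examples and breaks a prediction down into per-feature log-odds contributions. It is a minimal sketch, not PA's implementation: the token features, the toy training set, the class labels, the function names (`train`, `score`, `predict`, `explain`), and the choice to score only the tokens present in a query are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train(examples, alpha=1.0):
    """Fit a Naive Bayes model from (set_of_tokens, label) pairs.

    alpha is the Laplace smoothing constant (an assumed default).
    """
    label_counts = Counter(label for _, label in examples)
    token_counts = defaultdict(Counter)  # label -> token -> examples containing it
    vocab = set()
    for tokens, label in examples:
        for token in set(tokens):
            token_counts[label][token] += 1
            vocab.add(token)
    return {"labels": label_counts, "tokens": token_counts,
            "vocab": vocab, "alpha": alpha}


def log_likelihood(model, token, label):
    """Smoothed log P(token present | label)."""
    a = model["alpha"]
    n = model["labels"][label]
    return math.log((model["tokens"][label][token] + a) / (n + 2 * a))


def score(model, tokens, label):
    """Log prior plus the log-likelihoods of the known tokens in the query.

    Simplification: absent tokens are ignored rather than scored.
    """
    total = sum(model["labels"].values())
    s = math.log(model["labels"][label] / total)
    for token in tokens:
        if token in model["vocab"]:
            s += log_likelihood(model, token, label)
    return s


def predict(model, tokens):
    """Return the label with the highest score."""
    return max(model["labels"], key=lambda label: score(model, tokens, label))


def explain(model, tokens, label_a, label_b):
    """Per-token log-odds favouring label_a (positive) or label_b (negative)."""
    contribs = {
        token: log_likelihood(model, token, label_a)
        - log_likelihood(model, token, label_b)
        for token in tokens if token in model["vocab"]
    }
    return sorted(contribs.items(), key=lambda kv: -abs(kv[1]))


# Hypothetical labeled training data: each example is a token set plus a label.
training = [
    ({"kinase", "atp-binding"}, "cytoplasm"),
    ({"kinase", "transferase"}, "cytoplasm"),
    ({"transmembrane", "ion-channel"}, "membrane"),
    ({"transmembrane", "receptor"}, "membrane"),
]
model = train(training)
query = {"transmembrane", "ion-channel", "kinase"}

print(predict(model, query))
for token, contribution in explain(model, query, "membrane", "cytoplasm"):
    print(f"{token}: {contribution:+.3f}")
```

Because the score is a sum of independent per-token terms, the contributions returned by `explain` add up to the difference between the two class scores (when the class priors are equal, as in this toy data). That additive structure is what makes it possible to show a user which pieces of evidence favoured one prediction over another.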
Pages: W365-W371
Number of pages: 7