Proteome Analyst: custom predictions with explanations in a web-based tool for high-throughput proteome annotations

Times cited: 83

Authors:

Szafron, D [1]

Lu, P [1]

Greiner, R [1]

Wishart, DS [1]

Poulin, B [1]

Eisner, R [1]

Lu, Z [1]

Anvik, J [1]

Macdonell, C [1]

Fyshe, A [1]

Meeuwis, D [1]

Affiliation:

[1] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2E8, Canada

Source:

NUCLEIC ACIDS RESEARCH | 2004 / Vol. 32

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC)

DOI:

10.1093/nar/gkh485

CLC classification number:

Q5 [Biochemistry]; Q7 [Molecular Biology]

Discipline classification code:

071010; 081704

Abstract:

Proteome Analyst (PA) (http://www.cs.ualberta.ca/~bioinfo/PA/) is a publicly available, high-throughput, web-based system for predicting various properties of each protein in an entire proteome. Using machine-learned classifiers, PA can predict, for example, the GeneQuiz general function and Gene Ontology (GO) molecular function of a protein. In addition, PA is currently the most accurate and most comprehensive system for predicting subcellular localization, the location within a cell where a protein performs its main function. Two other capabilities of PA are notable. First, PA can create a custom classifier to predict a new property, without requiring any programming, based on labeled training data (i.e. a set of examples, each with the correct classification label) provided by a user. PA has been used to create custom classifiers for potassium-ion channel proteins and other general function ontologies. Second, PA provides a sophisticated explanation feature that shows why one prediction is chosen over another. The PA system produces a Naive Bayes classifier, which is amenable to a graphical and interactive approach to explanations for its predictions; transparent predictions increase the user's confidence in, and understanding of, PA.
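The abstract credits the Naive Bayes model with making explanations tractable: because its log score is a sum of per-feature terms, the margin between two candidate labels splits into one contribution per feature. The sketch that follows illustrates that idea under our own assumptions, not PA's published implementation. Proteins are represented as sets of invented text tokens, the event model is Bernoulli with Laplace smoothing, and the class name `NaiveBayesExplainer`, its methods, and the toy training data are all hypothetical. Training from a list of `(features, label)` pairs mirrors, in spirit, the "custom classifier from labeled training data" feature.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesExplainer:
    """Bernoulli Naive Bayes over sets of binary features, with per-feature explanations.

    Hypothetical sketch: the feature representation and smoothing are assumptions.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, examples):
        """Train from labeled data: a list of (feature_set, label) pairs."""
        self.class_counts = Counter(label for _, label in examples)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for features, label in examples:
            self.vocab.update(features)
            for f in features:
                self.feature_counts[label][f] += 1
        self.n = len(examples)
        return self

    def _log_p(self, f, label, present):
        """log P(feature f is present/absent | label), Laplace-smoothed."""
        count = self.feature_counts[label][f]
        p = (count + self.alpha) / (self.class_counts[label] + 2 * self.alpha)
        return math.log(p if present else 1.0 - p)

    def scores(self, features):
        """Unnormalized log posterior for every label."""
        out = {}
        for label, c in self.class_counts.items():
            s = math.log(c / self.n)  # log prior
            for f in self.vocab:
                s += self._log_p(f, label, f in features)
            out[label] = s
        return out

    def predict(self, features):
        s = self.scores(features)
        return max(s, key=s.get)

    def explain(self, features, a, b):
        """Split log-odds(a vs b) into a prior term plus one term per feature.

        Positive terms favour label a; features outside the training vocabulary are ignored.
        """
        prior = math.log(self.class_counts[a] / self.class_counts[b])
        contrib = {
            f: self._log_p(f, a, f in features) - self._log_p(f, b, f in features)
            for f in self.vocab
        }
        # Largest-magnitude contributions first, as an explanation display might list them.
        return prior, sorted(contrib.items(), key=lambda kv: -abs(kv[1]))


if __name__ == "__main__":
    # Toy, invented training data: token sets labeled with a subcellular location.
    train = [
        ({"membrane", "transport", "channel"}, "plasma membrane"),
        ({"membrane", "receptor"}, "plasma membrane"),
        ({"dna", "binding", "transcription"}, "nucleus"),
        ({"dna", "repair"}, "nucleus"),
    ]
    model = NaiveBayesExplainer().fit(train)
    query = {"membrane", "channel"}
    print(model.predict(query))  # prints: plasma membrane
    prior, terms = model.explain(query, "plasma membrane", "nucleus")
    for feature, weight in terms[:3]:
        print(f"{feature:>14s} {weight:+.3f}")
```

Because every term is additive in log space, the prior plus the per-feature terms equals the score gap between the two labels exactly, so ranking the terms by magnitude answers "why A rather than B". A graphical front end could draw these terms as bars; the Bernoulli event model was chosen here only for brevity.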

Pages: W365-W371

Page count: 7

References (14 in total):

[1] Automated genome sequence analysis and annotation [J]. Andrade, MA; Brown, NP; Leroy, C; Hoersch, S; de Daruvar, A; Reich, C; Franchini, A; Tamames, J; Valencia, A; Ouzounis, C; Sander, C. BIOINFORMATICS, 1999, 15(05): 391-412.

[2] The InterPro database, an integrated documentation resource for protein families, domains and functional sites [J]. Apweiler, R; Attwood, TK; Bairoch, A; Bateman, A; Birney, E; Biswas, M; Bucher, P; Cerutti, T; Corpet, F; Croning, MDR; Durbin, R; Falquet, L; Fleischmann, W; Gouzy, J; Hermjakob, H; Hulo, N; Jonassen, I; Kahn, D; Kanapin, A; Karavidopoulou, Y; Lopez, R; Marx, B; Mulder, NJ; Oinn, TM; Pagni, M; Servant, F; Sigrist, CJA; Zdobnov, EM. NUCLEIC ACIDS RESEARCH, 2001, 29(01): 37-40.

[3] Functional and structural genomics using PEDANT [J]. Frishman, D; Albermann, K; Hani, J; Heumann, K; Metanomski, A; Zollner, A; Mewes, HW. BIOINFORMATICS, 2001, 17(01): 44-57.

[4] MAGPIE: Automated genome interpretation [J]. Gaasterland, T; Sensen, CW. TRENDS IN GENETICS, 1996, 12(02): 76-78.

[5] Gallin, WJ. POTASSIUM CHANNELS IN CARDIOVASCULAR BIOLOGY, 2001, p. 3.

[6] Genotator: A workbench for sequence annotation [J]. Harris, NL. GENOME RESEARCH, 1997, 7(07): 754-762.

[7] The Ensembl genome database project [J]. Hubbard, T; Barker, D; Birney, E; Cameron, G; Chen, Y; Clark, L; Cox, T; Cuff, J; Curwen, V; Down, T; Durbin, R; Eyras, E; Gilbert, J; Hammond, M; Huminiecki, L; Kasprzyk, A; Lehvaslaiho, H; Lijnzaad, P; Melsopp, C; Mongin, E; Pettett, R; Pocock, M; Potter, S; Rust, A; Schmidt, E; Searle, S; Slater, G; Smith, J; Spooner, W; Stabenau, A; Stalker, J; Stupka, E; Ureta-Vidal, A; Vastrik, I; Clamp, M. NUCLEIC ACIDS RESEARCH, 2002, 30(01): 38-41.

[8] Kitson, DH. Brief Bioinform, 2002, 3: 32. DOI: 10.1093/bib/3.1.32.

[9] Wrappers for feature subset selection [J]. Kohavi, R; John, GH. ARTIFICIAL INTELLIGENCE, 1997, 97(1-2): 273-324.

[10] Predicting subcellular localization of proteins using machine-learned classifiers [J]. Lu, Z; Szafron, D; Greiner, R; Lu, P; Wishart, DS; Poulin, B; Anvik, J; Macdonell, C; Eisner, R. BIOINFORMATICS, 2004, 20(04): 547-556.
