The Universal Protein Resource (UniProt)

Cited by: 1171
Authors
Bairoch, A
Apweiler, R
Wu, CH
Barker, WC
Boeckmann, B
Ferro, S
Gasteiger, E
Huang, HZ
Lopez, R
Magrane, M
Martin, MJ
Natale, DA
O'Donovan, C
Redaschi, N
Yeh, LSL
Affiliations
[1] European Bioinformat Inst, EMBL Outstn, Cambridge CB10 1SD, England
[2] Ctr Med Univ Geneva, Swiss Inst Bioinformat, CH-1211 Geneva 4, Switzerland
[3] Georgetown Univ, Med Ctr, Natl Biomed Res Fdn, Washington, DC 20057 USA
DOI
10.1093/nar/gki070
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
The Universal Protein Resource (UniProt) provides the scientific community with a single, centralized, authoritative resource for protein sequences and functional information. Formed by uniting the Swiss-Prot, TrEMBL and PIR protein database activities, the UniProt consortium produces three layers of protein sequence databases: the UniProt Archive (UniParc), the UniProt Knowledgebase (UniProt) and the UniProt Reference (UniRef) databases. The UniProt Knowledgebase is a comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase with extensive cross-references. This centrepiece consists of two sections: UniProt/Swiss-Prot, with fully, manually curated entries; and UniProt/TrEMBL, enriched with automated classification and annotation. During 2004, tens of thousands of Knowledgebase records were manually annotated or updated; we introduced a new comment line topic, TOXIC DOSE, to store information on the acute toxicity of a toxin; the UniProt keyword list was augmented with additional keywords; and we improved the documentation of the keywords and are continuously overhauling and standardizing the annotation of post-translational modifications. Furthermore, we introduced a new documentation file of the strains and their synonyms. Many new database cross-references were introduced and we started to make use of Digital Object Identifiers. In collaboration with the Macromolecular Structure Database group at EBI, we also achieved improved integration with structural databases through residue-level mapping of sequences from Protein Data Bank entries onto the corresponding UniProt entries. For convenient sequence searches we provide the UniRef non-redundant sequence databases. The comprehensive UniParc database stores the complete body of publicly available protein sequence data.
Pages: D154 - D159
Page count: 6
Related papers
30 in total
  • [1] Apweiler R, 2004, NUCLEIC ACIDS RES, V32, pD115, DOI [10.1093/nar/gkw1099, 10.1093/nar/gkh131]
  • [2] Protein sequence databases
    Apweiler, R
    Bairoch, A
    Wu, CH
    [J]. CURRENT OPINION IN CHEMICAL BIOLOGY, 2004, 8 (01) : 76 - 80
  • [3] PRINTS and its automatic supplement, prePRINTS
    Attwood, TK
    Bradley, P
    Flower, DR
    Gaulton, A
    Maudling, N
    Mitchell, AL
    Moulton, G
    Nordle, A
    Paine, K
    Taylor, P
    Uddin, A
    Zygouri, C
    [J]. NUCLEIC ACIDS RESEARCH, 2003, 31 (01) : 400 - 402
  • [4] Bateman A, 2004, NUCLEIC ACIDS RES, V32, pD138, DOI [10.1093/nar/gkp985, 10.1093/nar/gkr1065, 10.1093/nar/gkh121]
  • [5] The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003
    Boeckmann, B
    Bairoch, A
    Apweiler, R
    Blatter, MC
    Estreicher, A
    Gasteiger, E
    Martin, MJ
    Michoud, K
    O'Donovan, C
    Phan, I
    Pilbout, S
    Schneider, M
    [J]. NUCLEIC ACIDS RESEARCH, 2003, 31 (01) : 365 - 370
  • [6] The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology
    Camon, E
    Magrane, M
    Barrell, D
    Lee, V
    Dimmer, E
    Maslen, J
    Binns, D
    Harte, N
    Lopez, R
    Apweiler, R
    [J]. NUCLEIC ACIDS RESEARCH, 2004, 32 : D262 - D266
  • [7] A novel method for automatic functional annotation of proteins
    Fleischmann, W
    Möller, S
    Gateau, A
    Apweiler, R
    [J]. BIOINFORMATICS, 1999, 15 (03) : 228 - 233
  • [8] Automated annotation of microbial proteomes in SWISS-PROT
    Gattiker, A
    Michoud, K
    Rivoire, C
    Auchincloss, AH
    Coudert, E
    Lima, T
    Kersey, P
    Pagni, M
    Sigrist, CJA
    Lachaize, C
    Veuthey, AL
    Gasteiger, E
    Bairoch, A
    [J]. COMPUTATIONAL BIOLOGY AND CHEMISTRY, 2003, 27 (01) : 49 - 58
  • [9] The FlyBase database of the Drosophila genome projects and community literature
    Gelbart, W
    Bayraktaroglu, L
    Bettencourt, B
    Campbell, K
    Crosby, M
    Emmert, D
    Hradecky, P
    Huang, Y
    Letovsky, S
    Matthews, B
    Russo, S
    Schroeder, A
    Smutniak, F
    Zhou, P
    Zytkovicz, M
    Ashburner, M
    Drysdale, R
    de Grey, A
    Foulger, R
    Millburn, G
    Yamada, C
    Kaufman, T
    Matthews, K
    Gilbert, D
    Grumbling, G
    Strelets, V
    Shemen, C
    Rubin, G
    Berman, B
    Frise, E
    Gibson, M
    Harris, N
    Kaminker, J
    Lewis, S
    Marshall, B
    Misra, S
    Mungall, C
    Prochnik, S
    Richter, J
    Smith, C
    Shu, S
    Tupy, J
    Wiel, C
    [J]. NUCLEIC ACIDS RESEARCH, 2003, 31 (01) : 172 - 175
  • [10] Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure
    Gough, J
    Karplus, K
    Hughey, R
    Chothia, C
    [J]. JOURNAL OF MOLECULAR BIOLOGY, 2001, 313 (04) : 903 - 919