The Protein Information Resource: an integrated public resource of functional annotation of proteins

Cited by: 154
Authors
Wu, CH
Huang, HZ
Arminski, L
Castro-Alvear, J
Chen, YX
Hu, ZZ
Ledley, RS
Lewis, KC
Mewes, HW
Orcutt, BC
Suzek, BE
Tsugita, A
Vinayaka, CR
Yeh, LSL
Zhang, J
Barker, WC
Affiliations
[1] Georgetown Univ, Med Ctr, Natl Biomed Res Fdn, Washington, DC 20007 USA
[2] Max Planck Inst Biochem, Munich Informat Ctr, GSF Forschungszentrum Umwelt & Gesundheit, D-82152 Martinsried, Germany
[3] Tokyo Univ Sci, Japan Int Prot Informat Database, Noda, Chiba 278, Japan
DOI
10.1093/nar/30.1.35
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
The Protein Information Resource (PIR) serves as an integrated public resource of functional annotation of protein data to support genomic/proteomic research and scientific discovery. The PIR, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the PIR-International Protein Sequence Database (PSD), the major annotated protein sequence database in the public domain, containing about 250 000 proteins. To improve protein annotation and the coverage of experimentally validated data, a bibliography submission system is developed for scientists to submit, categorize and retrieve literature information. Comprehensive protein information is available from iProClass, which includes family classification at the superfamily, domain and motif levels, structural and functional features of proteins, as well as cross-references to over 40 biological databases. To provide timely and comprehensive protein data with source attribution, we have introduced a non-redundant reference protein database, PIR-NREF. The database consists of about 800 000 proteins collected from PIR-PSD, SWISS-PROT, TrEMBL, GenPept, RefSeq and PDB, with composite protein names and literature data. To promote database interoperability, we provide XML data distribution and open database schema, and adopt common ontologies. The PIR web site (http://pir.georgetown.edu/) features data mining and sequence analysis tools for information retrieval and functional identification of proteins based on both sequence and annotation information. The PIR databases and other files are also available by FTP (ftp://nbrfa.georgetown.edu/pir_databases).
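The idea of a non-redundant reference database with source attribution and composite names can be illustrated with a minimal sketch. It assumes that two records are redundant when their sequences are identical after case normalization; the abstract does not state the criterion PIR-NREF actually uses, so this is an illustration only. The function name `build_nonredundant`, the record layout, and the sample accessions and sequences are all hypothetical.

```python
from collections import defaultdict


def build_nonredundant(records):
    """Collapse records that share a sequence into one entry.

    records: iterable of (source_db, accession, name, sequence) tuples.
    Assumption: "redundant" means identical sequence (case-insensitive).
    """
    grouped = defaultdict(list)
    for source_db, accession, name, sequence in records:
        grouped[sequence.upper()].append((source_db, accession, name))

    entries = []
    for sequence, members in grouped.items():
        entries.append({
            "sequence": sequence,
            # Source attribution: every (database, accession) pair is kept.
            "sources": sorted({(db, acc) for db, acc, _ in members}),
            # Composite names: all distinct names seen for this sequence.
            "names": sorted({name for _, _, name in members}),
        })
    return entries


# Made-up records, for illustration only.
sample = [
    ("PIR-PSD", "X00001", "kinase A", "mkvlaa"),
    ("SWISS-PROT", "Y00002", "Kinase-A", "MKVLAA"),
    ("PDB", "Z00003", "hydrolase B", "MKTLGG"),
]

for entry in build_nonredundant(sample):
    print(entry["sequence"], entry["sources"], entry["names"])
```

Each merged entry keeps the full list of contributing databases, so a user can still trace any name or sequence back to its source.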
Pages: 35-37
Page count: 3