The Protein Information Resource

Times cited: 299
Authors
Wu, CH
Yeh, LSL
Huang, HZ
Arminski, L
Castro-Alvear, J
Chen, YX
Hu, ZZ
Kourtesis, P
Ledley, RS
Suzek, BE
Vinayaka, CR
Zhang, J
Barker, WC
Affiliations
[1] Georgetown Univ, Med Ctr, Dept Biochem & Mol Biol, Washington, DC 20057 USA
[2] Georgetown Univ, Med Ctr, Natl Biomed Res Fdn, Washington, DC 20057 USA
Funding
National Institutes of Health (USA)
DOI
10.1093/nar/gkg040
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
The Protein Information Resource (PIR) is an integrated public resource of protein informatics that supports genomic and proteomic research and scientific discovery. PIR maintains the Protein Sequence Database (PSD), an annotated protein database containing over 283,000 sequences covering the entire taxonomic range. Family classification is used for sensitive identification, consistent annotation, and detection of annotation errors. Superfamily curation defines signature domain architectures and categorizes memberships to improve automated classification. To increase the amount of experimental annotation, PIR has developed a bibliography system for literature searching, mapping, and user submission, and has conducted retrospective attribution of citations for experimental features. PIR also maintains NREF, a non-redundant reference database, and iProClass, an integrated database of protein family, function, and structure information. PIR-NREF provides a timely and comprehensive collection of protein sequences, currently consisting of more than 1,000,000 entries from PIR-PSD, SWISS-PROT, TrEMBL, RefSeq, GenPept, and PDB. The PIR web site (http://pir.georgetown.edu) connects data analysis tools to underlying databases for information retrieval and knowledge discovery, with functionalities for interactive queries, combinations of sequence and text searches, and sorting and visual exploration of search results. The FTP site provides free downloads of the biweekly PSD and NREF releases as well as auxiliary databases and files.
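The abstract describes PIR-NREF as a non-redundant collection assembled from several source databases. As a minimal sketch of what "non-redundant" can mean in practice, the Python below assumes that entries with an identical sequence from the same organism are merged into one reference entry that keeps every contributing source accession. The record layout, the `build_nonredundant` function, and the sample accessions are hypothetical illustrations, not PIR's actual pipeline or file format.

```python
from typing import Dict, List, NamedTuple, Tuple


class SourceEntry(NamedTuple):
    """One protein record from a source database (hypothetical layout)."""
    source_db: str   # source database name, e.g. "PIR-PSD" or "SWISS-PROT"
    accession: str   # accession within that source database
    organism: str    # source organism
    sequence: str    # amino-acid sequence in one-letter codes


# Key: (organism, normalized sequence); value: merged (source_db, accession) pairs.
NrefIndex = Dict[Tuple[str, str], List[Tuple[str, str]]]


def build_nonredundant(entries: List[SourceEntry]) -> NrefIndex:
    """Merge entries that share an identical sequence and organism.

    Assumption: redundancy means exact sequence identity within one organism.
    Dicts keep insertion order, so merged entries appear in first-seen order.
    """
    merged: NrefIndex = {}
    for entry in entries:
        # Normalize: drop whitespace and upper-case residues before comparing.
        seq = "".join(entry.sequence.split()).upper()
        key = (entry.organism, seq)
        merged.setdefault(key, []).append((entry.source_db, entry.accession))
    return merged


# Usage with made-up accessions: the two human records collapse into one entry,
# while the identical mouse sequence stays separate because the organism differs.
records = [
    SourceEntry("PIR-PSD", "PSD_0001", "Homo sapiens", "MKTAYIAK"),
    SourceEntry("SWISS-PROT", "SP_0001", "Homo sapiens", "mktayiak"),
    SourceEntry("RefSeq", "RS_0001", "Mus musculus", "MKTAYIAK"),
]
nref = build_nonredundant(records)
for (organism, seq), sources in nref.items():
    print(organism, seq, sources)
```

Keying on organism as well as sequence is a choice made for this sketch; a resource could instead merge identical sequences across species and record the species list, trading per-organism attribution for a smaller collection.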
Pages: 345-347
Number of pages: 3