The SUPERFAMILY database in 2004: additions and improvements

Cited by: 185
Authors
Madera, M
Vogel, C
Kummerfeld, SK
Chothia, C
Gough, J
Affiliations
[1] MRC, Mol Biol Lab, Cambridge CB2 2QH, England
[2] Stanford Univ, Sch Med, Dept Biol Struct, Stanford, CA 94305 USA
[3] RIKEN, Genom Sci Ctr, Tsurumi Ku, Yokohama, Kanagawa 2300045, Japan
DOI
10.1093/nar/gkh117
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
The SUPERFAMILY database provides structural assignments to protein sequences and a framework for analysis of the results. At the core of the database is a library of profile hidden Markov models that represent all proteins of known structure. The library is based on the SCOP classification of proteins: each model corresponds to a SCOP domain and aims to represent an entire superfamily. We have applied the library to predicted proteins from all completely sequenced genomes (currently 154), the Swiss-Prot and TrEMBL databases and other sequence collections. Close to 60% of all proteins have at least one match, and one half of all residues are covered by assignments. All models and full results are available for download and online browsing at http://supfam.org. Users can study the distribution of their superfamily of interest across all completely sequenced genomes, investigate which other superfamilies it combines with and retrieve the proteins in which it occurs. Alternatively, concentrating on a particular genome as a whole, it is possible, first, to determine its superfamily composition and, second, to compare it with that of other genomes to detect superfamilies that are over- or under-represented. In addition, the webserver provides the following standard services: sequence search; keyword search for genomes, superfamilies and sequence identifiers; and multiple alignment of genomic, PDB and custom sequences.
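The genome comparison described in the abstract (finding a genome's superfamily composition, then detecting superfamilies that are over- or under-represented relative to another genome) can be illustrated with a minimal sketch. This is not the SUPERFAMILY implementation: the function names, the `(protein_id, superfamily)` input format, the pseudocount and the log2-ratio measure are all assumptions chosen for illustration, and the data are toy placeholders.

```python
from collections import Counter
from math import log2


def superfamily_composition(assignments):
    """Return the relative frequency of each superfamily.

    `assignments` is a list of (protein_id, superfamily) pairs, one per
    domain assignment. This input format is a hypothetical convention.
    """
    counts = Counter(superfamily for _protein, superfamily in assignments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {sf: n / total for sf, n in counts.items()}


def representation_ratios(genome_a, genome_b, pseudocount=1e-6):
    """Return log2(frequency in A / frequency in B) per superfamily.

    Positive values suggest over-representation in genome A, negative
    values under-representation. The pseudocount (an assumption) avoids
    division by zero when a superfamily is absent from one genome.
    """
    comp_a = superfamily_composition(genome_a)
    comp_b = superfamily_composition(genome_b)
    ratios = {}
    for sf in set(comp_a) | set(comp_b):
        freq_a = comp_a.get(sf, 0.0) + pseudocount
        freq_b = comp_b.get(sf, 0.0) + pseudocount
        ratios[sf] = log2(freq_a / freq_b)
    return ratios


# Toy placeholder data, not real assignments.
toy_genome_a = [("a1", "sf_A"), ("a2", "sf_A"), ("a3", "sf_A"), ("a4", "sf_B")]
toy_genome_b = [("b1", "sf_A"), ("b2", "sf_B"), ("b3", "sf_B"), ("b4", "sf_B")]

for name, ratio in sorted(representation_ratios(toy_genome_a, toy_genome_b).items()):
    print(f"{name}: log2 ratio = {ratio:.2f}")
```

A log ratio is used here only because it treats over- and under-representation symmetrically; a real analysis would also need a statistical test and a correction for genome size.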
Pages: D235-D239
Number of pages: 5