SUPERFAMILY-sophisticated comparative genomics, data mining, visualization and phylogeny

Cited by: 344
Authors
Wilson, Derek [1]
Pethica, Ralph [2]
Zhou, Yiduo [2]
Talbot, Charles [2]
Vogel, Christine [3]
Madera, Martin [2]
Chothia, Cyrus [1]
Gough, Julian [2]
Affiliations
[1] MRC, Mol Biol Lab, Cambridge CB2 2QH, England
[2] Univ Bristol, Dept Comp Sci, Bristol BS8 1UB, Avon, England
[3] Univ Texas Austin, Inst Cellular & Mol Biol, Austin, TX 78712 USA
Funding
UK Medical Research Council
Keywords
PROTEIN; FAMILY; EVOLUTION; DOMAINS; SYSTEM;
DOI
10.1093/nar/gkn762
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
SUPERFAMILY provides structural, functional and evolutionary information for proteins from all completely sequenced genomes, and large sequence collections such as UniProt. Protein domain assignments for over 900 genomes are included in the database, which can be accessed at http://supfam.org/. Hidden Markov models based on Structural Classification of Proteins (SCOP) domain definitions at the superfamily level are used to provide structural annotation. We recently produced a new model library based on SCOP 1.73. Family level assignments are also available. From the web site users can submit sequences for SCOP domain classification; search for keywords such as superfamilies, families, organism names, models and sequence identifiers; find over- and underrepresented families or superfamilies within a genome relative to other genomes or groups of genomes; compare domain architectures across selections of genomes; and finally build multiple sequence alignments between Protein Data Bank (PDB), genomic and custom sequences. Recent extensions to the database include InterPro abstracts and Gene Ontology terms for superfamilies, taxonomic visualization of the distribution of families across the tree of life, searches for functionally similar domain architectures and phylogenetic trees. The database, models and associated scripts are available for download from the ftp site.
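As a rough illustration of what finding over- and underrepresented superfamilies within a genome relative to other genomes can mean, the sketch below scores each superfamily by how far its count in one genome lies from the mean count across a comparison set. This is only an assumed approach (a plain z-score on raw counts), not the method the database itself uses. The function name `representation_z_scores` and all counts are hypothetical and made up for the example.

```python
from statistics import mean, pstdev


def representation_z_scores(target, background):
    """Score each superfamily in `target` against the `background` genomes.

    target:     dict mapping superfamily name -> domain count in one genome
    background: list of such dicts, one per comparison genome
    Returns a dict name -> z-score; positive means overrepresented.
    Assumption: raw counts are compared directly, with no correction
    for genome size.
    """
    names = set(target)
    for genome in background:
        names.update(genome)

    scores = {}
    for name in sorted(names):
        counts = [genome.get(name, 0) for genome in background]
        mu = mean(counts)
        sigma = pstdev(counts)
        if sigma == 0:
            # No variation in the comparison set: the z-score is undefined.
            continue
        scores[name] = (target.get(name, 0) - mu) / sigma
    return scores


# Hypothetical counts, invented purely for illustration.
target_genome = {"P-loop NTPase": 30, "Immunoglobulin": 50, "Winged helix": 10}
comparison_genomes = [
    {"P-loop NTPase": 28, "Immunoglobulin": 2, "Winged helix": 12},
    {"P-loop NTPase": 32, "Immunoglobulin": 4, "Winged helix": 8},
    {"P-loop NTPase": 30, "Immunoglobulin": 0, "Winged helix": 10},
]

for name, z in representation_z_scores(target_genome, comparison_genomes).items():
    print(f"{name}: z = {z:.2f}")
```

In this toy data only the immunoglobulin superfamily stands out as overrepresented; a real analysis would need to account for genome size and multiple testing.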
Pages: D380-D386
Page count: 7