PartsList: a web-based system for dynamically ranking protein folds based on disparate attributes, including whole-genome expression and interaction information

Cited by: 35
Authors
Qian, J
Stenger, B
Wilson, CA
Lin, J
Jansen, R
Teichmann, SA
Park, J
Krebs, WG
Yu, HY
Alexandrov, V
Echols, N
Gerstein, M
Affiliations
[1] Yale Univ, Dept Mol Biophys & Biochem, New Haven, CT 06520 USA
[2] UCL, Dept Biochem & Mol Biol, London WC1E 6BT, England
[3] European Bioinformat Inst, Cambridge CB10 1SD, England
DOI
10.1093/nar/29.8.1750
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
As the number of protein folds is quite limited, a mode of analysis that will be increasingly common in the future, especially with the advent of structural genomics, is to survey and re-survey the finite parts list of folds from an expanding number of perspectives. We have developed a new resource, called PartsList, that lets one dynamically perform these comparative fold surveys. It is available on the web at http://bioinfo.mbb.yale.edu/partslist and http://www.partslist.org. The system is based on the existing fold classifications and functions as a form of companion annotation for them, providing 'global views' of many already completed fold surveys. The central idea in the system is that of comparison through ranking; PartsList will rank the approximately 420 folds based on more than 180 attributes. These include: (i) occurrence in a number of completely sequenced genomes (e.g. it will show the most common folds in the worm versus yeast); (ii) occurrence in the structure databank (e.g. most common folds in the PDB); (iii) both absolute and relative gene expression information (e.g. most changing folds in expression over the cell cycle); (iv) protein-protein interactions, based on experimental data in yeast and comprehensive PDB surveys (e.g. most interacting fold); (v) sensitivity to inserted transposons; (vi) the number of functions associated with the fold (e.g. most multi-functional folds); (vii) amino acid composition (e.g. most Cys-rich folds); (viii) protein motions (e.g. most mobile folds); and (ix) the level of similarity based on a comprehensive set of structural alignments (e.g. most structurally variable folds). The integration of whole-genome expression and protein-protein interaction data with structural information is a particularly novel feature of our system.
We provide three ways of visualizing the rankings: a profiler emphasizing the progression of high and low ranks across many preselected attributes, a dynamic comparer for custom comparisons and a numerical rankings correlator. These allow one to directly compare very different attributes of a fold (e.g. expression level, genome occurrence and maximum motion) in the uniform numerical format of ranks. This uniform framework, in turn, highlights the way that the frequency of many of the attributes falls off with approximate power-law behavior (i.e. according to $V^{-b}$, for attribute value V and constant exponent b), with a few folds having large values and most having small values.
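The idea of comparison through ranking can be illustrated with a minimal sketch. It is not PartsList's implementation: the function name `rank_folds`, the fold identifiers and all attribute values below are hypothetical, and the tie-handling rule (tied folds share the best rank) is an assumption made for illustration.

```python
from typing import Dict, List, Tuple


def rank_folds(values: Dict[str, float],
               descending: bool = True) -> List[Tuple[int, str, float]]:
    """Return (rank, fold, value) tuples ordered by value.

    Tied values share the same rank and the next distinct value skips
    ahead (1, 2, 2, 4). This tie rule is an assumption, not taken from
    the paper.
    """
    ordered = sorted(values.items(), key=lambda kv: kv[1], reverse=descending)
    ranked: List[Tuple[int, str, float]] = []
    prev_value = None
    prev_rank = 0
    for position, (fold, value) in enumerate(ordered, start=1):
        # Reuse the previous rank when the value repeats.
        rank = prev_rank if value == prev_value else position
        ranked.append((rank, fold, value))
        prev_value, prev_rank = value, rank
    return ranked


# Hypothetical attribute table: fold -> value, one table per attribute.
attributes = {
    "genome_occurrence": {"fold_A": 120, "fold_B": 45, "fold_C": 45, "fold_D": 3},
    "expression_level": {"fold_A": 2.1, "fold_B": 9.7, "fold_C": 0.4, "fold_D": 5.0},
}

for name, table in attributes.items():
    print(name, rank_folds(table))
```

Because every attribute is reduced to a rank, attributes with very different units (counts, expression levels) become directly comparable.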
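The rank correlation and the power-law fall-off described above can be sketched as follows. This is an illustration under stated assumptions, not the method used in the paper: the rank correlation shown is the Spearman coefficient (Pearson correlation of average ranks), and the exponent b is estimated by a simple least-squares line on log-log axes, which is a rough estimator. All function names and input numbers are hypothetical, and the functions assume non-constant inputs and strictly positive values for the logarithms.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank from largest (rank 1) to smallest; ties get the mean position."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman coefficient: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def power_law_exponent(values: Sequence[float],
                       frequencies: Sequence[float]) -> float:
    """Estimate b in frequency ~ V^(-b) from a log-log least-squares slope."""
    lx = [math.log(v) for v in values]
    ly = [math.log(f) for f in frequencies]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
             / sum((a - mx) ** 2 for a in lx))
    return -slope


# Hypothetical per-fold attribute values, listed in the same fold order.
occurrence = [120, 45, 30, 3]
expression = [2.1, 9.7, 0.4, 5.0]
print(rank_correlation(occurrence, expression))

# Hypothetical frequency table generated to follow V^(-2) exactly.
attribute_values = [1, 2, 4, 8]
counts = [100 * v ** -2.0 for v in attribute_values]
print(power_law_exponent(attribute_values, counts))
```

A coefficient near 1 or -1 indicates that two attributes order the folds in nearly the same or nearly opposite way; a value near 0 indicates little relationship between the two rankings.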
Pages: 1750-1764
Page count: 15