High throughput profile-profile based fold recognition for the entire human proteome

Cited by: 16
Authors
McGuffin, Liam J.
Smith, Richard T.
Bryson, Kevin
Sorensen, Soren-Aksel
Jones, David T.
Affiliations
[1] UCL, Dept Comp Sci, London WC1E 6BT, England
[2] Univ Reading, BioCtr, Reading RG6 6AS, Berks, England
Funding
UK Biotechnology and Biological Sciences Research Council (BBSRC);
Keywords
DOI
10.1186/1471-2105-7-288
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Background: In order to maintain the most comprehensive structural annotation databases we must carry out regular updates for each proteome using the latest profile-profile fold recognition methods. The ability to carry out these updates on demand is necessary to keep pace with the regular updates of sequence and structure databases. Providing the highest quality structural models requires the most intensive profile-profile fold recognition methods running with the very latest available sequence databases and fold libraries. However, running these methods on such a regular basis for every sequenced proteome requires large amounts of processing power. In this paper we describe and benchmark the JYDE (Job Yield Distribution Environment) system, which is a meta-scheduler designed to work above cluster schedulers, such as Sun Grid Engine (SGE) or Condor. We demonstrate the ability of JYDE to distribute the load of genomic-scale fold recognition across multiple independent Grid domains. We use the most recent profile-profile version of our mGenTHREADER software in order to annotate the latest version of the Human proteome against the latest sequence and structure databases in as short a time as possible.
Results: We show that our JYDE system is able to scale to large numbers of intensive fold recognition jobs running across several independent computer clusters. Using our JYDE system we have been able to annotate 99.9% of the protein sequences within the Human proteome in less than 24 hours, by harnessing over 500 CPUs from 3 independent Grid domains.
Conclusion: This study clearly demonstrates the feasibility of carrying out on demand high quality structural annotations for the proteomes of major eukaryotic organisms. Specifically, we have shown that it is now possible to provide complete regular updates of profile-profile based fold recognition models for entire eukaryotic proteomes, through the use of Grid middleware such as JYDE.
Pages: 11
Related papers
21 in total
[1]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[2]   The Protein Data Bank [J].
Berman, HM ;
Westbrook, J ;
Feng, Z ;
Gilliland, G ;
Bhat, TN ;
Weissig, H ;
Shindyalov, IN ;
Bourne, PE .
NUCLEIC ACIDS RESEARCH, 2000, 28 (01) :235-242
[3]   Ensembl 2006 [J].
Birney, E. ;
Andrews, D. ;
Caccamo, M. ;
Chen, Y. ;
Clarke, L. ;
Coates, G. ;
Cox, T. ;
Cunningham, F. ;
Curwen, V. ;
Cutts, T. ;
Down, T. ;
Durbin, R. ;
Fernandez-Suarez, X. M. ;
Flicek, P. ;
Graf, S. ;
Hammond, M. ;
Herrero, J. ;
Howe, K. ;
Iyer, V. ;
Jekosch, K. ;
Kahari, A. ;
Kasprzyk, A. ;
Keefe, D. ;
Kokocinski, F. ;
Kulesha, E. ;
London, D. ;
Longden, I. ;
Melsopp, C. ;
Meidl, P. ;
Overduin, B. ;
Parker, A. ;
Proctor, G. ;
Prlic, A. ;
Rae, M. ;
Rios, D. ;
Redmond, S. ;
Schuster, M. ;
Sealy, I. ;
Searle, S. ;
Severin, J. ;
Slater, G. ;
Smedley, D. ;
Smith, J. ;
Stabenau, A. ;
Stalker, J. ;
Trevanion, S. ;
Ureta-Vidal, A. ;
Vogel, J. ;
White, S. ;
Woodwark, C. .
NUCLEIC ACIDS RESEARCH, 2006, 34 :D556-D561
[4]   3D-GENOMICS: a database to compare structural and functional annotations of proteins between sequenced genomes [J].
Fleming, K ;
Müller, A ;
MacCallum, RM ;
Sternberg, MJE .
NUCLEIC ACIDS RESEARCH, 2004, 32 :D245-D250
[5]   Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure [J].
Gough, J ;
Karplus, K ;
Hughey, R ;
Chothia, C .
JOURNAL OF MOLECULAR BIOLOGY, 2001, 313 (04) :903-919
[6]   DALI - a network tool for protein-structure comparison [J].
Holm, L ;
Sander, C .
TRENDS IN BIOCHEMICAL SCIENCES, 1995, 20 (11) :478-480
[7]   Prediction of novel and analogous folds using fragment assembly and fold recognition [J].
Jones, DT ;
Bryson, K ;
Coleman, A ;
McGuffin, LJ ;
Sadowski, MI ;
Sodhi, JS ;
Ward, JJ .
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2005, 61 :143-151
[8]   GenTHREADER: An efficient and reliable protein fold recognition method for genomic sequences [J].
Jones, DT .
JOURNAL OF MOLECULAR BIOLOGY, 1999, 287 (04) :797-815
[9]   The Genomic Threading Database: a comprehensive resource for structural annotations of the genomes from key organisms [J].
McGuffin, LJ ;
Street, SA ;
Bryson, K ;
Sorensen, SA ;
Jones, DT .
NUCLEIC ACIDS RESEARCH, 2004, 32 :D196-D199
[10]   The Genomic Threading Database [J].
McGuffin, LJ ;
Street, S ;
Sorensen, SA ;
Jones, DT .
BIOINFORMATICS, 2004, 20 (01) :131-132