Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure

Cited by: 934
Authors
Gough, J
Karplus, K
Hughey, R
Chothia, C
Affiliations
[1] MRC, Mol Biol Lab, Cambridge CB2 2QH, England
[2] Univ Calif Santa Cruz, Jack Baskin Sch Engn, Dept Comp Engn, Santa Cruz, CA 95064 USA
Keywords
genome; superfamily; hidden Markov model; structure; homology
DOI
10.1006/jmbi.2001.5080
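Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010 [Biochemistry and Molecular Biology]; 081704 [Applied Chemistry]
Abstract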
Of the sequence comparison methods, profile-based methods perform with greater selectivity than those that use pairwise comparisons. Of the profile methods, hidden Markov models (HMMs) are apparently the best. The first part of this paper describes calculations that (i) improve the performance of HMMs and (ii) determine a good procedure for creating HMMs for sequences of proteins of known structure. For a family of related proteins, more homologues are detected using multiple models built from diverse single seed sequences than from one model built from a good alignment of those sequences. A new procedure is described for detecting and correcting those errors that arise at the model-building stage of the procedure. These two improvements greatly increase selectivity and coverage. The second part of the paper describes the construction of a library of HMMs, called SUPERFAMILY, that represent essentially all proteins of known structure. The sequences of the domains in proteins of known structure that have identities of less than 95% are used as seeds to build the models. Using the current data, this gives a library with 4894 models. The third part of the paper describes the use of the SUPERFAMILY model library to annotate the sequences of over 50 genomes. The models match twice as many target sequences as are matched by pairwise sequence comparison methods. For each genome, close to half of the sequences are matched in all or in part and, overall, the matches cover 35% of eukaryotic genomes and 45% of bacterial genomes. On average, roughly 15% of genome sequences are labelled as being hypothetical yet homologous to proteins of known structure. The annotations derived from these matches are available from a public web server at: http://stash.mrc-lmb.cam.ac.uk/SUPERFAMILY. This server also enables users to match their own sequences against the SUPERFAMILY model library. © 2001 Academic Press.
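The abstract states that domain sequences with identities below 95% are used as seeds, one model per seed. The sketch below illustrates one way such a redundancy filter could work; it is not the authors' procedure. It assumes the sequences are already aligned to equal length, and the names `percent_identity`, `select_seeds`, and the toy sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def select_seeds(aligned: dict[str, str], max_identity: float = 95.0) -> list[str]:
    """Greedily keep names whose identity to every kept seed is below max_identity."""
    seeds: list[str] = []
    for name, seq in aligned.items():
        if all(percent_identity(seq, aligned[s]) < max_identity for s in seeds):
            seeds.append(name)
    return seeds


# Toy, made-up domain sequences (assumption: already aligned, no gaps needed).
domains = {
    "d1": "MKVLAAGIVGLLLAQPSFAA",
    "d2": "MKVLAAGIVGLLLAQPSFAA",  # identical to d1, so it is dropped
    "d3": "MKTLSAGIAGLMLAEPTFSA",  # diverged copy, kept as a separate seed
}
print(select_seeds(domains))  # ['d1', 'd3']
```

Each name returned would then seed its own model, which matches the abstract's point that several single-seed models detect more homologues than one model built from a combined alignment.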
Pages: 903-919
Number of pages: 17