Rapid protein domain assignment from amino acid sequence using predicted secondary structure

Cited by: 111
Authors
Marsden, RL
McGuffin, LJ
Jones, DT
Affiliations
[1] UCL, Bioinformat Unit, Dept Comp Sci, London WC1E 6BT, England
[2] Brunel Univ, Inst Canc Genet & Pharmacogenomics, Dept Biol Sci, Uxbridge UB8 3PH, Middx, England
Keywords
domains; secondary structure; protein folding; sequence analysis; structure prediction
DOI
10.1110/ps.0209902
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Elucidating the domain content of a protein sequence in the absence of a determined structure or significant sequence homology to known domains is an important problem in structural biology. Here we address how successfully continuous domains can be delineated in the absence of sequence homology using simple baseline methods, an existing prediction algorithm (Domain Guess by Size), and a newly developed method (DomSSEA). The study was undertaken to measure the usefulness of these prediction methods for fully automatic domain assignment. Thus, the sensitivity of each domain assignment method was measured by calculating the number of correctly assigned top-scoring predictions. We have implemented a new continuous domain identification method that aligns the predicted secondary structures of target sequences against the observed secondary structures of chains with known domain boundaries as assigned by Class Architecture Topology Homology (CATH). Taking top predictions only, the success rate of the method in correctly assigning domain number to the representative chain set is 73.3%. The top prediction for both domain number and location of domain boundaries (within ±20 residues) was correct for 24% of the multidomain set. These results are put into context by comparison with the results obtained from the other prediction methods assessed.
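The ±20-residue criterion in the abstract can be illustrated with a minimal sketch. The abstract does not say how predicted and observed boundaries are paired, so the sorted one-to-one matching below is an assumption, and the names `boundaries_match` and `success_rate` are hypothetical, not taken from the paper. For continuous domains, a chain with n domains has n - 1 boundaries, so equal boundary counts imply an equal domain number.

```python
from typing import Iterable, Sequence, Tuple


def boundaries_match(predicted: Sequence[int],
                     observed: Sequence[int],
                     tolerance: int = 20) -> bool:
    """Return True if the domain number agrees and every boundary is close enough.

    Boundaries are residue positions. Assumption: predicted and observed
    boundaries are paired in sorted order.
    """
    # Different boundary counts mean a different number of continuous domains.
    if len(predicted) != len(observed):
        return False
    return all(abs(p - o) <= tolerance
               for p, o in zip(sorted(predicted), sorted(observed)))


def success_rate(pairs: Iterable[Tuple[Sequence[int], Sequence[int]]],
                 tolerance: int = 20) -> float:
    """Percentage of chains whose top prediction passes boundaries_match."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = sum(boundaries_match(p, o, tolerance) for p, o in pairs)
    return 100.0 * hits / len(pairs)
```

For example, a two-domain chain with an observed boundary at residue 115 and a predicted boundary at residue 100 would count as correct under this sketch, whereas a predicted boundary at residue 90 would not.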
Pages: 2814-2824
Page count: 11
Related papers
28 in total
  • [1] Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
    Altschul, SF
    Madden, TL
    Schaffer, AA
    Zhang, JH
    Zhang, Z
    Miller, W
    Lipman, DJ
    [J]. NUCLEIC ACIDS RESEARCH, 1997, 25 (17) : 3389 - 3402
  • [2] The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000
    Bairoch, A
    Apweiler, R
    [J]. NUCLEIC ACIDS RESEARCH, 2000, 28 (01) : 45 - 48
  • [3] Protein Information Resource: a community resource for expert annotation of protein data
    Barker, WC
    Garavelli, JS
    Hou, ZL
    Huang, HZ
    Ledley, RS
    McGarvey, PB
    Mewes, HW
    Orcutt, BC
    Pfeiffer, F
    Tsugita, A
    Vinayaka, CR
    Xiao, CL
    Yeh, LSL
    Wu, C
    [J]. NUCLEIC ACIDS RESEARCH, 2001, 29 (01) : 29 - 32
  • [4] Bateman A, 2004, NUCLEIC ACIDS RES, V32, pD138, DOI 10.1093/nar/gkh121
  • [5] The Protein Data Bank
    Berman, HM
    Westbrook, J
    Feng, Z
    Gilliland, G
    Bhat, TN
    Weissig, H
    Shindyalov, IN
    Bourne, PE
    [J]. NUCLEIC ACIDS RESEARCH, 2000, 28 (01) : 235 - 242
  • [6] Birney E, 2001, AM J HUM GENET, V69, P219
  • [7] THE PREDICTION OF PROTEIN DOMAINS
    BUSETTA, B
    BARRANS, Y
    [J]. BIOCHIMICA ET BIOPHYSICA ACTA, 1984, 790 (02) : 117 - 124
  • [8] Identification of homology in protein structure classification
    Dietmann, S
    Holm, L
    [J]. NATURE STRUCTURAL BIOLOGY, 2001, 8 (11) : 953 - 957
  • [9] FISCHER D, 2001, PROTEINS S5, V45, P171
  • [10] A systematic comparison of protein structure classifications: SCOP, CATH and FSSP
    Hadley, C
    Jones, DT
    [J]. STRUCTURE WITH FOLDING & DESIGN, 1999, 7 (09) : 1099 - 1112