Improvement of the GenTHREADER method for genomic fold recognition

Times cited: 267
Authors
McGuffin, LJ [1]
Jones, DT [1]
Affiliations
[1] UCL, Dept Comp Sci, Bioinformat Grp, London WC1E 6BT, England
Funding
Biotechnology and Biological Sciences Research Council (UK)
DOI
10.1093/bioinformatics/btg097
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: To enhance genome annotation, the fully automatic fold recognition method GenTHREADER has been improved and benchmarked. The previous version of GenTHREADER consisted of a simple neural network trained to combine sequence alignment score, length information and energy potentials derived from threading into a single score representing the relationship between two proteins, as designated by CATH. The improved version incorporates PSI-BLAST searches, which have been jumpstarted with structural alignment profiles from FSSP, and now also uses PSIPRED predicted secondary structure and bi-directional scoring to calculate the final alignment score. Pairwise potentials and solvation potentials are calculated from the given sequence alignment and are then used as inputs to a multi-layer, feed-forward neural network, along with the alignment score, alignment length and sequence length. The neural network has also been expanded to accommodate the secondary structure element alignment (SSEA) score as an extra input, and it is now trained to learn the FSSP Z-score as a measure of similarity between two proteins.
Results: The improvements made to GenTHREADER increase the number of remote homologues that can be detected with a low error rate, implying higher reliability of the score, whilst also increasing the quality of the models produced. We find that up to five times as many true positives can be detected with a low error rate per query. Total MaxSub score is doubled at low false positive rates using the improved method.
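The network described in the abstract can be pictured with a minimal sketch: six inputs feed one hidden layer, and a single linear output stands in for the predicted FSSP Z-score. This is an illustration only, not the authors' implementation. The input order, the hidden-layer size, the sigmoid activation, the random untrained weights and the pre-scaling of inputs to roughly [0, 1] are all assumptions that the abstract does not specify.

```python
import math
import random

# Input names follow the abstract; their order and scaling are assumptions.
FEATURES = [
    "pairwise_potential",
    "solvation_potential",
    "alignment_score",
    "alignment_length",
    "sequence_length",
    "ssea_score",
]


def init_network(n_inputs, n_hidden, seed=0):
    """Create random (untrained) weights; the last weight in each row is the bias."""
    rng = random.Random(seed)
    hidden = [
        [rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
        for _ in range(n_hidden)
    ]
    output = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    return hidden, output


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(network, inputs):
    """One feed-forward pass: sigmoid hidden units, then a linear output unit."""
    hidden, output = network
    activations = [
        sigmoid(row[-1] + sum(w * x for w, x in zip(row, inputs)))
        for row in hidden
    ]
    return output[-1] + sum(w * a for w, a in zip(output, activations))


# Hypothetical, pre-scaled feature values for one query-template alignment.
example = dict(zip(FEATURES, [0.2, 0.4, 0.7, 0.5, 0.6, 0.8]))
network = init_network(n_inputs=len(FEATURES), n_hidden=4)
score = forward(network, [example[name] for name in FEATURES])
```

In a trained version, the weights would be fitted so that `score` approximates the FSSP Z-score of the aligned protein pair; the value produced here is meaningless because the weights are random.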
Pages: 874-881
Number of pages: 8