Enhanced automated function prediction using distantly related sequences and contextual association by PFP

Cited by: 92
Authors
Hawkins, Troy
Luban, Stanislav
Kihara, Daisuke [1]
Affiliations
[1] Purdue Univ, Dept Biol Sci, W Lafayette, IN 47907 USA
[2] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
[3] Purdue Univ, Markey Ctr Struct Biol, W Lafayette, IN 47907 USA
[4] Purdue Univ, Coll Sci, Bindley Biosci Ctr, W Lafayette, IN 47907 USA
Keywords
protein function prediction; PSI-BLAST; gene ontology; low-resolution function
DOI
10.1110/ps.062153506
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
The impetus for the recent development and emergence of automated function prediction methods is an exponentially growing flood of new experimental data, the interpretation of which is hindered by a shortage of reliable annotations for proteins that lack experimental characterization or significant homologs in current databases. Here we introduce PFP, an automated function prediction server that provides the most probable annotations for a query sequence in each of the three branches of the Gene Ontology: biological process, molecular function, and cellular component. Rather than utilizing precise pattern matching to identify functional motifs in the sequences and structures of these proteins, we designed PFP to increase the coverage of function annotation by lowering the resolution of predictions when a detailed function is not predictable. To do this we extend a traditional PSI-BLAST search by extracting and scoring annotations (GO terms) individually, including annotations from distantly related sequences, and applying a novel data mining tool, the Function Association Matrix, to score strongly associated pairs of annotations. We show that PFP can correctly assign function using only weakly similar sequences with significantly better accuracy and coverage than a standard PSI-BLAST search, improving it more than fivefold. The most descriptive annotations predicted by PFP (GO depth ≥ 8) can identify a significant subgraph in the GO with >60% accuracy and ~100% coverage for our benchmark set. We also provide examples of the superb performance of PFP in an assessment of automated function prediction servers at the Automated Function Prediction Special Interest Group meeting at ISMB 2005 (AFP-SIG '05).
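The scoring idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the scoring formula, so the weight `-log10(E) + b`, the offset `b`, the function name `score_go_terms`, and the layout of the association table are all assumptions made for illustration. Each GO term attached to a PSI-BLAST hit receives a weight that shrinks as the E-value grows, and terms associated with it (a stand-in for the Function Association Matrix) receive a share of that weight.

```python
import math
from collections import defaultdict


def score_go_terms(hits, assoc, b=2.0):
    """Score GO terms from a list of sequence hits (illustrative only).

    hits  : list of (e_value, [go_term, ...]) pairs, one per database hit.
    assoc : dict mapping an annotated term f_j to {f_a: P(f_a | f_j)},
            a hypothetical stand-in for the Function Association Matrix.
    b     : assumed offset so that weak hits (E-value up to 10**b)
            still contribute a positive weight.
    """
    scores = defaultdict(float)
    for e_value, terms in hits:
        weight = -math.log10(e_value) + b
        if weight <= 0:
            continue  # hit is too weak to contribute under this offset
        for f_j in terms:
            # The term annotated on the hit gets the full weight.
            scores[f_j] += weight
            # Associated terms get the weight scaled by P(f_a | f_j).
            for f_a, prob in assoc.get(f_j, {}).items():
                if f_a != f_j:
                    scores[f_a] += weight * prob
    return dict(scores)


# Placeholder term names; a real run would use GO identifiers.
example_hits = [(1e-3, ["term_A"]), (10.0, ["term_B"])]
example_assoc = {"term_A": {"term_C": 0.5}}
ranked = sorted(score_go_terms(example_hits, example_assoc).items(),
                key=lambda item: item[1], reverse=True)
```

In this sketch the hit with E-value 10 still adds a small score for `term_B`, which mirrors the abstract's point that distantly related sequences are allowed to contribute, and `term_C` is scored even though no hit carries it directly.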
Pages: 1550-1556
Page count: 7