Modeling the percolation of annotation errors in a database of protein sequences

Cited by: 119
Authors
Gilks, WR [1]
Audit, B
De Angelis, D
Tsoka, S
Ouzounis, CA
Affiliations
[1] MRC, Biostat Unit, Cambridge CB2 2BW, England
[2] EMBL Cambridge Outstn, European Bioinformat Inst, Cambridge CB10 1SD, England
[3] Publ Hlth Lab Serv, Stat Unit, London, England
Funding
UK Medical Research Council
DOI
10.1093/bioinformatics/18.12.1641
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Public sequence databases contain information on the sequence, structure and function of proteins. Genome sequencing projects have led to a rapid increase in protein sequence information, but reliable, experimentally verified, information on protein function lags a long way behind. To address this deficit, functional annotation in protein databases is often inferred by sequence similarity to homologous, annotated proteins, with the attendant possibility of error. Now, the functional annotation in these homologous proteins may itself have been acquired through sequence similarity to yet other proteins, and it is generally not possible to determine how the functional annotation of any given protein has been acquired. Thus the possibility of chains of misannotation arises, a process we term 'error percolation'. With some simple assumptions, we develop a dynamical probabilistic model for these misannotation chains. By exploring the consequences of the model for annotation quality it is evident that this iterative approach leads to a systematic deterioration of database quality.
Pages: 1641-1649
Number of pages: 9