Computational evaluation of TIS annotation for prokaryotic genomes

Cited by: 13
Authors
Hu, Gang-Qing [1,2,3]
Zheng, Xiaobin [1,2,3]
Ju, Li-Ning [1,2]
Zhu, Huaiqiu [1,2,3]
She, Zhen-Su [1,2,3,4]
Affiliations
[1] Peking Univ, State Key Lab Turbulence & Complex Syst, Beijing 100871, Peoples R China
[2] Peking Univ, Dept Biomed Engn, Coll Engn, Beijing 100871, Peoples R China
[3] Peking Univ, Ctr Theoret Biol, Beijing 100871, Peoples R China
[4] Univ Calif Los Angeles, Dept Math, Los Angeles, CA 90095 USA
Funding
National Natural Science Foundation of China
DOI
10.1186/1471-2105-9-160
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: Accurate annotation of translation initiation sites (TISs) is essential for understanding the mechanism of translation initiation. However, the reliability of TIS annotation in widely used databases such as RefSeq is uncertain because experimental benchmarks are lacking.

Results: Based on a homogeneity assumption, namely that gene translation-related signals are uniformly distributed across a genome, we have established a computational method for large-scale quantitative assessment of the reliability of TIS annotation in any prokaryotic genome. The method models the positional weight matrix (PWM) of sequences aligned around the predicted TISs as a linear combination of three elementary PWMs: one for true TISs and two for false TISs. The three elementary PWMs are obtained from a reference set of highly reliable TIS predictions. A generalized least-squares estimator determines the weight of the true-TIS component in the observed PWM, from which the accuracy of the prediction is derived. The validity of the method and the limits of its assumptions are explicitly tested on experimentally verified TISs, using reference sets of varying accuracy. The method is applied to estimate the accuracy of TIS annotations provided by public databases such as RefSeq and ProTISA and by programs such as EasyGene, GeneMarkS, Glimmer 3 and TiCo. RefSeq's TIS annotation is shown to be significantly less accurate than that of two recent predictors, TiCo and ProTISA. With convincing evidence, we show two general biases in the RefSeq annotation: over-annotation of the longest open reading frame (LORF) and under-annotation of the ATG start codon. Finally, we have established a new TIS database, SupTISA, based on the best predictions among all the predictors; SupTISA achieves an average accuracy of 92% over all 532 complete genomes.

Conclusion: Large-scale computational evaluation of TIS annotation has been achieved. A new TIS database, substantially more accurate than RefSeq, has been constructed, and it provides a valuable resource for further TIS studies.
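The decomposition step described in the Results can be illustrated with a minimal Python/NumPy sketch. It is not the authors' implementation. The following are all assumptions: the sum-to-one constraint on the three weights, the pseudocount, the reading of the fitted true-TIS weight directly as the fraction of correct TISs, and the diagonal inverse-binomial-variance weighting, which is a simplified stand-in for the paper's generalized least-squares estimator whose exact covariance model is not given here. `build_pwm` and `estimate_true_fraction` are hypothetical names, and the elementary PWMs are simply passed in rather than derived from a reference set.

```python
import numpy as np

BASES = "ACGT"


def build_pwm(seqs, pseudocount=0.5):
    """Return an (L, 4) base-frequency matrix (columns A, C, G, T) from aligned sequences."""
    length = len(seqs[0])
    counts = np.full((length, 4), pseudocount, dtype=float)
    for seq in seqs:
        if len(seq) != length:
            raise ValueError("all sequences must be aligned to the same length")
        for i, base in enumerate(seq.upper()):
            j = BASES.find(base)
            if j >= 0:  # ambiguity codes such as N are skipped
                counts[i, j] += 1
    return counts / counts.sum(axis=1, keepdims=True)


def estimate_true_fraction(observed, true_pwm, false1, false2, n_seqs=None):
    """Fit observed ~ a*true_pwm + b*false1 + c*false2 subject to a + b + c = 1.

    Substituting c = 1 - a - b gives an unconstrained two-parameter least-squares
    problem. If n_seqs is given, each cell is weighted by the inverse of its
    binomial variance p(1 - p)/n_seqs (an assumed diagonal covariance).
    """
    y = (observed - false2).ravel()
    X = np.column_stack([(true_pwm - false2).ravel(), (false1 - false2).ravel()])
    if n_seqs is not None:
        p = observed.ravel()
        var = np.maximum(p * (1.0 - p) / n_seqs, 1e-12)  # guard against zero variance
        w = 1.0 / np.sqrt(var)  # scaling rows by 1/sigma weights residuals by 1/sigma^2
        X, y = X * w[:, None], y * w
    (a, b), *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(a), float(b), float(1.0 - a - b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def random_pwm(length):
        m = rng.random((length, 4))
        return m / m.sum(axis=1, keepdims=True)

    # Synthetic elementary PWMs; in the paper these come from a highly reliable reference set.
    true_pwm, false1, false2 = (random_pwm(20) for _ in range(3))
    observed = 0.7 * true_pwm + 0.2 * false1 + 0.1 * false2
    a, b, c = estimate_true_fraction(observed, true_pwm, false1, false2, n_seqs=1000)
    print(f"estimated true-TIS fraction: {a:.3f}")  # noise-free mixture, so about 0.700
```

Substituting c = 1 - a - b keeps the fit linear and closed-form. The weights are not forced into [0, 1], so a fitted value outside that range would indicate that the elementary PWMs describe the observed matrix poorly.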
Pages: 12
Related papers (22 in total)
[1] Besemer, J; Lomsadze, A; Borodovsky, M. GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. NUCLEIC ACIDS RESEARCH, 2001, 29(12): 2607-2618.
[2] Cang, XH; Wang, J. A unique ATG triplet downstream of gene start in archaea: implications for translation initiation and evolution. GENE, 2004, 327(1): 75-79.
[3] Delcher, Arthur L.; Bratke, Kirsten A.; Powers, Edwin C.; Salzberg, Steven L. Identifying bacterial genes and endosymbiont DNA with Glimmer. BIOINFORMATICS, 2007, 23(6): 673-679.
[4] Frishman, D; Mironov, A; Gelfand, M. Starts of bacterial genes: estimating the reliability of computer predictions. GENE, 1999, 234(2): 257-265.
[5] Gold, L. Posttranscriptional regulatory mechanisms in Escherichia coli. ANNUAL REVIEW OF BIOCHEMISTRY, 1988, 57: 199-233.
[6] Gorodkin, J. COMPUT APPL BIOSCI, 1997, 13: 583.
[7] Hu, Gang-Qing; Zheng, Xiaobin; Yang, Yi-Fan; Ortet, Philippe; She, Zhen-Su; Zhu, Huaiqiu. ProTISA: a comprehensive resource for translation initiation site annotation in prokaryotic genomes. NUCLEIC ACIDS RESEARCH, 2008, 36: D114-D119.
[8] Larsen, TS; Krogh, A. EasyGene - a prokaryotic gene finder that ranks ORFs by statistical significance. BMC BIOINFORMATICS, 2003, 4(1).
[9] Londei, P. Evolution of translational initiation: new insights from the archaea. FEMS MICROBIOLOGY REVIEWS, 2005, 29(2): 185-200.
[10] Nielsen, P; Krogh, A. Large-scale prokaryotic gene prediction and comparison to genome annotation. BIOINFORMATICS, 2005, 21(24): 4322-4329.