Comparison of computational methods for identifying translation initiation sites in EST data

Times cited: 35
Authors
Nadershahi, A
Fahrenkrug, SC
Ellis, LBM [1]
Affiliations
[1] Univ Minnesota, Dept Lab Med & Pathol, Minneapolis, MN 55455 USA
[2] Univ Minnesota, Coll Biol Sci, St Paul, MN 55108 USA
[3] Univ Minnesota, Dept Anim Sci, St Paul, MN 55108 USA
DOI
10.1186/1471-2105-5-14
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Expressed Sequence Tag (EST) sequences are generally single-strand, single-pass sequences, only 200 to 600 nucleotides long; they contain errors that cause frame shifts and represent different parts of their parent cDNA. If the cDNAs contain translation initiation sites, they may be suitable for functional genomics studies. We compared five methods for predicting translation initiation sites in EST data: first-ATG, ESTScan, Diogenes, NetStart, and ATGpr.

Results: A dataset of 100 EST sequences, 50 with and 50 without translation initiation sites, was created. Based on analysis of this dataset, ATGpr was the most accurate method for predicting the presence versus absence of translation initiation sites. With a maximum accuracy of 76%, ATGpr predicts the position or absence of translation initiation sites more accurately than NetStart (57%) or Diogenes (50%). ATGpr similarly excels when start sites are known to be present (90%), whereas NetStart achieves only 60% overall accuracy. As a baseline for comparison, choosing the first ATG correctly identifies the translation initiation site in 74% of the sequences. Consistent with their intended use, ESTScan and Diogenes can identify open reading frames but cannot determine the precise position of translation initiation sites.

Conclusions: ATGpr demonstrates high sensitivity, specificity, and overall accuracy in identifying start sites while also rejecting incomplete sequences. A database of EST sequences suitable for validating translation initiation site prediction programs is now available. These tools and materials may open an avenue for future improvements in start site prediction and EST analysis.
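To make the evaluation concrete, here is a minimal Python sketch of the first-ATG baseline and of presence/absence and positional scoring. It is an illustration under stated assumptions, not the authors' code. The function names (`first_atg`, `evaluate`), the convention that `None` means "no start site", and the metric definitions (sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), positional accuracy = exact position match or correctly predicted absence) are hypothetical choices. The abstract does not spell out the paper's exact scoring rules. The sequences are invented toy data, not drawn from the paper's dataset.

```python
from typing import Dict, Optional, Sequence


def first_atg(seq: str) -> Optional[int]:
    """Baseline predictor (hypothetical helper): 0-based index of the first
    ATG on the given strand only, or None if no ATG occurs."""
    idx = seq.upper().find("ATG")
    return idx if idx >= 0 else None


def _ratio(num: int, den: int) -> float:
    # Return 0.0 instead of raising when a class is absent from the data.
    return num / den if den else 0.0


def evaluate(preds: Sequence[Optional[int]],
             truths: Sequence[Optional[int]]) -> Dict[str, float]:
    """Score start-site predictions against annotations (assumed metric
    definitions); None means 'no translation initiation site'."""
    if len(preds) != len(truths):
        raise ValueError("preds and truths must have the same length")
    tp = fp = tn = fn = exact = 0
    for pred, true in zip(preds, truths):
        # Presence/absence confusion counts.
        if true is not None:
            if pred is not None:
                tp += 1
            else:
                fn += 1
        else:
            if pred is not None:
                fp += 1
            else:
                tn += 1
        # Positional match: same index, or absence correctly predicted.
        if pred == true:
            exact += 1
    n = len(truths)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "presence_accuracy": _ratio(tp + tn, n),
        "position_accuracy": _ratio(exact, n),
    }


if __name__ == "__main__":
    # Hypothetical toy ESTs and annotated start positions (not real data).
    seqs = ["GCCACCATGGCT", "TTTTTTTT", "CCATGAAATGC", "GGGATGCCC"]
    truths = [6, None, 2, None]
    preds = [first_atg(s) for s in seqs]
    print(preds)  # [6, None, 2, 3]
    print(evaluate(preds, truths))
    # {'sensitivity': 1.0, 'specificity': 0.5,
    #  'presence_accuracy': 0.75, 'position_accuracy': 0.75}
```

Because the first-ATG rule predicts a start site whenever any ATG is present, it has little ability to reject incomplete sequences: in the toy data the last sequence has no annotated start site but still receives a prediction, which lowers specificity.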
Pages: 10