Amino acid translation program for full-length cDNA sequences with frameshift errors

Cited by: 28
Authors
Fukunishi, Y
Hayashizaki, Y
Affiliations
[1] RIKEN, Yokohama Inst, RIKEN Genom Sci Ctr, Lab Genome Explorat Res Grp, Yokohama, Kanagawa 2300045, Japan
[2] Japan Sci & Technol Corp, Core Res Evolut Sci & Technol, Tsukuba, Ibaraki 3050074, Japan
[3] RIKEN, Tsukuba Inst, Genome Sci Lab, Tsukuba, Ibaraki 3050074, Japan
Keywords
phred score; Kozak consensus; codon usage; initiation codon; base-call error
DOI
10.1152/physiolgenomics.2001.5.2.81
Chinese Library Classification number
Q2 [Cell biology]
Discipline classification code
071009; 090102
Abstract
Here we present an amino acid translation program designed to suggest the positions of experimental frameshift errors and to predict amino acid sequences for full-length cDNA sequences that have phred scores. The program introduces artificial insertions and deletions at low-accuracy positions of the original sequence, thereby generating many candidate sequences. The validity of the most probable sequence (the likelihood that it represents the actual protein) is evaluated with a score (Va) calculated from the Kozak consensus, preferred codon usage, and the position of the initiation codon. To evaluate the software, we used a database in which 524 of 612 cDNA sequences (86%) carried a total of 773 frameshift errors in the coding sequence. The software detected and corrected 48% of all frameshift errors, in 62% of the cDNA sequences that contained frameshift errors. The false-positive rate of frameshift correction was 9%; that is, 91% of the suggested frameshifts were true.
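
The candidate-generation idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' program: the abstract does not give the formula for Va, so the sketch ranks candidates by the length of the longest ATG-initiated open reading frame as a stand-in, and it ignores the Kozak consensus and codon usage entirely. All function names, the quality threshold of 20, and the use of "N" as the inserted base are assumptions made for illustration. The only borrowed fact is the standard phred definition, Q = -10 log10(P).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def phred_error_probability(q):
    """Standard phred definition: Q = -10 * log10(P_error)."""
    return 10 ** (-q / 10)


def low_quality_positions(quals, threshold=20):
    """Indices whose phred score is below the threshold (threshold is an assumption)."""
    return [i for i, q in enumerate(quals) if q < threshold]


def single_edits(seq, pos):
    """Yield two candidates: one base deleted at pos, and one 'N' inserted at pos."""
    yield seq[:pos] + seq[pos + 1:]    # treats the base at pos as a spurious insertion
    yield seq[:pos] + "N" + seq[pos:]  # treats pos as the site of a missed base


def longest_orf_codons(seq):
    """Codon count (ATG included, stop excluded) of the longest ORF that
    starts with ATG and ends with a stop codon, over the three forward frames."""
    best = 0
    for frame in range(3):
        current = None  # None means we are not inside an ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if current is None:
                if codon == "ATG":
                    current = 1
            elif codon in STOP_CODONS:
                best = max(best, current)
                current = None
            else:
                current += 1
    return best


def best_candidate(seq, quals, threshold=20):
    """Return the candidate with the longest ORF (stand-in score, NOT the paper's Va).
    The unedited sequence is listed first, so it wins ties."""
    candidates = [seq]
    for pos in low_quality_positions(quals, threshold):
        candidates.extend(single_edits(seq, pos))
    return max(candidates, key=longest_orf_codons)
```

For example, if a spurious G is inserted at index 7 of `ATGGCTGCAGCCGCTTAA` and that position carries a low phred score, `best_candidate("ATGGCTGGCAGCCGCTTAA", [40] * 7 + [10] + [40] * 11)` recovers the original sequence, because deleting the low-quality base restores the reading frame through to the stop codon. A real implementation would need to consider combinations of edits and a scoring function of the kind the abstract describes.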
Pages: 81-87
Page count: 7