Overlapping translation of nucleic acid sequences for bioinformatics applications

Cited by: 4
Authors
Biro, JC [1]
Affiliations
[1] Karolinska Inst & Homulus Informat, S-11460 Stockholm, Sweden
DOI
10.1016/S0306-9877(03)00008-2
Chinese Library Classification
R-3 [Medical research methods]; R3 [Basic medicine]
Subject classification code
1001
Abstract
An alternative to TblastX has been developed. Nucleic acids in database and query sequences were translated into overlapping protein-like sequences (overlappingly translated sequences, or OTSs) before searching with BlastP. Thus, each nucleic acid sequence is represented by a single 'protein-like' sequence instead of three 'proteins' in different reading frames. The 3 x 3 comparison of TblastX is replaced by a single comparison, giving faster results. Additional advantages are: (1) it can be more sensitive in detecting weak sequence similarities than either blastN or TblastX; (2) codon redundancy is eliminated; (3) the sensitivity to single nucleotide polymorphisms, mutations and sequencing errors is reduced; (4) it is insensitive to frame shifts. Results: BlastP using OTSs detected about two thirds of the blastN and TblastX matches but discovered additional similarities. When blastN and TblastX against nucleic acids were compared to blastP against OTSs, identical matches discovered by blastP were generally longer (602 vs. 213 letters, p < 0.01), had higher scores (748 vs. 460 bits, p < 0.05) and lower E-values (3.16E-20 vs. 1.17E+03, p < 0.01), but the percentage identity was lower (25% vs. 61%, p < 0.001). A qualitative evaluation with LALIGN showed improved visualization when OTSs were used instead of nucleic acids. Many extensive sequence similarities became more clearly visible, for example the repeating similarity between the prion protein and the human insulin gene micro-satellite, and the surprising similarity between the first part of the prion protein coding region and human pro-insulin (34.4% identity and an additional 17.2% similarity through 238 residues, score > 295, which is expected 4.6e-18 times by chance). (C) 2003 Elsevier Science Ltd. All rights reserved.
Pages: 654-659
Page count: 6