FINDING ERRORS IN DNA-SEQUENCES

Cited by: 33
Authors
POSFAI, J
ROBERTS, RJ
Affiliations
[1] COLD SPRING HARBOR LAB,POB 100,COLD SPRING HARBOR,NY 11724
[2] HUNGARIAN ACAD SCI,BIOL RES CTR,INST BIOPHYS,H-6701 SZEGED,HUNGARY
Keywords
READING FRAMES; FRAMESHIFTS
DOI
10.1073/pnas.89.10.4698
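Chinese Library Classification (CLC) codes
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification codes
07; 0710; 09
Abstract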
An algorithm is described that can detect certain errors within coding regions of DNA sequences. The algorithm is based on the idea that an insertion or deletion error within a coding sequence would interrupt the reading frame and cause the correct translation of a DNA sequence to require one or more frameshifts. If the coding sequence shows similarity to a known protein sequence, then such errors can be detected by comparing the conceptual translations of DNA sequences in all six reading frames with every sequence in a protein sequence data base. We have incorporated these ideas into a computer program, called DETECT, that can serve as an aid to the experimentalist who is determining new DNA sequences, so that obvious errors may be located and corrected. The program has been tested using raw experimental data and against sequences from the European Molecular Biology Laboratory data base that are annotated as containing frameshifts. We have also tested it using unidentified open reading frames that flank known, annotated genes in the GenBank data base. Many potential errors are apparent, and in some cases functions can be suggested for the "corrected" versions of these reading frames, leading to the identification of new genes. As more sequences are determined, the power of this method will increase substantially.
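
The idea in the abstract (translate the DNA in all six reading frames, then check whether similarity to a known protein jumps from one frame to another) can be illustrated with a minimal sketch. This is not the DETECT program and makes no claim about how it is implemented. The function names, the exact k-mer matching, and the default k = 4 are assumptions chosen for illustration; a real tool would use a similarity search that tolerates mismatches.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna, offset=0):
    """Translate from the given offset; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(offset, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Conceptual translations in frames +1..+3 and -1..-3."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames["+%d" % (offset + 1)] = translate(dna, offset)
        frames["-%d" % (offset + 1)] = translate(rc, offset)
    return frames


def kmer_hits_by_frame(dna, protein, k=4):
    """Map each frame to the protein positions whose k-mer it contains.

    Exact k-mer matching is a stand-in (an assumption of this sketch)
    for a real protein similarity search.
    """
    frames = six_frame_translations(dna)
    hits = {}
    for i in range(len(protein) - k + 1):
        kmer = protein[i:i + k]
        for name, translated in frames.items():
            if kmer in translated:
                hits.setdefault(name, []).append(i)
    return hits


def looks_frameshifted(hits):
    """True if matches to one protein occur in 2+ frames of a strand."""
    forward = [f for f in hits if f.startswith("+")]
    reverse = [f for f in hits if f.startswith("-")]
    return len(forward) > 1 or len(reverse) > 1
```

If a single base is deleted from a sequence that encodes a known protein, the matches before the deletion fall in one frame and the matches after it fall in another frame of the same strand, which is the signature the abstract describes. Very short k-mers can match by chance, so in this sketch a positive result is only a hint that the region deserves a closer look.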
Pages: 4698-4702
Number of pages: 5