Frame: detection of genomic sequencing errors

Cited by: 16
Authors
Brown, NP
Sander, C
Bork, P
Affiliations
[1] European Bioinformat Inst EMBL, Cambridge CB10 1SD, England
[2] European Mol Biol Lab HD, Biocomp Unit, D-69012 Heidelberg, Germany
[3] Max Delbruck Ctr Mol Med, D-13122 Berlin, Germany
DOI
10.1093/bioinformatics/14.4.367
Chinese Library Classification (CLC)
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: The underlying error rate of genomic sequencing sometimes introduces artificial frameshifts and in-frame stop codons into putative protein-encoding genes. Severe errors are then introduced into the inferred transcripts through mis-translation or premature termination.
Results: We describe a system for screening segments of DNA for frameshift and in-frame stop errors in coding regions. The method is based on homology matching, using blastx to compare all six reading frames of the query nucleotide sequence against selected protein sequence databases. Fragments of protein matching neighbouring regions of the query DNA are united and extended laterally to define candidate open reading frames, within which frameshifts and stops are identified. Suitable targets include prokaryotic or other intron-free genomic sequence and complementary DNAs. As an example of its use, we report here two frameshifted ORFs that deviate from the original TIGR sequence annotations for the recently released Helicobacter pylori genome.
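The idea of uniting neighbouring protein matches and looking for a change of reading frame between them can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Hit` record, the `candidate_frameshifts` function, and the `max_gap` threshold are hypothetical names and values chosen for illustration, and the input is assumed to be blastx-style hits already parsed into query coordinates and frames (+1..+3, -1..-3).

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Hit:
    """One blastx-style match (hypothetical record, not a real parser type)."""
    subject: str  # identifier of the matched database protein
    frame: int    # reading frame of the query: +1, +2, +3, -1, -2 or -3
    q_start: int  # 1-based start on the query DNA (q_start <= q_end)
    q_end: int    # 1-based inclusive end on the query DNA


def candidate_frameshifts(hits, max_gap=30):
    """Return pairs of neighbouring hits that suggest a frameshift.

    Two hits are paired when they match the same protein on the same
    strand, lie within `max_gap` nucleotides of each other on the query
    (an assumed threshold), and are reported in different frames.
    """
    def group_key(h):
        return (h.subject, h.frame > 0)  # same protein, same strand

    ordered = sorted(hits, key=lambda h: (h.subject, h.frame > 0, h.q_start))
    pairs = []
    for _, group in groupby(ordered, key=group_key):
        group = list(group)
        # Compare each hit with its immediate neighbour along the query.
        for a, b in zip(group, group[1:]):
            if a.frame != b.frame and b.q_start - a.q_end <= max_gap:
                pairs.append((a, b))
    return pairs


# Illustrative, made-up hits: P1 changes frame mid-gene, P2 does not.
example_hits = [
    Hit("P1", +1, 1, 300),
    Hit("P1", +2, 305, 600),
    Hit("P2", +1, 700, 900),
    Hit("P2", +1, 910, 1200),
]
for left, right in candidate_frameshifts(example_hits):
    print(left.subject, left.frame, "->", right.frame, "near", left.q_end)
```

A real screen would also need to check that the two fragments align to adjacent parts of the database protein and to scan the united region for in-frame stop codons; both steps are omitted here for brevity.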
Pages: 367-371
Number of pages: 5