IDENTIFICATION OF PROTEIN CODING REGIONS BY DATABASE SIMILARITY SEARCH

Cited by: 1427
Authors
GISH, W
STATES, DJ
Affiliations
[1] National Center for Biotechnology Information, National Library of Medicine, Building 38A, Bethesda, MD 20894-0001
[2] Institute for Biomedical Computing, Washington University, St. Louis, MO 63110
DOI
10.1038/ng0393-266
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Sequence similarity between a translated nucleotide sequence and a known biological protein can provide strong evidence for the presence of a homologous coding region, even between distantly related genes. The computer program BLASTX performed conceptual translation of a nucleotide query sequence followed by a protein database search in one programmatic step. We characterized the sensitivity of BLASTX recognition to the presence of substitution, insertion and deletion errors in the query sequence and to sequence divergence. Reading frames were reliably identified in the presence of 1% query errors, a rate that is typical for primary sequence data. BLASTX is appropriate for use in moderate and large scale sequencing projects at the earliest opportunity, when the data are most prone to containing errors.
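The conceptual translation step mentioned in the abstract can be illustrated with a minimal Python sketch. This is not BLASTX code: the function names (`reverse_complement`, `translate`, `six_frame_translations`) and the toy query are hypothetical, the standard genetic code is assumed, and no database search is performed. The sketch only shows how a nucleotide query yields six candidate protein sequences, and how a single deletion error moves the downstream coding signal into a different reading frame.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
# (Assumption: the standard code; other translation tables exist.)
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate from position 0; unrecognized codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Map frame labels (+1..+3, -1..-3) to conceptual translations."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


# Toy query (hypothetical): ATG GCC AAG TAA encodes M A K stop in frame +1.
query = "ATGGCCAAGTAA"
frames = six_frame_translations(query)

# Simulate a single-base deletion error at index 4.
shifted_query = query[:4] + query[5:]
shifted = six_frame_translations(shifted_query)

for label in sorted(frames):
    print(label, frames[label], shifted[label])
```

In the toy query, the residues downstream of the deletion (`K*`) no longer appear in frame +1 but reappear in frame +3. Translating all six frames keeps such a signal available to a similarity search even when the query contains insertion or deletion errors.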
Pages: 266-272
Number of pages: 7