Sensitivity and selectivity in protein similarity searches: A comparison of Smith-Waterman in hardware to BLAST and FASTA

Cited by: 60
Authors
Shpaer, EG [1]
Robinson, M [1]
Yee, D [1]
Candlin, JD [1]
Mines, R [1]
Hunkapiller, T [1]
Affiliation
[1] UNIV WASHINGTON, SEATTLE, WA 98195 USA
DOI
10.1006/geno.1996.0614
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
To predict the functions of a possible protein product of any new or uncharacterized DNA sequence, it is important first to detect all significant similarities between the encoded amino acid sequence and any accumulated protein sequence data. We have assembled a set of queries and database sequences and used it to test and compare various similarity search methods and their parameterizations. We demonstrate here that the Smith-Waterman (S-W) dynamic programming method and the optimized version of FASTA are significantly better able to distinguish true similarities from statistical noise than is the popular database search tool BLAST. Also, a simple "log-length normalization" of S-W scores based on the query and target sequence lengths greatly increased the selectivity of the S-W searches, exceeding that of the default normalization method of FASTA. An implementation of the modified S-W algorithm in hardware (the Fast Data Finder) matches the accuracy of software versions while executing much faster. We present here the selectivity and sensitivity data from these tests as well as results for various scoring matrices. We present data that will help users choose threshold score values for evaluating database search results. We also illustrate the impact of using simple-sequence masking tools such as SEG or XNU. (C) 1996 Academic Press, Inc.
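The abstract contrasts a Smith-Waterman (S-W) dynamic-programming search with a score normalization based on query and target lengths. Below is a minimal Python sketch of both ideas. It is not the authors' software or the Fast Data Finder implementation, and it rests on several assumptions. A toy match/mismatch score stands in for a substitution matrix such as PAM or BLOSUM. Gaps are penalized linearly rather than with the affine penalties typical of protein searches. The normalization divides the raw score by ln(m·n). The abstract does not give the exact log-length formula, so `log_length_normalized` is hypothetical.

```python
import math


def smith_waterman(query: str, target: str, match: int = 2,
                   mismatch: int = -1, gap: int = -2) -> int:
    """Best local-alignment score by Smith-Waterman dynamic programming.

    Assumed toy scoring: fixed match/mismatch values instead of a
    substitution matrix, and a linear (per-residue) gap penalty.
    """
    # Keep only the previous DP row; row 0 and column 0 are all zeros.
    prev = [0] * (len(target) + 1)
    best = 0
    for a in query:
        curr = [0] * (len(target) + 1)
        for j, b in enumerate(target, start=1):
            diag = prev[j - 1] + (match if a == b else mismatch)
            up = prev[j] + gap        # query residue a opposite a gap
            left = curr[j - 1] + gap  # target residue b opposite a gap
            # Flooring at zero lets an alignment start anywhere (local).
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


def log_length_normalized(score: float, m: int, n: int) -> float:
    """Hypothetical normalization: raw score divided by ln(m * n)."""
    if m * n <= 1:
        raise ValueError("m * n must exceed 1 for ln(m * n) > 0")
    return score / math.log(m * n)


if __name__ == "__main__":
    q, t = "ACDEFGHIK", "ACDEGHIK"
    raw = smith_waterman(q, t)
    print(raw, round(log_length_normalized(raw, len(q), len(t)), 3))
```

For example, `smith_waterman("ACDEFGHIK", "ACDEGHIK")` returns 14. The blocks ACDE and GHIK each contribute 8, and the single gap opposite F costs 2. By contrast, the best ungapped local alignment scores only 8. Dividing by ln(m·n) is one way to discount the roughly logarithmic growth of chance local scores with the size of the search space. The abstract does not state whether the paper divides by or subtracts a log-length term.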
Pages: 179-191
Number of pages: 13