RAPID AND ACCURATE ESTIMATES OF STATISTICAL SIGNIFICANCE FOR SEQUENCE DATA-BASE SEARCHES

Cited by: 92
Authors
WATERMAN, MS [1]
VINGRON, M [1]
Affiliations
[1] UNIV SO CALIF, DEPT MOLEC BIOL, LOS ANGELES, CA 90089 USA
DOI
10.1073/pnas.91.11.4625
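Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code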
07 ; 0710 ; 09 ;
Abstract
A central question in sequence comparison is the statistical significance of an observed similarity. For local alignments that contain gaps to optimize sequence similarity, this problem has so far not been solved mathematically. Using the Chen-Stein theory of Poisson approximation as a basis, we present a practical method to approximate the probability that a local alignment score is a result of chance alone. For a given set of similarity scores and gap penalties, only one simulation of random alignments needs to be calculated to derive the key information that allows us to estimate the significance of any alignment calculated under this setting. We present applications to database searching and to the analysis of pairwise and self-comparisons of proteins.
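The workflow the abstract describes (one simulation per scoring scheme, then a closed-form significance estimate for any alignment) can be illustrated with a minimal sketch. It assumes a Poisson-type approximation of the form P(S >= t) ≈ 1 - exp(-gamma * m * n * xi**t) for sequences of lengths m and n; this functional form, the function names `poisson_pvalue` and `fit_gamma_xi`, and all numeric values are assumptions made for illustration and are not stated in the abstract.

```python
import math


def poisson_pvalue(t, m, n, gamma, xi):
    """Approximate P(S >= t) for sequence lengths m and n.

    Assumed form (hypothetical, not taken from the abstract):
        P(S >= t) ~= 1 - exp(-gamma * m * n * xi**t)
    gamma and xi (0 < xi < 1) depend on the scoring scheme.
    """
    lam = gamma * m * n * xi ** t   # expected number of chance hits with score >= t
    return -math.expm1(-lam)        # 1 - exp(-lam), accurate when lam is tiny


def fit_gamma_xi(thresholds, cdf_values, m, n):
    """Estimate (gamma, xi) from an empirical CDF of simulated scores.

    cdf_values[i] is the fraction of simulated random alignments whose best
    score fell below thresholds[i]; each value must lie strictly in (0, 1).
    Under the assumed form, log(-log F(t)) = log(gamma*m*n) + t*log(xi),
    so an ordinary least-squares line through those points gives both values.
    """
    ys = [math.log(-math.log(f)) for f in cdf_values]
    k = len(thresholds)
    mean_t = sum(thresholds) / k
    mean_y = sum(ys) / k
    sxx = sum((t - mean_t) ** 2 for t in thresholds)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(thresholds, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return math.exp(intercept) / (m * n), math.exp(slope)


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to exercise the functions.
    print(poisson_pvalue(t=60, m=300, n=300, gamma=0.1, xi=0.8))
```

In this sketch the simulation is needed only once per combination of similarity scores and gap penalties: after `fit_gamma_xi` has produced its two estimates, `poisson_pvalue` can be evaluated for any score and any pair of sequence lengths under the same scoring scheme.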
Pages: 4625-4628
Number of pages: 4