Sequence alignment: an approximation law for the Z-value with applications to databank scanning

Cited by: 17
Authors
Bacro, JN
Comet, JP
Affiliations
[1] INRA, UMR INAPG, INA PG, Dpt OMIP, F-75231 Paris 05, France
[2] Univ Evry Val Essonne, LaMI, F-91025 Evry, France
Source
COMPUTERS & CHEMISTRY | 2001 / Vol. 25 / No. 04
Keywords
dynamic programming sequence alignment; significance; Z-value; approximated distribution; Gumbel distribution
DOI
10.1016/S0097-8485(01)00074-2
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
The Z-value estimates the statistical significance of a Smith-Waterman dynamic programming alignment score (H-score) by means of a Monte-Carlo procedure. In this paper, we give an approximation for the law of the Z-value, deduced from the Poisson clumping heuristic developed by Waterman and Vingron (Stat. Sci. 9 (1994) 367), for the comparison of independent and identically distributed sequences. As for non-gapped alignment scores, our approximation is of Gumbel type, but with parameters that are sequence independent. This result clarifies the related experimental results reported by Comet et al. (Comput. Chem. 23 (1999) 317). Using 'quasi-real' sequences (i.e. randomly shuffled sequences of the same length and amino acid composition as the real ones), we investigate the relevance of our approximation. Since the Monte-Carlo approach we use introduces a bias in the estimation of the Gumbel decay parameter, a correction procedure is proposed. Applications to real sequences are considered, and we show how our results can be used to detect potential biological relationships between real sequences. © 2001 Elsevier Science Ltd. All rights reserved.
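The Monte-Carlo idea behind the Z-value can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `smith_waterman_score` and `z_value`, the toy match/mismatch/gap scores, the number of shuffles, and the choice to shuffle only the second sequence are all assumptions made for illustration. The sketch scores the real pair, scores the first sequence against shuffled copies of the second (same length and composition), and standardizes the observed score by the mean and standard deviation of the shuffled scores.

```python
import random
import statistics


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not a real
    substitution matrix such as those used for protein alignments.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def z_value(a, b, n_shuffles=100, rng=None):
    """Monte-Carlo Z-value: (observed - mean) / sd over shuffles of b."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    observed = smith_waterman_score(a, b)
    letters = list(b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        # Shuffling preserves the length and composition of b.
        rng.shuffle(letters)
        shuffled_scores.append(smith_waterman_score(a, "".join(letters)))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.stdev(shuffled_scores)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance")
    return (observed - mean) / sd


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # arbitrary example string
    print(z_value(seq, seq))
```

Because the mean and standard deviation are themselves estimated from a finite number of shuffles, the resulting Z-value is a noisy quantity; the abstract notes that this Monte-Carlo step biases the estimated Gumbel decay parameter, which is why a correction is proposed there.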
Pages: 401-410
Page count: 10
Related papers
29 in total
[1]   Do aligned sequences share the same fold? [J].
Abagyan, RA ;
Batalov, S .
JOURNAL OF MOLECULAR BIOLOGY, 1997, 273 (01) :355-368
[2]  
Aldous D., 1989, PROBABILITY APPROXIM
[3]  
ALTSCHUL SF, 1990, J MOL BIOL, V215, P403, DOI 10.1006/jmbi.1990.9999
[4]   THE ERDOS-RENYI LAW IN DISTRIBUTION, FOR COIN TOSSING AND SEQUENCE MATCHING [J].
ARRATIA, R ;
GORDON, L ;
WATERMAN, MS .
ANNALS OF STATISTICS, 1990, 18 (02) :539-570
[5]   2 MOMENTS SUFFICE FOR POISSON APPROXIMATIONS - THE CHEN-STEIN METHOD [J].
ARRATIA, R ;
GOLDSTEIN, L ;
GORDON, L .
ANNALS OF PROBABILITY, 1989, 17 (01) :9-25
[6]   THE ERDOS-RENYI STRONG LAW FOR PATTERN-MATCHING WITH A GIVEN PROPORTION OF MISMATCHES [J].
ARRATIA, R ;
WATERMAN, MS .
ANNALS OF PROBABILITY, 1989, 17 (03) :1152-1169
[7]   AN EXTREME VALUE THEORY FOR SEQUENCE MATCHING [J].
ARRATIA, R ;
GORDON, L ;
WATERMAN, M .
ANNALS OF STATISTICS, 1986, 14 (03) :971-993
[8]   A PHASE TRANSITION FOR THE SCORE IN MATCHING RANDOM SEQUENCES ALLOWING DELETIONS [J].
Arratia, Richard ;
Waterman, Michael S. .
ANNALS OF APPLIED PROBABILITY, 1994, 4 (01) :200-225
[9]   Significance of Z-value statistics of Smith-Waterman scores for protein alignments [J].
Comet, JP ;
Aude, JC ;
Glémet, E ;
Risler, JL ;
Hénaut, A ;
Slonimski, PP ;
Codani, JJ .
COMPUTERS & CHEMISTRY, 1999, 23 (3-4) :317-331
[10]  
COMET JP, 1998, THESIS U COMPIEGNE F