Compression and approximate matching

Cited by: 10

Authors:

Allison, L [1]

Powell, D [1]

Dix, TI [1]

Affiliations:

[1] Monash Univ, Sch Comp Sci & Software Engn, Clayton, Vic 3168, Australia

Source:

COMPUTER JOURNAL | 1999 / Vol. 42 / Issue 01

DOI:

10.1093/comjnl/42.1.1

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Subject classification code:

0812;

Abstract:

A population of sequences is called non-random if there is a statistical model and an associated compression algorithm that allows members of the population to be compressed, on average. Any available statistical model of a population should be incorporated into algorithms for alignment of the sequences and doing so changes the rank order of possible alignments in general. The model should also be used in deciding if a resulting approximate match between two sequences is significant or not. It is shown how to do this for two plausible interpretations involving pairs of sequences that might or might not be related. Efficient alignment algorithms are described for quite general statistical models of sequences. The new alignment algorithms are more sensitive to what might be termed 'features' of the sequences. A natural significance test is shown to be rarely fooled by apparent similarities between two sequences that are merely typical of all or most members of the population, even unrelated members.
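
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a zero-order (independent-character) model of the population, fixed and purely illustrative probabilities for match, change and indel steps, and hypothetical function names (`null_bits`, `alignment_bits`, `log_odds_bits`). It ignores the cost of stating sequence lengths, and its change code is not normalised. The null hypothesis encodes the two sequences separately under the population model; the alternative encodes them jointly through the cheapest alignment. The difference in message lengths serves as a significance score.

```python
import math

def char_bits(model, c):
    """Bits needed to state character c under the zero-order population model."""
    return -math.log2(model[c])

def null_bits(model, a, b):
    """Message length if a and b are unrelated: each is encoded on its own."""
    return sum(char_bits(model, c) for c in a + b)

def alignment_bits(model, a, b, p_match=0.7, p_change=0.1, p_indel=0.1):
    """Message length of the cheapest alignment of a and b.

    Assumed step probabilities: p_match + p_change + 2 * p_indel == 1.
    A match states one character, a change states both characters,
    an insert or delete states the one character involved.
    """
    m, c, g = (-math.log2(p) for p in (p_match, p_change, p_indel))
    # Row 0: the empty prefix of a against prefixes of b (inserts only).
    prev = [0.0]
    for ch in b:
        prev.append(prev[-1] + g + char_bits(model, ch))
    for i in range(1, len(a) + 1):
        ca = char_bits(model, a[i - 1])
        cur = [prev[0] + g + ca]  # column 0: deletes only
        for j in range(1, len(b) + 1):
            cb = char_bits(model, b[j - 1])
            if a[i - 1] == b[j - 1]:
                diag = prev[j - 1] + m + ca
            else:
                diag = prev[j - 1] + c + ca + cb
            cur.append(min(diag, prev[j] + g + ca, cur[j - 1] + g + cb))
        prev = cur
    return prev[-1]

def log_odds_bits(model, a, b):
    """Positive when the joint (aligned) encoding is shorter than the null."""
    return null_bits(model, a, b) - alignment_bits(model, a, b)

# Hypothetical skewed population: A and T are common, C and G are rare.
model = {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1}
common = log_odds_bits(model, "ATATATAT", "ATATATAT")
rare = log_odds_bits(model, "CGCGCGCG", "CGCGCGCG")
unrelated = log_odds_bits(model, "AAAAAAAA", "TTTTTTTT")
```

Under this assumed model, a match on rare characters saves more bits than a match on common ones, so `rare` exceeds `common`, while `unrelated` is negative because the null encoding is shorter. This is the sense in which a population model changes the ranking of alignments and guards against similarities that are merely typical of the population.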

Pages: 1-10

Page count: 10
