Sequence complexity for biological sequence analysis

Times cited: 17

Authors:

Allison, L [1]

Stern, L

Edgoose, T

Dix, TI

Affiliations:

[1] Monash Univ, Sch Comp Sci & Software Engn, Melbourne, Vic 3168, Australia

[2] Univ Melbourne, Dept Comp Sci & Software Engn, Parkville, Vic 3052, Australia

Source:

COMPUTERS & CHEMISTRY | 2000 / Vol. 24 / No. 1

Keywords:

algorithm; DNA; complexity; entropy; pattern discovery; sequence analysis;

DOI:

10.1016/S0097-8485(00)80006-6

Chinese Library Classification (CLC) number:

O6 [Chemistry]

Subject classification code:

0703

Abstract:

A new statistical model for DNA considers a sequence to be a mixture of regions with little structure and regions that are approximate repeats of other subsequences, i.e. instances of a repeat need not match each other exactly. Both forward repeats and reverse-complementary repeats are allowed. The model has a small number of parameters, which are fitted to the data. In general a given sequence has many possible explanations, and it is shown how to compute the total probability of the data given the model, summed over all of them. Algorithms for fitting the parameters and for computing this total probability are described. The model can be used to compute the information content of a sequence, either in total or base by base. This amounts to looking at sequences from a data-compression point of view, and it is argued that this is a good way to tackle intelligent sequence analysis in general. (C) 2000 Elsevier Science Ltd. All rights reserved.
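The per-base information content mentioned in the abstract is the cost, in bits, of each base given everything before it, -log2 P(s_i | s_1 ... s_{i-1}), where the probability is summed over every explanation of the sequence. A minimal sketch of that computation follows, assuming a deliberately simplified, hypothetical model that is not the one defined in the paper: a background state that emits bases uniformly (2 bits each) and repeat states that copy an earlier base either forwards or as its reverse complement, allowing mismatches. The function name `per_base_bits` and the parameters `p_start`, `p_cont` and `p_match` are illustrative; their values are fixed guesses rather than parameters fitted to the data as in the paper. The sum over explanations uses a scaled forward algorithm (a standard hidden-Markov-model technique) and takes O(n^2) time for a sequence of length n.

```python
import math
from collections import defaultdict

# Complement of each DNA base (input is assumed to contain only A, C, G, T).
COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}


def per_base_bits(seq, p_start=0.05, p_cont=0.9, p_match=0.9):
    """Return -log2 P(seq[i] | seq[:i]) for every position i.

    Hypothetical toy model (not the paper's): each base is emitted either by
    a background state "B" (uniform, 2 bits/base) or by a repeat state that
    copies an earlier base k, forwards ("F", k) or reverse-complemented
    ("R", k), with mismatches allowed. Summing over every explanation is done
    with a scaled forward algorithm, so the values add up to -log2 P(seq).
    """
    seq = seq.upper()
    bits = []
    alpha = {}  # posterior over states after emitting the previous base
    for i, base in enumerate(seq):
        # 1. Transition: distribution of the state that will emit seq[i].
        prior = defaultdict(float)
        if i == 0:
            prior["B"] = 1.0  # no earlier base to copy from yet
        for state, a in alpha.items():
            if state == "B":
                prior["B"] += a * (1.0 - p_start)
                # Start a repeat: i possible sources k < i, two directions.
                share = a * p_start / (2 * i)
                for k in range(i):
                    prior[("F", k)] += share
                    prior[("R", k)] += share
            else:
                kind, k = state
                nxt = k + 1 if kind == "F" else k - 1
                if nxt >= 0:
                    prior[(kind, nxt)] += a * p_cont  # repeat continues
                    prior["B"] += a * (1.0 - p_cont)  # repeat ends
                else:
                    prior["B"] += a  # reverse repeat ran off the start
        # 2. Emission: weight each state by how well it predicts seq[i].
        joint = {}
        for state, p in prior.items():
            if state == "B":
                e = 0.25
            else:
                kind, k = state
                predicted = seq[k] if kind == "F" else COMP[seq[k]]
                e = p_match if base == predicted else (1.0 - p_match) / 3.0
            joint[state] = p * e
        total = sum(joint.values())  # P(seq[i] | seq[:i]) over all explanations
        bits.append(-math.log2(total))
        alpha = {s: v / total for s, v in joint.items()}
    return bits
```

For example, for a 20-base string `u`, `per_base_bits(u + u)` should charge noticeably fewer bits to the second copy than to the first, because after a few matching bases the forward-repeat explanation dominates; the same should hold when the second half is the reverse complement of `u`. The first base always costs exactly 2 bits, since only the background state can explain it.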


Pages: 43-55

Number of pages: 13

Number of references: 37
