Finding borders between coding and noncoding DNA regions by an entropic segmentation method

Cited by: 101
Authors
Bernaola-Galván, P
Grosse, I
Carpena, P
Oliver, JL
Román-Roldán, R
Stanley, HE
Affiliations
[1] Boston Univ, Ctr Polymer Studies, Boston, MA 02215 USA
[2] Boston Univ, Dept Phys, Boston, MA 02215 USA
[3] Univ Malaga, Dept Fis Aplicada II, E-29071 Malaga, Spain
[4] Univ Oxford, Oxford OX1 3NP, England
[5] Univ Granada, Dept Genet, E-18071 Granada, Spain
[6] Univ Granada, Inst Biotechnol, E-18071 Granada, Spain
[7] Univ Granada, Dept Fis Aplicada, E-18071 Granada, Spain
DOI
10.1103/PhysRevLett.85.1342
Chinese Library Classification number
O4 [Physics]
Discipline classification code
0702
Abstract
We present a new computational approach to finding borders between coding and noncoding DNA. This approach has two features: (i) DNA sequences are described by a 12-letter alphabet that captures the differential base composition at each codon position, and (ii) the search for the borders is carried out by means of an entropic segmentation method which uses only the general statistical properties of coding DNA. We find that this method is highly accurate in finding borders between coding and noncoding regions and requires no "prior training" on known data sets. Our results appear to be more accurate than those obtained with moving windows in the discrimination of coding from noncoding DNA.
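The two features named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the "entropic" measure is the Jensen-Shannon divergence between the symbol compositions on either side of a candidate cut, which the abstract itself does not specify. The function names (`to_12_letter`, `shannon_entropy`, `best_cut`) and the toy sequence are hypothetical. Only a single best cut is searched for; a full segmentation would also need a stopping or significance criterion, which is not described here.

```python
import math
from collections import Counter


def to_12_letter(seq, frame=0):
    """Pair each base with its position modulo 3 (4 bases x 3 positions = 12 symbols)."""
    return [(base, (i + frame) % 3) for i, base in enumerate(seq.upper())]


def shannon_entropy(counts, n):
    """Shannon entropy (in bits) of a Counter holding n observations."""
    return -sum((c / n) * math.log2(c / n) for c in counts.values() if c > 0)


def best_cut(symbols):
    """Return (index, divergence) of the cut with the largest
    Jensen-Shannon divergence between the left and right parts.
    ASSUMPTION: Jensen-Shannon divergence stands in for the entropic measure."""
    n = len(symbols)
    total = Counter(symbols)
    h_total = shannon_entropy(total, n)
    left = Counter()
    best_i, best_d = None, -1.0
    for i in range(1, n):
        left[symbols[i - 1]] += 1
        right = total - left  # Counter subtraction keeps positive counts only
        d = (h_total
             - (i / n) * shannon_entropy(left, i)
             - ((n - i) / n) * shannon_entropy(right, n - i))
        if d > best_d:
            best_i, best_d = i, d
    return best_i, best_d


# Toy input: both halves have the same overall base composition,
# but the bases sit at different positions within each triplet.
toy = "ACG" * 30 + "CGA" * 30
cut, divergence = best_cut(to_12_letter(toy))
print(cut, divergence)
```

In this toy input the boundary lies at index 90 by construction. The plain 4-letter composition is identical on both sides, so `best_cut(list(toy))` finds only a weak signal, whereas the 12-letter encoding separates the two halves because it records which base occupies which position of the triplet.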
Pages: 1342-1345
Number of pages: 4