Statistical analysis of large DNA sequences using distribution of DNA words

Cited by: 4
Authors
Chaudhuri, P [1]
Das, S [1]
Affiliation
[1] Indian Stat Inst, Theoret Stat & Mat Unit, Kolkata 700035, W Bengal, India
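Source
CURRENT SCIENCE | 2001 / Vol. 80 / No. 9
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Subject classification code
07; 0710; 09;
Abstract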
Conventional sequence alignment techniques, designed for comparing and analysing relatively small DNA sequences of nearly equal size, are not applicable to data consisting of large sequences of widely varying sizes. In this article, DNA sequences are analysed using the distributions of DNA words. DNA word frequencies are simple yet effective statistical tools for capturing information about structural patterns, and they can reveal biologically significant features in a DNA sequence. Our analysis demonstrates how such simple statistical summaries of large DNA data sets make it possible to detect the structural signature of a genome and to identify phylogenetic relationships among different species, as reflected in the variation of word distributions in their DNA sequences.
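As an illustration only, the idea of summarising a sequence by its DNA word distribution can be sketched as follows. The abstract does not state the word length, the handling of ambiguous bases, or the dissimilarity measure used in the article, so the names `word_distribution` and `l1_distance`, the default `k=3`, and the choice of an L1 distance are all assumptions made for this sketch, not the authors' method.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def word_distribution(seq, k=3):
    """Relative frequencies of all overlapping DNA words of length k.

    Assumption: windows containing any symbol other than A, C, G, T
    are skipped rather than counted.
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set(BASES)
    )
    total = sum(counts.values())
    # Report every one of the 4**k possible words, including unseen ones,
    # so that distributions from different sequences share the same keys.
    words = ["".join(p) for p in product(BASES, repeat=k)]
    if total == 0:
        return {w: 0.0 for w in words}
    return {w: counts[w] / total for w in words}


def l1_distance(p, q):
    """Sum of absolute differences between two word distributions
    built with the same word length (a hypothetical choice of measure)."""
    return sum(abs(p[w] - q[w]) for w in p)


if __name__ == "__main__":
    a = word_distribution("ACGTACGTTTGACA", k=2)
    b = word_distribution("GGGCGCGCATATGCGC", k=2)
    print(l1_distance(a, b))
```

Because each sequence is reduced to a vector of relative frequencies with a fixed number of entries (4^k), sequences of very different lengths can be compared directly, which is the property the abstract contrasts with alignment-based methods.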
Pages: 1161-1166
Number of pages: 6