Over- and underrepresentation of short DNA words in herpesvirus genomes

Cited by: 51
Authors
Leung, MY
Marsh, GM
Speed, TP
Affiliations
[1] UNIV TEXAS,HLTH SCI CTR,MEXICAN AMER TREATMENT EFFECT RES CTR,SAN ANTONIO,TX 78284
[2] UNIV CALIF BERKELEY,DEPT STAT,BERKELEY,CA 94720
Keywords
DNA sequence; word count; Markov chain; z-score; herpesviruses
DOI
10.1089/cmb.1996.3.345
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
The relative abundance and rarity of DNA words have been recognized in previous biological studies to have implications for the regulation, repair, and evolutionary mechanisms of a genome. In this paper, we review several different measures of abundance and rarity of DNA words, including z-scores, representation ratios, and cross-ratios, that have appeared in the recent literature, and examine the concordance among them using the human cytomegalovirus genome sequence. We then rank all words of length k = 2, ..., 5 of seven herpesvirus genomes according to their abundance, as measured by one of the z-scores based upon a stationary Markov model of order k - 2. Using a simple metric on the ranks of 2-words of the seven herpesvirus sequences, we construct an evolutionary tree. Several 3-words are observed to be consistently over- or underrepresented in all seven herpesviruses. Furthermore, clusters of some of the most over- and underrepresented 4- and 5-words in the genomes are identified with functional sites such as the origins of replication and regulatory signals of individual viruses.
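The abstract does not say which z-score the authors use, so the following Python sketch is only an illustration of the general idea, not the paper's method. It assumes the usual maximal-order Markov estimate of the expected count of a k-word, N(prefix) * N(suffix) / N(middle), together with a common large-sample variance approximation. The function names `count_words` and `markov_z_scores` are hypothetical, and the sketch covers k >= 3 on a single strand only.

```python
from collections import Counter
from math import sqrt


def count_words(seq, k):
    """Count overlapping occurrences of every k-word in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def markov_z_scores(seq, k):
    """Illustrative z-scores for k-words under an order k-2 Markov model.

    Assumption: expected count = N(prefix) * N(suffix) / N(middle), with a
    large-sample variance approximation. Only words that occur at least
    once are scored; words with zero estimated variance are skipped.
    """
    if k < 3:
        raise ValueError("this sketch handles k >= 3 only")
    n_k = count_words(seq, k)
    n_k1 = count_words(seq, k - 1)
    n_k2 = count_words(seq, k - 2)
    scores = {}
    for word, observed in n_k.items():
        n_pre = n_k1[word[:-1]]    # count of the first k-1 letters
        n_suf = n_k1[word[1:]]     # count of the last k-1 letters
        n_mid = n_k2[word[1:-1]]   # count of the middle k-2 letters
        expected = n_pre * n_suf / n_mid
        variance = expected * (n_mid - n_pre) * (n_mid - n_suf) / n_mid ** 2
        if variance > 0:
            scores[word] = (observed - expected) / sqrt(variance)
    return scores


if __name__ == "__main__":
    z = markov_z_scores("AAGAATAAGAAC", 3)
    # Rank words from most underrepresented to most overrepresented.
    for word in sorted(z, key=z.get):
        print(word, round(z[word], 3))
```

Sorting the words by score, as in the usage lines at the end, gives the kind of ranking the abstract describes. A real analysis would also need to handle ambiguous bases and decide whether to count both strands, which this sketch ignores.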
Pages: 345-360
Page count: 16