LINGUISTIC MEASURE OF TAXONOMIC AND FUNCTIONAL RELATEDNESS OF NUCLEOTIDE-SEQUENCES

Cited by: 48
Authors
PIETROKOVSKI, S [1]
HIRSHON, J [1]
TRIFONOV, EN [1]
Affiliation
[1] LONG ISL UNIV, DEPT BOT, BROOKLYN, NY 11201
Keywords
DOI
10.1080/07391102.1990.10508563
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
The frequencies of “words”, oligonucleotides within nucleotide sequences, reflect the genetic information contained in the sequence “texts”. Nucleotide sequences are characteristically represented by their contrast word vocabularies. Comparison of the sequences by correlating their contrast vocabularies is shown to reflect well the relatedness (unrelatedness) between the sequences. A single value, the linguistic similarity between the sequences, is suggested as a measure of sequence relatedness. Sequences as short as 1000 bases can be characterized and quantitatively related to other sequences by this technique. The linguistic sequence similarity value is used for analysis of taxonomically and functionally diverse nucleotide sequences. The similarity value is shown to be very sensitive to the relatedness of the source species, thus providing a convenient tool for taxonomic classification of species by their sequence vocabularies. Functionally diverse sequences appear distinct by their linguistic similarity values. This can be a basis for a quick screening technique for functional characterization of the sequences and for mapping functionally distinct regions in long sequences. © Taylor & Francis Group, LLC.
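The comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not define how a contrast vocabulary is computed or which correlation is used, so everything specific here is an assumption: the word length `k=3`, the contrast taken as observed word frequency minus the frequency expected from independent base composition, and the Pearson correlation as the similarity value. The function names `contrast_vocabulary` and `linguistic_similarity` are hypothetical and not taken from the paper.

```python
from collections import Counter
from itertools import product
from math import sqrt


def contrast_vocabulary(seq, k=3):
    """Return {word: observed frequency - expected frequency} for all 4**k words.

    ASSUMPTION: "contrast" is modeled as the deviation of a word's observed
    frequency from the frequency expected if bases were independent.
    The abstract does not give the authors' actual definition.
    """
    seq = seq.upper()
    n_words = len(seq) - k + 1
    if n_words <= 0:
        raise ValueError("sequence is shorter than the word length k")

    # Single-base frequencies, used for the independence-based expectation.
    base_freq = {b: seq.count(b) / len(seq) for b in "ACGT"}

    # Count overlapping k-letter words.
    counts = Counter(seq[i:i + k] for i in range(n_words))

    vocab = {}
    for letters in product("ACGT", repeat=k):
        word = "".join(letters)
        expected = 1.0
        for b in word:
            expected *= base_freq[b]
        vocab[word] = counts[word] / n_words - expected
    return vocab


def linguistic_similarity(seq_a, seq_b, k=3):
    """Pearson correlation between two contrast vocabularies.

    ASSUMPTION: Pearson correlation stands in for the correlation
    measure mentioned, but not specified, in the abstract.
    """
    va = contrast_vocabulary(seq_a, k)
    vb = contrast_vocabulary(seq_b, k)
    words = sorted(va)
    xs = [va[w] for w in words]
    ys = [vb[w] for w in words]

    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a flat vocabulary carries no contrast to correlate
    return cov / (sx * sy)
```

Under these assumptions, a sequence compared with itself gives a similarity of 1, and unrelated sequences give values anywhere between -1 and 1. Applying the function to sliding windows of a long sequence would be one way to sketch the region-mapping use mentioned at the end of the abstract.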
Pages: 1251-1268
Page count: 18