Self-Organizing Map (SOM) unveils and visualizes hidden sequence characteristics of a wide range of eukaryote genomes

Cited by: 33
Authors
Abe, T
Sugawara, H
Kanaya, S
Kinouchi, M
Ikemura, T [1]
Affiliations
[1] Grad Univ Adv Studies Sokendai, Hayama Ctr Adv Res, Hayama, Kanagawa 2400193, Japan
[2] Grad Univ Adv Studies Sokendai, Shizuoka 4118540, Japan
[3] Natl Inst Genet, DNA Bank Japan, Shizuoka 4118540, Japan
[4] Natl Inst Genet, Ctr Informat Biol, Shizuoka 4118540, Japan
[5] Nara Inst Sci & Technol, Grad Sch Informat Sci, Dept Bioinformat & Genomes, Nara 6300101, Japan
[6] Yamagata Univ, Fac Engn, Dept Biosyst Engn, Yamagata 9928510, Japan
Keywords
DOI
10.1016/j.gene.2005.09.040
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
Novel tools are needed for comprehensive comparisons of interspecies characteristics of the massive amounts of genomic sequences currently available. An unsupervised neural network algorithm, Self-Organizing Map (SOM), is an effective tool for clustering and visualizing high-dimensional complex data on a single map. We modified the conventional SOM, on the basis of batch-learning SOM, for genome informatics, making the learning process and resulting map independent of the order of data input. We generated the SOMs for tri- and tetranucleotide frequencies in 10- and 100-kb sequence fragments from 38 eukaryotes for which almost complete genome sequences are available. SOM recognized species-specific characteristics (key combinations of oligonucleotide frequencies) in the genomic sequences, permitting species-specific classification of the sequences without any information regarding the species. We also generated the SOM for tetranucleotide frequencies in 1-kb sequence fragments from the human genome and found that sequences from four functional categories (5' and 3' UTRs, CDSs and introns) were classified primarily according to those categories. Because its classification and visualization power is very high, SOM is an efficient and powerful tool for extracting a wide range of genome information. (C) 2005 Elsevier B.V. All rights reserved.
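The two ingredients named in the abstract, oligonucleotide frequencies in fixed-length fragments and an order-independent batch update, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the Gaussian neighbourhood, the square grid coordinates and the handling of ambiguous bases are all assumptions made for illustration.

```python
import math
from itertools import product


def kmer_frequencies(seq, k=4):
    """Relative frequencies of all 4**k k-mers, counted in overlapping windows."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # assumption: skip windows containing N or other symbols
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]


def fragments(seq, size):
    """Split seq into non-overlapping fragments of exactly `size` bases."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def batch_som_epoch(weights, grid, data, radius):
    """One batch update of a SOM (hypothetical helper, Gaussian neighbourhood assumed).

    Each node's new weight vector is the neighbourhood-weighted mean of all
    input vectors, so the result depends only on sums over the whole data set
    and not on the order in which the vectors are presented.
    """
    dim = len(data[0])
    num = [[0.0] * dim for _ in weights]
    den = [0.0] * len(weights)
    for x in data:
        # Best-matching unit: node whose weight vector is closest to x.
        bmu = min(
            range(len(weights)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, weights[j])),
        )
        bx, by = grid[bmu]
        for j, (gx, gy) in enumerate(grid):
            d2 = (gx - bx) ** 2 + (gy - by) ** 2
            h = math.exp(-d2 / (2.0 * radius ** 2))
            den[j] += h
            for d in range(dim):
                num[j][d] += h * x[d]
    # Keep the old weights for any node that received no neighbourhood mass.
    return [
        [n / den[j] for n in num[j]] if den[j] > 0 else list(weights[j])
        for j in range(len(weights))
    ]
```

In use, each genome would be cut with `fragments(sequence, 10_000)` (or 100,000, or 1,000 for the human-genome map), each fragment turned into a 256-dimensional vector with `kmer_frequencies(fragment, k=4)`, and `batch_som_epoch` repeated with a shrinking `radius`. How the weights are initialised and how the radius is scheduled are left open here, since the abstract does not specify them.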
Pages: 27-34
Number of pages: 8