Using growing self-organising maps to improve the binning process in environmental whole-genome shotgun sequencing

Cited by: 42
Authors
Chan, Chon-Kit Kenneth [1]
Hsu, Arthur L. [1]
Tang, Sen-Lin [2]
Halgamuge, Saman K. [1]
Affiliations
[1] Univ Melbourne, Dept Mech Engn, Dynam Syst & Control Grp, Melbourne, Vic 3010, Australia
[2] Acad Sinica, Res Ctr Biodivers, Taipei 115, Taiwan
Source
JOURNAL OF BIOMEDICINE AND BIOTECHNOLOGY | 2008
DOI
10.1155/2008/513701
Chinese Library Classification (CLC) number
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
Metagenomic projects using whole-genome shotgun (WGS) sequencing produce many unassembled DNA sequences and small contigs. The step of clustering these sequences based on biological and molecular features is called binning. A reported binning strategy that combines oligonucleotide frequency with self-organising maps (SOM) shows high potential. We improve this strategy by identifying suitable training features, implementing a better clustering algorithm, and defining quantitative measures for assessing results. We investigated the suitability of di-, tri-, tetra-, and pentanucleotide frequencies. The results show that, unlike the other three, dinucleotide frequency is not a sufficiently strong signature for binning 10 kb long DNA sequences. Furthermore, we observed that increasing the order of the oligonucleotide frequency may degrade the assignment result in some cases, which indicates that an optimal species-specific oligonucleotide frequency may exist. We replaced the SOM with a growing self-organising map (GSOM), which gave comparable results with a 7%-15% speed improvement. Copyright (c) 2008.
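As an illustration of the training features named in the abstract, the sketch below computes a normalised oligonucleotide (k-mer) frequency vector for one DNA sequence. It is a minimal example, not the authors' implementation: the function name `oligo_frequencies`, the use of overlapping windows, the normalisation to relative frequencies, and the skipping of windows with ambiguous bases are all assumptions, since the record does not specify these details.

```python
from collections import Counter
from itertools import product


def oligo_frequencies(sequence, k=4):
    """Return relative k-mer frequencies over the ACGT alphabet.

    Hypothetical helper for illustration only. k=2, 3, 4, 5 correspond to
    di-, tri-, tetra-, and pentanucleotide frequencies.
    """
    seq = sequence.upper()
    # Fixed lexicographic ordering so every sequence maps to the same
    # 4**k-dimensional feature space.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count overlapping windows of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Assumption: windows containing ambiguous bases (e.g. N) are ignored.
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]
```

A vector of this kind (length 4**k) would be computed per sequence fragment and used as the input to a SOM or GSOM; the dimensionality grows from 16 for dinucleotides to 1024 for pentanucleotides.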
Pages: 10