The Sequence Analysis and Management System - SAMS-2.0: Data management and sequence analysis adapted to changing requirements from traditional Sanger sequencing to ultrafast sequencing technologies

Cited by: 23
Authors
Bekel, Thomas [1]
Henckel, Kolja [1]
Kuester, Helge [2]
Meyer, Folker [3]
Runte, Virginie Mittard
Neuweger, Heiko [1]
Paarmann, Daniel [3]
Rupp, Oliver
Zakrzewski, Martha
Puehler, Alfred [4]
Stoye, Jens [5]
Goesmann, Alexander
Affiliations
[1] Univ Bielefeld, Ctr Biotechnol CeBiTec, Int NRW Grad Sch Bioinformat & Genome Res, D-33594 Bielefeld, Germany
[2] Leibniz Univ Hannover, Inst Plant Genet, D-30419 Hannover, Germany
[3] Argonne Natl Lab, Argonne, IL 60439 USA
[4] Univ Bielefeld, Lehrstuhl Genet, D-33594 Bielefeld, Germany
[5] Univ Bielefeld, Tech Fak, AG Genominformat, D-33594 Bielefeld, Germany
Keywords
Whole genome shotgun sequencing; DNA sequence quality control; cDNA sequencing; EST clustering; Ultrafast sequencing; COMPLETE GENOME SEQUENCE; TIGR GENE INDEXES; ARBUSCULAR MYCORRHIZA; EST; BACTERIUM; REVEALS; TOOL; RECONSTRUCTION; METAGENOME; INSIGHTS
DOI
10.1016/j.jbiotec.2009.01.006
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
DNA sequencing plays an increasingly important role in various fields of genetics. This includes the sequencing of whole genomes, libraries of cDNA clones and probes of metagenome communities. The applied sequencing technologies evolve continuously. With the emergence of ultrafast sequencing technologies, a new era of DNA sequencing has recently started. Concurrently, the need for adapted bioinformatics tools arises. Since the ability to process current datasets efficiently is essential for modern genetics, a modular bioinformatics platform providing extensive sequence analysis methods is intended to meet the constantly growing requirements. The Sequence Analysis and Management System (SAMS) is a bioinformatics software platform with a database backend designed to support the computational analysis of (1) whole genome shotgun (WGS) bacterial genome sequencing, (2) cDNA sequencing by reading expressed sequence tags (ESTs), and (3) sequence data obtained by ultrafast sequencing. It provides extensive bioinformatics analysis of sequenced single reads, sequencing libraries and fragments of arbitrary DNA sequences, such as assembled contigs of metagenome reads. The system has been implemented to cope with several thousands of sequences, efficiently processing them and storing the results for further analysis. Once the project is set up, SAMS automatically recognizes the data type. (C) 2009 Elsevier B.V. All rights reserved.
Pages: 3-12
Page count: 10