SIMAP: a comprehensive database of pre-calculated protein sequence similarities, domains, annotations and clusters

Cited by: 37
Authors
Rattei, Thomas [1]
Tischler, Patrick [1]
Goetz, Stefan [2]
Jehl, Marc-Andre [1]
Hoser, Jonathan [1]
Arnold, Roland [1]
Conesa, Ana [2]
Mewes, Hans-Werner [1, 3]
Affiliations
[1] Tech Univ Munich, Wissensch Zentrum Weihenstephan, Dept Genome Oriented Bioinformat, D-8050 Freising Weihenstephan, Germany
[2] Ctr Invest Principe Felipe, Bioinformat Dept, Valencia, Spain
[3] German Res Ctr Environm Hlth GmbH, Helmholtz Zentrum Munchen, Inst Bioinformat & Syst Biol MIPS, Neuherberg, Germany
Keywords
BLAST2GO; GENOMICS; MATRIX
DOI
10.1093/nar/gkp949
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
Large-scale sequence comparison is still the most powerful tool in sequence analysis, both for predicting protein function and for reconstructing evolutionary history. Because the number of known protein sequences grows exponentially and the similarity matrix therefore grows quadratically, computing the Similarity Matrix of Proteins (SIMAP) has become a computationally intensive task. The SIMAP database provides a comprehensive and up-to-date precalculation of the protein sequence similarity matrix, sequence-based features and sequence clusters. As of September 2009, SIMAP covers 48 million proteins and more than 23 million non-redundant sequences. New features of SIMAP include an expanded sequence space, which now incorporates databases such as ENSEMBL, and the integration of metagenomes based on their consistent processing and annotation. Furthermore, Blast2GO protein function predictions are pre-calculated for all sequences in SIMAP, and the data access and query functions have been improved. SIMAP helps biologists query the up-to-date sequence space systematically and facilitates large-scale downstream projects in computational biology. Access to SIMAP is freely provided through the web portal for individual users (http://mips.gsf.de/simap/) and, for programmatic access, through DAS (http://webclu.bio.wzw.tum.de/das/) and a Web Service (http://mips.gsf.de/webservices/services/SimapService2.0?wsdl).
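The abstract's point about quadratic growth, and the value of reducing 48 million proteins to about 23 million non-redundant sequences, can be made concrete with a little arithmetic. The Python sketch below is illustrative only: it assumes that "non-redundant" means exact sequence identity after upper-casing, and it uses an MD5 digest as the deduplication key purely as an example. The helper names `unique_sequences` and `pair_count` are hypothetical and are not part of SIMAP. The pair count n(n-1)/2 is the number of unordered pairs an exhaustive all-against-all comparison would have to consider, so it is an upper bound rather than a description of how SIMAP actually schedules its computations.

```python
import hashlib


def unique_sequences(sequences):
    """Collapse identical sequences (hypothetical helper).

    Assumption: sequences are redundant only if they are identical after
    upper-casing; the MD5 digest is used here simply as a compact key.
    """
    seen = {}
    for seq in sequences:
        normalized = seq.upper()
        key = hashlib.md5(normalized.encode("ascii")).hexdigest()
        seen.setdefault(key, normalized)  # keep the first copy only
    return seen


def pair_count(n):
    """Number of unordered sequence pairs among n sequences: n(n-1)/2."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    # Toy input: two entries differ only in case, one is an exact repeat.
    proteins = ["MKTAYIAK", "mktayiak", "MSLLTEV", "MKTAYIAK", "GAVLIP"]
    nr = unique_sequences(proteins)
    print(len(proteins), len(nr))  # 5 3

    # Upper bound on pairwise comparisons at the sizes quoted in the abstract:
    # all 48 million proteins versus the 23 million non-redundant sequences.
    for n in (48_000_000, 23_000_000):
        print(f"{n:,} sequences -> {pair_count(n):.3e} pairs")
```

Run as a script, the last loop prints `1.152e+15` pairs for 48 million sequences and `2.645e+14` for 23 million. Halving the input roughly quarters the work, which is why comparing only non-redundant sequences pays off at this scale.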
Pages: D223-D226
Number of pages: 4
Related papers
19 in total
[1]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[2]   SIMAP: the similarity matrix of proteins [J].
Arnold, R ;
Rattei, T ;
Tischler, P ;
Truong, MD ;
Stümpflen, V ;
Mewes, W .
BIOINFORMATICS, 2005, 21 :42-46
[3]   Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research [J].
Conesa, A ;
Götz, S ;
García-Gómez, JM ;
Terol, J ;
Talón, M ;
Robles, M .
BIOINFORMATICS, 2005, 21 (18) :3674-3676
[4]   High-throughput functional annotation and data mining with the Blast2GO suite [J].
Gotz, Stefan ;
Garcia-Gomez, Juan Miguel ;
Terol, Javier ;
Williams, Tim D. ;
Nagaraj, Shivashankar H. ;
Nueda, Maria Jose ;
Robles, Montserrat ;
Talon, Manuel ;
Dopazo, Joaquin ;
Conesa, Ana .
NUCLEIC ACIDS RESEARCH, 2008, 36 (10) :3420-3435
[5]   Metagenomics: Application of genomics to uncultured microorganisms [J].
Handelsman, J .
MICROBIOLOGY AND MOLECULAR BIOLOGY REVIEWS, 2004, 68 (04) :669-685
[6]   Amino acid substitution matrices from protein blocks [J].
Henikoff, S ;
Henikoff, JG .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1992, 89 (22) :10915-10919
[7]   Ensembl 2009 [J].
Hubbard, T. J. P. ;
Aken, B. L. ;
Ayling, S. ;
Ballester, B. ;
Beal, K. ;
Bragin, E. ;
Brent, S. ;
Chen, Y. ;
Clapham, P. ;
Clarke, L. ;
Coates, G. ;
Fairley, S. ;
Fitzgerald, S. ;
Fernandez-Banet, J. ;
Gordon, L. ;
Graf, S. ;
Haider, S. ;
Hammond, M. ;
Holland, R. ;
Howe, K. ;
Jenkinson, A. ;
Johnson, N. ;
Kahari, A. ;
Keefe, D. ;
Keenan, S. ;
Kinsella, R. ;
Kokocinski, F. ;
Kulesha, E. ;
Lawson, D. ;
Longden, I. ;
Megy, K. ;
Meidl, P. ;
Overduin, B. ;
Parker, A. ;
Pritchard, B. ;
Rios, D. ;
Schuster, M. ;
Slater, G. ;
Smedley, D. ;
Spooner, W. ;
Spudich, G. ;
Trevanion, S. ;
Vilella, A. ;
Vogel, J. ;
White, S. ;
Wilder, S. ;
Zadissa, A. ;
Birney, E. ;
Cunningham, F. ;
Curwen, V. .
NUCLEIC ACIDS RESEARCH, 2009, 37 :D690-D697
[8]   InterPro: the integrative protein signature database [J].
Hunter, Sarah ;
Apweiler, Rolf ;
Attwood, Teresa K. ;
Bairoch, Amos ;
Bateman, Alex ;
Binns, David ;
Bork, Peer ;
Das, Ujjwal ;
Daugherty, Louise ;
Duquenne, Lauranne ;
Finn, Robert D. ;
Gough, Julian ;
Haft, Daniel ;
Hulo, Nicolas ;
Kahn, Daniel ;
Kelly, Elizabeth ;
Laugraud, Aurelie ;
Letunic, Ivica ;
Lonsdale, David ;
Lopez, Rodrigo ;
Madera, Martin ;
Maslen, John ;
McAnulla, Craig ;
McDowall, Jennifer ;
Mistry, Jaina ;
Mitchell, Alex ;
Mulder, Nicola ;
Natale, Darren ;
Orengo, Christine ;
Quinn, Antony F. ;
Selengut, Jeremy D. ;
Sigrist, Christian J. A. ;
Thimma, Manjula ;
Thomas, Paul D. ;
Valentin, Franck ;
Wilson, Derek ;
Wu, Cathy H. ;
Yeats, Corin .
NUCLEIC ACIDS RESEARCH, 2009, 37 :D211-D215
[9]   MEGAN analysis of metagenomic data [J].
Huson, Daniel H. ;
Auch, Alexander F. ;
Qi, Ji ;
Schuster, Stephan C. .
GENOME RESEARCH, 2007, 17 (03) :377-386
[10]   IMG/M: a data management and analysis system for metagenomes [J].
Markowitz, Victor M. ;
Ivanova, Natalia N. ;
Szeto, Ernest ;
Palaniappan, Krishna ;
Chu, Ken ;
Dalevi, Daniel ;
Chen, I-Min A. ;
Grechkin, Yuri ;
Dubchak, Inna ;
Anderson, Iain ;
Lykidis, Athanasios ;
Mavromatis, Konstantinos ;
Hugenholtz, Philip ;
Kyrpides, Nikos C. .
NUCLEIC ACIDS RESEARCH, 2008, 36 :D534-D538