CoGenT++: an extensive and extensible data environment for computational genomics

Cited by: 18
Authors
Goldovsky, L
Janssen, P
Ahrén, D
Audit, B
Cases, I
Darzentas, N
Enright, AJ
López-Bigas, N
Peregrin-Alvarez, JM
Smith, M
Tsoka, S
Kunin, V
Ouzounis, CA [1]
Affiliations
[1] European Bioinformat Inst, EMBL, Computat Genom Grp, Cambridge Outstn, Cambridge CB10 1SD, England
[2] CEN SCK, Belgian Nucl Res Ctr, Microbiol Lab, B-2400 Mol, Belgium
[3] Natl Ctr Res & Technol, Inst Agrobiotechnol, GR-57001 Thessaloniki, Greece
[4] Ecole Normale Super Lyon, Phys Lab, F-69364 Lyon, France
[5] CSIC, Natl Biotechnol Ctr, CNB, Transcript Networks Grp, E-28049 Madrid, Spain
[6] Sanger Inst, Cambridge CB10 1SA, England
[7] Hosp Sick Children, Toronto, ON M5G 1X, Canada
[8] DOE Joint Genome Inst, Walnut Creek, CA 94598 USA
Funding
UK Medical Research Council
DOI
10.1093/bioinformatics/bti579
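Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010 [Biochemistry and Molecular Biology]; 081704 [Applied Chemistry]
Abstract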
Motivation: CoGenT++ is a data environment for computational research in comparative and functional genomics, designed to address issues of consistency, reproducibility, scalability and accessibility.
Description: CoGenT++ facilitates the re-distribution of all fully sequenced and published genomes, storing information about species, gene names and protein sequences. We describe our scalable implementation of ProXSim, a continually updated all-against-all similarity database, which stores pairwise relationships between all genome sequences. Based on these similarities, derived databases are generated for gene fusions (AllFuse), putative orthologs (OFAM), protein families (TRIBES), phylogenetic profiles (ProfUse) and phylogenetic trees. Extensions based on the CoGenT++ environment include disease gene prediction, pattern discovery, automated domain detection, genome annotation and ancestral reconstruction.
Conclusion: CoGenT++ provides a comprehensive environment for computational genomics, accessible primarily for large-scale analyses as well as manual browsing.
Pages: 3806-3810
Number of pages: 5