CoGenT++: an extensive and extensible data environment for computational genomics

Cited by: 18

Authors:

Goldovsky, L

Janssen, P

Ahrén, D

Audit, B

Cases, I

Darzentas, N

Enright, AJ

López-Bigas, N

Peregrin-Alvarez, JM

Smith, M

Tsoka, S

Kunin, V

Ouzounis, CA [1]

Affiliations:

[1] European Bioinformat Inst, EMBL, Computat Genom Grp, Cambridge Outstn, Cambridge CB10 1SD, England

[2] CEN SCK, Belgian Nucl Res Ctr, Microbiol Lab, B-2400 Mol, Belgium

[3] Natl Ctr Res & Technol, Inst Agrobiotechnol, GR-57001 Thessaloniki, Greece

[4] Ecole Normale Super Lyon, Phys Lab, F-69364 Lyon, France

[5] CSIC, Natl Biotechnol Ctr, CNB, Transcript Networks Grp, E-28049 Madrid, Spain

[6] Sanger Inst, Cambridge CB10 1SA, England

[7] Hosp Sick Children, Toronto, ON M5G 1X, Canada

[8] DOE Joint Genome Inst, Walnut Creek, CA 94598 USA

Source:

BIOINFORMATICS | 2005 / Vol. 21 / No. 19

Funding:

UK Medical Research Council;

DOI:

10.1093/bioinformatics/bti579

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Discipline classification code:

071010 [Biochemistry and Molecular Biology]; 081704 [Applied Chemistry];

Abstract:

Motivation: CoGenT++ is a data environment for computational research in comparative and functional genomics, designed to address issues of consistency, reproducibility, scalability and accessibility.

Description: CoGenT++ facilitates the redistribution of all fully sequenced and published genomes, storing information about species, gene names and protein sequences. We describe our scalable implementation of ProXSim, a continually updated all-against-all similarity database that stores pairwise relationships between all genome sequences. Based on these similarities, derived databases are generated for gene fusions (AllFuse), putative orthologs (OFAM), protein families (TRIBES), phylogenetic profiles (ProfUse) and phylogenetic trees. Extensions built on the CoGenT++ environment include disease gene prediction, pattern discovery, automated domain detection, genome annotation and ancestral reconstruction.

Conclusion: CoGenT++ provides a comprehensive environment for computational genomics, accessible primarily for large-scale analyses but also for manual browsing.
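The abstract states that derived databases such as ProfUse (phylogenetic profiles) are generated from the all-against-all similarities stored in ProXSim. As a rough illustration only, the sketch below builds presence/absence phylogenetic profiles from pairwise similarity hits and pairs proteins whose profiles match. The hit tuple layout, the E-value cutoff of 1e-6, the exact-match pairing rule and the function names `phylogenetic_profiles`, `hamming` and `linked_pairs` are all hypothetical assumptions; this is not the published ProfUse procedure or the ProXSim schema.

```python
from itertools import combinations

# Hypothetical hit layout (an assumption, not the ProXSim schema):
# (query_protein, query_genome, subject_protein, subject_genome, evalue)


def phylogenetic_profiles(hits, genomes, evalue_cutoff=1e-6):
    """Return a presence/absence profile (one 0/1 entry per genome) per query protein.

    A protein counts as present in its own genome and in every genome where it
    has at least one hit with E-value <= evalue_cutoff (the cutoff is an assumption).
    """
    column = {g: i for i, g in enumerate(genomes)}
    profiles = {}
    for query, query_genome, _subject, subject_genome, evalue in hits:
        profile = profiles.setdefault(query, [0] * len(genomes))
        profile[column[query_genome]] = 1  # present in its own genome
        if evalue <= evalue_cutoff:
            profile[column[subject_genome]] = 1  # significant hit in another genome
    return {protein: tuple(bits) for protein, bits in profiles.items()}


def hamming(a, b):
    """Number of genomes in which two profiles disagree."""
    return sum(x != y for x, y in zip(a, b))


def linked_pairs(profiles, max_distance=0):
    """Protein pairs whose profiles differ in at most max_distance genomes."""
    return [
        (p1, p2)
        for (p1, v1), (p2, v2) in combinations(sorted(profiles.items()), 2)
        if hamming(v1, v2) <= max_distance
    ]


# Toy data with made-up protein and genome identifiers.
genomes = ["Eco", "Bsu", "Mja"]
hits = [
    ("a1", "Eco", "b1", "Bsu", 1e-30),
    ("a1", "Eco", "m1", "Mja", 1e-2),  # E-value above the cutoff, so ignored
    ("a2", "Eco", "b2", "Bsu", 1e-12),
    ("a3", "Eco", "m3", "Mja", 1e-20),
]
profiles = phylogenetic_profiles(hits, genomes)
print(profiles)  # {'a1': (1, 1, 0), 'a2': (1, 1, 0), 'a3': (1, 0, 1)}
print(linked_pairs(profiles))  # [('a1', 'a2')]
```

Requiring identical profiles (max_distance=0) is simply the most conservative choice for a toy example; profile-based methods in practice may score profile similarity differently.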

Pages: 3806-3810

Page count: 5

Reference count: 35
