A phylogenomic gene cluster resource: the Phylogenetically Inferred Groups (PhIGs) database

Cited by: 40
Authors
Dehal, Paramvir S.
Boore, Jeffrey L.
Affiliations
[1] DOE Joint Genome Institute, Evolutionary Genomics Department, Walnut Creek, CA 94598 USA
[2] Lawrence Berkeley National Laboratory, Walnut Creek, CA 94598 USA
[3] University of California Berkeley, Department of Integrative Biology, Berkeley, CA 94720 USA
Funding
US National Science Foundation
DOI
10.1186/1471-2105-7-201
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: We present here the PhIGs database, a phylogenomic resource for sequenced genomes. Although many methods exist for clustering gene families, very few attempt to create truly orthologous clusters, that is, clusters of genes sharing descent from a single ancestral gene, across a range of evolutionary depths. Non-phylogenetic gene family clusters have been used broadly for gene annotation, but they are known to introduce errors through the artifactual association of slowly evolving paralogs and through the lack of annotation for more rapidly evolving genes. A full phylogenetic framework is necessary for accurate inference of function and for many studies that address the pattern and mechanism of genome evolution. The automated generation of evolutionary gene clusters, the creation of gene trees, the determination of orthology and paralogy relationships, and the correlation of this information with gene annotations, expression information, and genomic context together provide an important resource for the scientific community.

Discussion: The PhIGs database currently contains 23 completely sequenced genomes of fungi and metazoans, comprising 409,653 genes that have been grouped into 42,645 gene clusters. Each gene cluster is built so that the gene sequence distances are consistent with the known organismal relationships, thereby maximizing the likelihood that the clusters represent truly orthologous genes. The PhIGs website provides tools for studying genes within their phylogenetic framework, including keyword searches on annotations such as GO and InterPro assignments, and sequence similarity searches by BLAST and HMM. In addition to displaying the evolutionary relationships of the genes in each cluster, the website lets users view the relative physical positions of homologous genes in specified sets of genomes.

Summary: Accurate analyses of genes and genomes can only be done within their full phylogenetic context. The PhIGs database and its corresponding website, http://phigs.org, address this need for the scientific community. Our goal is to expand the content as more genomes are sequenced and to use this framework to incorporate additional analyses.
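The abstract mentions determining orthology and paralogy relationships from gene trees. As an illustration only, the Python sketch below applies the species-overlap rule, a common heuristic that is not stated here to be the PhIGs procedure (PhIGs builds clusters against the known organismal relationships, whereas this rule needs no species tree). An internal gene-tree node is called a duplication if two of its child subtrees contain genes from the same species, and a speciation otherwise. Gene pairs whose last common ancestor is a speciation node are labelled orthologs, and the rest paralogs. The `Node` class, the `classify` and `leaf` functions, and the `<species>_<gene>` ID convention are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Set, Tuple


@dataclass
class Node:
    """Rooted gene-tree node: leaves carry a gene ID, internal nodes carry children."""
    gene: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def species_of(gene: str) -> str:
    # Hypothetical ID convention "<species>_<gene>", e.g. "human_A1" -> "human".
    return gene.split("_", 1)[0]


def classify(node: Node, pairs: Dict[FrozenSet[str], str]) -> Tuple[List[str], Set[str]]:
    """Post-order walk returning (genes, species) in the subtree rooted at `node`.

    An internal node is treated as a duplication if any two child subtrees share
    a species (species-overlap rule), otherwise as a speciation. Every gene pair
    whose last common ancestor is this node is recorded in `pairs` as
    "paralog" or "ortholog" accordingly.
    """
    if not node.children:
        assert node.gene is not None, "leaf nodes need a gene ID"
        return [node.gene], {species_of(node.gene)}
    results = [classify(child, pairs) for child in node.children]
    # Duplication if the species sets of any two child subtrees intersect.
    is_duplication = any(sp_a & sp_b for (_, sp_a), (_, sp_b) in combinations(results, 2))
    relation = "paralog" if is_duplication else "ortholog"
    # Genes in different child subtrees have this node as their last common ancestor.
    for (genes_a, _), (genes_b, _) in combinations(results, 2):
        for g in genes_a:
            for h in genes_b:
                pairs[frozenset((g, h))] = relation
    genes = [g for child_genes, _ in results for g in child_genes]
    species = set().union(*(sp for _, sp in results))
    return genes, species


def leaf(gene: str) -> Node:
    return Node(gene=gene)


# Toy tree: gene A duplicated before the human/mouse split.
tree = Node(children=[
    Node(children=[leaf("human_A1"), leaf("mouse_A1")]),
    Node(children=[leaf("human_A2"), leaf("mouse_A2")]),
])
pairs: Dict[FrozenSet[str], str] = {}
classify(tree, pairs)
for pair, relation in sorted(pairs.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pair), relation)
```

On this toy tree the two within-clade pairs come out as orthologs and the four cross-clade pairs as paralogs. The rule is simple, but it mislabels nodes when the gene tree topology is wrong, and gene loss after a duplication can hide the species overlap and make paralogs look like orthologs.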
Pages: 9
Related papers (21 in total)
  • [1] Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
    Altschul, SF
    Madden, TL
    Schaffer, AA
    Zhang, JH
    Zhang, Z
    Miller, W
    Lipman, DJ
    [J]. NUCLEIC ACIDS RESEARCH, 1997, 25 (17) : 3389 - 3402
  • [2] [Anonymous], 2004, PHYLIP PHYLOGENY INF
  • [3] The InterPro database, an integrated documentation resource for protein families, domains and functional sites
    Apweiler, R
    Attwood, TK
    Bairoch, A
    Bateman, A
    Birney, E
    Biswas, M
    Bucher, P
    Cerutti, T
    Corpet, F
    Croning, MDR
    Durbin, R
    Falquet, L
    Fleischmann, W
    Gouzy, J
    Hermjakob, H
    Hulo, N
    Jonassen, I
    Kahn, D
    Kanapin, A
    Karavidopoulou, Y
    Lopez, R
    Marx, B
    Mulder, NJ
    Oinn, TM
    Pagni, M
    Servant, F
    Sigrist, CJA
    Zdobnov, EM
    [J]. NUCLEIC ACIDS RESEARCH, 2001, 29 (01) : 37 - 40
  • [4] Gene Ontology: tool for the unification of biology
    Ashburner, M
    Ball, CA
    Blake, JA
    Botstein, D
    Butler, H
    Cherry, JM
    Davis, AP
    Dolinski, K
    Dwight, SS
    Eppig, JT
    Harris, MA
    Hill, DP
    Issel-Tarver, L
    Kasarskis, A
    Lewis, S
    Matese, JC
    Richardson, JE
    Ringwald, M
    Rubin, GM
    Sherlock, G
    [J]. NATURE GENETICS, 2000, 25 (01) : 25 - 29
  • [5] The Jalview Java alignment editor
    Clamp, M
    Cuff, J
    Searle, SM
    Barton, GJ
    [J]. BIOINFORMATICS, 2004, 20 (03) : 426 - 427
  • [6] Two rounds of whole genome duplication in the ancestral vertebrate
    Dehal, P
    Boore, JL
    [J]. PLOS BIOLOGY, 2005, 3 (10) : 1700 - 1708
  • [7] The multiplicity of domains in proteins
    Doolittle, RF
    [J]. ANNUAL REVIEW OF BIOCHEMISTRY, 1995, 64 : 287 - 314
  • [8] Phylogenomics: Improving functional predictions for uncharacterized genes by evolutionary analysis
    Eisen, JA
    [J]. GENOME RESEARCH, 1998, 8 (03) : 163 - 167
  • [9] An efficient algorithm for large-scale detection of protein families
    Enright, AJ
    Van Dongen, S
    Ouzounis, CA
    [J]. NUCLEIC ACIDS RESEARCH, 2002, 30 (07) : 1575 - 1584
  • [10] Distinguishing homologous from analogous proteins
    Fitch, WM
    [J]. SYSTEMATIC ZOOLOGY, 1970, 19 (02) : 99 ff.