Predicting functional gene links from phylogenetic-statistical analyses of whole genomes

Cited by: 140
Authors
Barker, D [1]
Pagel, M [1]
Affiliations
[1] Univ Reading, Sch Anim & Microbial Sci, Reading RG6 2AJ, Berks, England
DOI
10.1371/journal.pcbi.0010003
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
An important element of the developing field of proteomics is to understand protein-protein interactions and other functional links amongst genes. Across-species correlation methods for detecting functional links work on the premise that functionally linked proteins will tend to show a common pattern of presence and absence across a range of genomes. We describe a maximum likelihood statistical model for predicting functional gene linkages. The method detects independent instances of the correlated gain or loss of pairs of proteins on phylogenetic trees, reducing the high rates of false positives observed in conventional across-species methods that do not explicitly incorporate a phylogeny. We show, in a dataset of 10,551 protein pairs, that the phylogenetic method improves by up to 35% on across-species analyses at identifying known functionally linked proteins. The method shows that protein pairs with at least two to three correlated events of gain or loss are almost certainly functionally linked. Contingent evolution, in which one gene's presence or absence depends upon the presence of another, can also be detected phylogenetically, and may identify genes whose functional significance depends upon their interaction with other genes. Incorporating phylogenetic information improves the prediction of functional linkages. The improvement derives from having a lower rate of false positives and from detecting trends that across-species analyses miss. Phylogenetic methods can easily be incorporated into the screening of large-scale bioinformatics datasets to identify sets of protein links and to characterise gene networks.
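To make the contrast in the abstract concrete, here is a minimal sketch of the conventional across-species comparison, not the authors' maximum likelihood method. It assumes each protein is represented by a presence/absence profile (1 or 0 per genome) and scores a pair with a one-sided Fisher's exact test. That test is an illustrative choice; the record does not say which statistic the across-species analyses used. The function names `contingency` and `fisher_one_sided` are hypothetical.

```python
from math import comb


def contingency(profile_a, profile_b):
    """Count genomes by joint presence (1) / absence (0) of two proteins."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    pairs = list(zip(profile_a, profile_b))
    both = sum(1 for a, b in pairs if a and b)
    only_a = sum(1 for a, b in pairs if a and not b)
    only_b = sum(1 for a, b in pairs if b and not a)
    neither = len(pairs) - both - only_a - only_b
    return both, only_a, only_b, neither


def fisher_one_sided(both, only_a, only_b, neither):
    """P(co-presence count >= observed) under the hypergeometric null.

    Assumption: every genome is an independent observation. Ignoring the
    shared ancestry of genomes in this way is the limitation the abstract
    attributes to across-species methods.
    """
    n = both + only_a + only_b + neither
    with_a = both + only_a  # genomes containing protein A
    with_b = both + only_b  # genomes containing protein B
    upper = min(with_a, with_b)
    tail = sum(
        comb(with_a, k) * comb(n - with_a, with_b - k)
        for k in range(both, upper + 1)
    )
    return tail / comb(n, with_b)


# Toy profiles over eight genomes (made-up data, for illustration only).
profile_a = [1, 1, 1, 1, 0, 0, 0, 0]
profile_b = [1, 1, 1, 1, 0, 0, 0, 0]
p_value = fisher_one_sided(*contingency(profile_a, profile_b))
```

If the four genomes carrying both proteins were close relatives that inherited the pair from one ancestor, this test would still count four independent co-occurrences. A phylogenetic method would instead count a single correlated gain, which is why the abstract reports fewer false positives when the tree is taken into account.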
Pages: 24-31
Number of pages: 8