A factor analysis model for functional genomics

Cited by: 25
Authors
Kustra, Rafal
Shioda, Romy
Zhu, Mu
Affiliations
[1] Univ Toronto, Toronto, ON, Canada
[2] Univ Waterloo, Dept Combinator & Optimizat, Waterloo, ON N2L 3G1, Canada
[3] Univ Waterloo, Dept Stat & Actuarial Sci, Waterloo, ON N2L 3G1, Canada
DOI
10.1186/1471-2105-7-216
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Expression array data are used to predict biological functions of uncharacterized genes by comparing their expression profiles to those of characterized genes. While biologically plausible, this is both statistically and computationally challenging. Typical approaches are computationally expensive and ignore correlations among expression profiles and functional categories. Results: We propose a factor analysis model (FAM) for functional genomics and give a two-step algorithm, using genome-wide expression data for yeast and a subset of Gene-Ontology Biological Process functional annotations. We show that the predictive performance of our method is comparable to the current best approach, while our total computation time was faster by a factor of 4000. We discuss the unique challenges in evaluating the performance of algorithms used for genome-wide functional genomics. Finally, we discuss extensions to our method that can incorporate the inherent correlation structure of the functional categories to further improve predictive performance. Conclusion: Our factor analysis model is a computationally efficient technique for functional genomics and provides a clear and unified statistical framework with potential for incorporating important gene ontology information to improve predictions.
Pages: 13