High-Dimensional Sparse Factor Modeling: Applications in Gene Expression Genomics
Cited by: 268
Authors:
Carvalho, Carlos M. [1]
Chang, Jeffrey
Lucas, Joseph E. [3]
Nevins, Joseph R. [2,4]
Wang, Quanli
West, Mike [3]
Affiliations:
[1] Univ Chicago, Grad Sch Business, Chicago, IL 60637 USA
[2] Duke Univ, Inst Genome Sci & Policy, Ctr Appl Genom & Technol, Durham, NC 27710 USA
[3] Duke Univ, Dept Stat Sci, Durham, NC 27708 USA
[4] Duke Univ, Med Ctr, Dept Mol Genet & Microbiol, Durham, NC 27706 USA
We describe studies in molecular profiling and biological pathway analysis that use sparse latent factor and regression models for microarray gene expression data. We discuss breast cancer applications and key aspects of the modeling and computational methodology. Our case studies aim to investigate and characterize heterogeneity of structure related to specific oncogenic pathways, as well as links between aggregate patterns in gene expression profiles and clinical biomarkers. Based on the metaphor of statistically derived "factors" as representing biological "subpathway" structure, we explore the decomposition of fitted sparse factor models into pathway subcomponents and investigate how these components overlay multiple aspects of known biological activity. Our methodology is based on sparsity modeling of multivariate regression, ANOVA, and latent factor models, as well as a class of models that combines all components. Hierarchical sparsity priors address questions of dimension reduction and multiple comparisons, as well as scalability of the methodology. The models include practically relevant non-Gaussian/nonparametric components for latent structure, underlying often quite complex non-Gaussianity in multivariate expression patterns. Model search and fitting are addressed through stochastic simulation and evolutionary stochastic search methods that are exemplified in the oncogenic pathway studies. Supplementary supporting material provides more details of the applications, as well as examples of the use of freely available software tools for implementing the methodology.
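As a rough illustration of what a sparse latent factor model means for expression data, the following sketch simulates a genes-by-samples matrix in which each gene loads on only a few latent factors. It is not the authors' software or their full model: the function name, the dimensions, the inclusion probability, and the Gaussian factor and noise distributions are all assumptions chosen for simplicity (the abstract states that the actual models use non-Gaussian/nonparametric components for the latent structure). The point-mass-at-zero mixture on the loadings is one common way to express a sparsity prior.

```python
import numpy as np

def simulate_sparse_factor_data(n_genes=200, n_samples=50, n_factors=3,
                                inclusion_prob=0.1, noise_sd=0.5, seed=0):
    """Draw data from x_i = B f_i + e_i with a sparse loading matrix B.

    Each loading is exactly zero with probability 1 - inclusion_prob
    (a point-mass "spike") and Gaussian otherwise (the "slab").
    All names and parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Indicators of which gene-factor loadings are allowed to be nonzero.
    active = rng.random((n_genes, n_factors)) < inclusion_prob
    # Slab draws, set to zero wherever the spike applies.
    slab = rng.normal(0.0, 1.0, (n_genes, n_factors))
    loadings = np.where(active, slab, 0.0)
    # Latent factor scores, one column per sample (Gaussian for simplicity).
    factors = rng.normal(0.0, 1.0, (n_factors, n_samples))
    # Gene-specific idiosyncratic noise.
    noise = rng.normal(0.0, noise_sd, (n_genes, n_samples))
    expression = loadings @ factors + noise
    return expression, loadings, factors

expression, loadings, factors = simulate_sparse_factor_data()
sparsity = np.mean(loadings == 0.0)
print(f"expression matrix shape: {expression.shape}")
print(f"fraction of zero loadings: {sparsity:.2f}")
```

In this toy setting, the nonzero rows of each loading column play the role of the genes associated with one "factor"; fitting such a model to real data would require posterior simulation of the kind the abstract refers to, which this sketch does not attempt.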