Multivariate regression analysis of distance matrices for testing associations between gene expression patterns and related variables

Cited by: 214
Authors
Zapala, Matthew A.
Schork, Nicholas J. [1]
Affiliations
[1] Univ Calif San Diego, Moores UCSD Canc Ctr, Ctr Human Genet & Genom, Dept Psychiat,Biomed Sci Grad Program, La Jolla, CA 92093 USA
[2] Univ Calif San Diego, Moores UCSD Canc Ctr, Ctr Human Genet & Genom, Dept Psychiat,Polymorphism Res Lab, La Jolla, CA 92093 USA
[3] Univ Calif San Diego, Moores UCSD Canc Ctr, Ctr Human Genet & Genom, Dept Family & Prevent Med,Div Biostat, La Jolla, CA 92093 USA
[4] Univ Calif San Diego, Calif Inst Telecommun & Informat Technol, La Jolla, CA 92093 USA
Keywords
analysis of variance; high-dimensional data; SINGULAR-VALUE DECOMPOSITION; CYCLIN-DEPENDENT KINASE-5; PHYLOGENETIC TREES; UP-REGULATION; COMPLEX;
DOI
10.1073/pnas.0609333103
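Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09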
Abstract
A fundamental step in the analysis of gene expression and other high-dimensional genomic data is the calculation of the similarity or distance between pairs of individual samples in a study. If one has collected N total samples and assayed the expression level of G genes on those samples, then an N x N similarity matrix can be formed that reflects the correlation or similarity of the samples with respect to the expression values over the G genes. This matrix can then be examined for patterns via standard data reduction and cluster analysis techniques. We consider an alternative to conventional data reduction and cluster analyses of similarity matrices that is rooted in traditional linear models. This analysis method allows predictor variables collected on the samples to be related to variation in the pairwise similarity/distance values reflected in the matrix. The proposed multivariate method avoids the need for reducing the dimensions of a similarity matrix, can be used to assess relationships between the genes used to construct the matrix and additional information collected on the samples under study, and can be used to analyze individual genes or groups of genes identified in different ways. The technique can be used with any high-dimensional assay or data type and is ideally suited for testing subsets of genes defined by their participation in a biochemical pathway or other a priori grouping. We showcase the methodology using three published gene expression data sets.
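The abstract does not give formulas, so the following is only a minimal sketch of how a distance-matrix regression of this kind is commonly set up, not the authors' implementation. It assumes a correlation-based distance d = sqrt(2(1 - r)), Gower centering of the squared distances, a pseudo-F statistic built from the hat matrix of the predictors, and a permutation test for significance. All function names (`correlation_distance`, `gower_center`, `pseudo_f`, `mdmr_permutation_test`) are hypothetical and chosen for illustration.

```python
import numpy as np


def correlation_distance(expr):
    """N x N distance matrix from an N samples x G genes expression array.

    Assumption: d_ij = sqrt(2 * (1 - r_ij)), with r_ij the Pearson
    correlation between the expression profiles of samples i and j.
    """
    r = np.corrcoef(expr)
    # Clip guards against tiny negative values caused by rounding.
    return np.sqrt(2.0 * np.clip(1.0 - r, 0.0, None))


def gower_center(dist):
    """Gower-centered matrix G = C (-0.5 * D^2) C, with C = I - 11'/N."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    return centering @ (-0.5 * dist ** 2) @ centering


def pseudo_f(g, x):
    """Pseudo-F statistic relating the predictors x to the centered matrix g."""
    n = g.shape[0]
    design = np.column_stack([np.ones(n), x])  # add an intercept column
    m = design.shape[1]
    hat = design @ np.linalg.pinv(design)  # projection onto the design space
    resid = np.eye(n) - hat
    explained = np.trace(hat @ g @ hat)
    unexplained = np.trace(resid @ g @ resid)
    return (explained / (m - 1)) / (unexplained / (n - m))


def mdmr_permutation_test(dist, x, n_perm=999, seed=0):
    """Observed pseudo-F and a permutation p-value (rows of x are shuffled)."""
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    g = gower_center(dist)
    x = np.asarray(x, dtype=float).reshape(n, -1)
    observed = pseudo_f(g, x)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        if pseudo_f(g, x[perm]) >= observed:
            exceed += 1
    # The +1 terms count the observed arrangement as one permutation.
    return observed, (exceed + 1) / (n_perm + 1)
```

To test a subset of genes, such as a pathway, one would pass only the corresponding columns of the expression array to `correlation_distance` and rerun the test. No dimension reduction of the N x N matrix is involved at any step.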
Pages: 19430-19435
Number of pages: 6