Dimension reduction strategies for analyzing global gene expression data with a response

Cited by: 52
Authors
Chiaromonte, F [1]
Martinelli, J [1]
Affiliation
[1] Penn State Univ, Dept Stat, University Pk, PA 16802 USA
Keywords
microarray data; singular value decomposition; sufficient dimension reduction; sliced inverse regression; randomization
DOI
10.1016/S0025-5564(01)00106-7
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
The analysis of global gene expression data from microarrays is breaking new ground in genetics research, while confronting modelers and statisticians with many critical issues. In this paper, we consider data sets in which a categorical or continuous response is recorded, along with gene expression, on a given number of experimental samples. Data of this type are usually employed to create a prediction mechanism for the response based on gene expression, and to identify a subset of relevant genes. This defines a regression setting characterized by a dramatic under-resolution with respect to the predictors (genes), whose number exceeds by orders of magnitude the number of available observations (samples). We present a dimension reduction strategy that, under appropriate assumptions, allows us to restrict attention to a few linear combinations of the original expression profiles, and thus to overcome under-resolution. These linear combinations can then be used to build and validate a regression model with standard techniques. Moreover, they can be used to rank original predictors, and ultimately to select a subset of them through comparison with a background 'chance scenario' based on a number of independent randomizations. We apply this strategy to publicly available data on leukemia classification. (C) 2002 Published by Elsevier Science Inc.
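The pipeline outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the reduction is a singular value decomposition followed by sliced inverse regression (both named in the keywords), runs on simulated data rather than the leukemia data, and uses hypothetical choices throughout (the function names, `k = 5` retained components, 20 randomizations, and a 95% quantile as the 'chance scenario' cutoff).

```python
import numpy as np

def sir_direction_on_svd(X, y, k):
    """Leading SIR direction, expressed as one coefficient per gene.

    X: (n_samples, n_genes) expression matrix; y: categorical response.
    k: number of singular components kept (an assumed tuning choice).
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                     # center each gene
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = U[:, :k] * np.sqrt(n)                   # scores with identity covariance
    M = np.zeros((k, k))
    for label in np.unique(y):                  # one slice per response class
        mask = y == label
        m = Z[mask].mean(axis=0)                # slice mean of standardized scores
        M += mask.mean() * np.outer(m, m)       # weighted covariance of slice means
    _, eigvecs = np.linalg.eigh(M)              # eigenvalues in ascending order
    b = eigvecs[:, -1]                          # leading SIR direction in score space
    return Vt[:k].T @ (b * np.sqrt(n) / s[:k])  # map back to gene coordinates

def randomization_threshold(X, y, k, n_rand, rng, q=0.95):
    """Quantile of the largest |coefficient| over response permutations."""
    maxima = [
        np.abs(sir_direction_on_svd(X, rng.permutation(y), k)).max()
        for _ in range(n_rand)
    ]
    return np.quantile(maxima, q)

# Simulated stand-in data: 40 samples, 200 genes, 10 of them informative.
rng = np.random.default_rng(0)
n_samples, n_genes, n_informative = 40, 200, 10
y = np.repeat([0, 1], n_samples // 2)
X = rng.normal(size=(n_samples, n_genes))
X[y == 1, :n_informative] += 4.0

beta = sir_direction_on_svd(X, y, k=5)
ranking = np.argsort(-np.abs(beta))             # genes ordered by |coefficient|
threshold = randomization_threshold(X, y, k=5, n_rand=20, rng=rng)
selected = np.flatnonzero(np.abs(beta) > threshold)
print("top-ranked genes:", ranking[:10])
print("genes above the randomization threshold:", selected)
```

Because the retained scores are standardized before slicing, the projection of the centered data onto `beta` has unit variance by construction. Permuting the response breaks any link between it and gene expression, so the permuted coefficients give a baseline against which the observed ranking can be compared.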
Pages: 123-144
Page count: 22