Transformation and normalization of oligonucleotide microarray data

Times cited: 49
Authors
Geller, SC [1]
Gregg, JP
Hagerman, P
Rocke, DM
Affiliations
[1] Texas A&M Univ, Dept Math, College Stn, TX 77843 USA
[2] Univ Calif Davis, Sch Med, Dept Pathol, Davis, CA 95616 USA
[3] Univ Calif Davis, Sch Med, Dept Biol Chem, Davis, CA 95616 USA
[4] Univ Calif Davis, Dept Appl Sci, Davis, CA 95616 USA
[5] Univ Calif Davis, Div Biostat, Davis, CA 95616 USA
DOI
10.1093/bioinformatics/btg245
CLC number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Most methods for analyzing microarray data or performing power calculations assume that the variance is constant across all levels of gene expression. The most common transformation, the logarithm, yields data with constant variance at high expression levels but not at low levels. Rocke and Durbin showed that data from spotted arrays fit a two-component model, and Durbin, Hardin, Hawkins and Rocke; Huber et al.; and Munson provided a transformation that stabilizes the variance and also symmetrizes and normalizes the error structure. Our aim is to evaluate the applicability of this transformation to the error structure of GeneChip microarrays.

Results: In an example study, we demonstrate a simple way to apply the two-component model of Rocke and Durbin and the data transformation of Durbin, Hardin, Hawkins and Rocke; Huber et al.; and Munson to Affymetrix GeneChip data. We also provide a method for normalizing Affymetrix GeneChips simultaneously with determining the transformation, producing a data set that is free of chip or slide effects and has constant variance and symmetric errors. This transformation/normalization process can be regarded as a machine calibration: it requires only a few biologically constant replicates of one sample to determine the constant that specifies the transformation and to normalize. We hypothesize that this constant needs to be found only once for a given technology in a lab, perhaps with periodic updates, so extensive replication within each study is not required. Furthermore, the variance of the transformed pilot data can be used for power calculations with standard power analysis programs.
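The two-component model and the variance-stabilizing transformation referred to in the abstract can be sketched as follows. As they are usually written, a measured intensity is modeled as y = α + μ·exp(η) + ε, with η ~ N(0, σ_η²) and ε ~ N(0, σ_ε²), so that Var(y) = μ²·S_η² + σ_ε², where S_η² = exp(σ_η²)·(exp(σ_η²) - 1); the transformation is ln((y - α) + sqrt((y - α)² + c)) with c = σ_ε²/S_η². The Python below is a minimal sketch under stated assumptions, not the authors' procedure: α is treated as known, the parameter values are illustrative, `estimate_c` is a hypothetical moment estimator that takes σ_ε² from replicates of unexpressed genes and σ_η² from the log-scale variance of highly expressed genes, and `per_group_n` uses a normal approximation to a two-sample test rather than any particular power analysis program. It does not implement the paper's simultaneous normalization, and all function names are hypothetical.

```python
import math
import random
import statistics

# Sketch (not the paper's code) of the two-component model for one gene:
#   y = alpha + mu * exp(eta) + eps,  eta ~ N(0, s_eta^2),  eps ~ N(0, sigma_eps^2)
# Var(y) = mu^2 * S_eta^2 + sigma_eps^2, with S_eta^2 = exp(s_eta^2) * (exp(s_eta^2) - 1).


def simulate(mu, alpha, s_eta, sigma_eps, rng):
    """Draw one intensity from the two-component model."""
    return alpha + mu * math.exp(rng.gauss(0.0, s_eta)) + rng.gauss(0.0, sigma_eps)


def S_eta_sq(s_eta_sq):
    """Variance of exp(eta) when eta ~ N(0, s_eta_sq)."""
    return math.exp(s_eta_sq) * (math.exp(s_eta_sq) - 1.0)


def glog(y, alpha, c):
    """Variance-stabilizing transform ln((y - alpha) + sqrt((y - alpha)^2 + c))."""
    z = y - alpha
    return math.log(z + math.sqrt(z * z + c))


def estimate_c(low_reps, high_reps, alpha):
    """Hypothetical moment estimate of c = sigma_eps^2 / S_eta^2 from replicate chips.

    low_reps:  per-gene replicate lists for genes assumed unexpressed (mu ~ 0);
               their variance estimates sigma_eps^2.
    high_reps: per-gene replicate lists for genes assumed highly expressed;
               their log-scale variance estimates s_eta^2.
    """
    sigma_eps_sq = statistics.mean(statistics.variance(r) for r in low_reps)
    s_eta_sq = statistics.mean(
        statistics.variance([math.log(y - alpha) for y in r]) for r in high_reps
    )
    return sigma_eps_sq / S_eta_sq(s_eta_sq)


def per_group_n(sd, delta, sig_level=0.05, power=0.8):
    """Normal-approximation sample size per group for a two-sided two-sample test."""
    z = statistics.NormalDist()
    za = z.inv_cdf(1.0 - sig_level / 2.0)
    zb = z.inv_cdf(power)
    return math.ceil(2.0 * (za + zb) ** 2 * sd ** 2 / delta ** 2)


if __name__ == "__main__":
    rng = random.Random(1)
    alpha, s_eta, sigma_eps = 100.0, 0.2, 50.0  # illustrative values, not from the paper
    c_true = sigma_eps ** 2 / S_eta_sq(s_eta ** 2)

    # Hypothetical calibration replicates: 200 genes x 4 chips at each extreme.
    low = [[simulate(0.0, alpha, s_eta, sigma_eps, rng) for _ in range(4)] for _ in range(200)]
    high = [[simulate(5e4, alpha, s_eta, sigma_eps, rng) for _ in range(4)] for _ in range(200)]
    c_hat = estimate_c(low, high, alpha)
    print(f"c true = {c_true:.0f}, c estimated = {c_hat:.0f}")

    # Raw SD grows with mu; SD after the transform should stay roughly constant.
    for mu in (0.0, 100.0, 500.0, 2e3, 1e4, 5e4):
        ys = [simulate(mu, alpha, s_eta, sigma_eps, rng) for _ in range(2000)]
        raw_sd = statistics.stdev(ys)
        glog_sd = statistics.stdev([glog(y, alpha, c_hat) for y in ys])
        print(f"mu={mu:>8.0f}  raw SD={raw_sd:9.1f}  glog SD={glog_sd:.3f}")

    # At high levels a 2-fold change is roughly ln(2) on the transformed scale.
    print("n per group for a 2-fold change:", per_group_n(s_eta, math.log(2.0)))
```

Running the script should show the raw standard deviation growing with μ while the transformed standard deviation stays near S_η (about 0.2 with these settings). A delta-method argument explains why: the derivative of the transform at y - α = μ is 1/sqrt(μ² + c), so the transformed variance is approximately (μ²·S_η² + σ_ε²)/(μ² + c) = S_η² at every μ.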
Pages: 1817-1823
Number of pages: 7
References (8)
[1] Durbin, BP. Bioinformatics, 2002, 18 Suppl 1: S105.
[2] Huber, W. Bioinformatics, 2002, 18 Suppl 1: S96.
[3] Li, C; Wong, WH. Model-based analysis of oligonucleotide arrays: Expression index computation and outlier detection. Proceedings of the National Academy of Sciences of the United States of America, 2001, 98(1): 31-36.
[4] Munson, PJ. GEN WORKSH LOW LEV A, 2001.
[5] Rocke, DM. X̄Q and RQ charts: Robust control charts. The Statistician, 1992, 41(1): 97-104.
[6] Rocke, DM; Durbin, B. Approximate variance-stabilizing transformations for gene-expression microarray data. Bioinformatics, 2003, 19(8): 966-972.
[7] Rocke, DM; Durbin, B. A model for measurement error for gene expression arrays. Journal of Computational Biology, 2001, 8(6): 557-569.
[8] Tukey, JW. One degree of freedom for non-additivity. Biometrics, 1949, 5(3): 232-242.