Maximum likelihood estimation of optimal scaling factors for expression array normalization

Cited by: 22
Authors
Hartemink, AJ [1]
Gifford, DK [1]
Jaakkola, TS [1]
Young, RA [1]
Affiliations
[1] MIT, Comp Sci Lab, Cambridge, MA 02139 USA
Source
MICROARRAYS: OPTICAL TECHNOLOGIES AND INFORMATICS | 2001 / Vol. 4266
Keywords
normalization; scaling; microarray; oligonucleotide array; DNA chip; maximum likelihood; maximum a posteriori; MAP
DOI
10.1117/12.427981
Chinese Library Classification
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
Data from expression arrays must be comparable before they can be analyzed rigorously on a large scale. Accurate normalization improves the comparability of expression data because it seeks to account for sources of variation obscuring the underlying variation of interest. Undesirable variation in reported expression levels originates in the preparation and hybridization of the sample as well as in the manufacture of the array itself, and may differ depending on the array technology being employed. Published research to date has not characterized the degree of variation associated with these sources, and results are often reported without tight statistical bounds on their significance. We analyze the distributions of reported levels of exogenous control species spiked into samples applied to 1280 Affymetrix arrays. We develop a model for explaining reported expression levels under an assumption of primarily multiplicative variation. To compute the scaling factors needed for normalization, we derive maximum likelihood and maximum a posteriori estimates for the parameters characterizing the multiplicative variation in reported spiked control expression levels. We conclude that the optimal scaling factors in this context are weighted geometric means and determine the appropriate weights. The optimal scaling factor estimates so computed can be used for subsequent array normalization.
Pages: 132-140
Page count: 9
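The abstract's conclusion that the optimal scaling factors are weighted geometric means can be illustrated with a minimal sketch. The abstract does not state the weights, so the sketch assumes a simple model that is not taken from the paper: each spiked control's reported level equals its reference level times an array-wide factor times log-normal noise with a known per-control log-scale variance. Under that assumption the maximum likelihood weights are inverse variances. The function and parameter names (`weighted_geometric_mean`, `scaling_factor`, `log_variances`) are hypothetical.

```python
import math


def weighted_geometric_mean(values, weights):
    """Return exp(sum(w_i * ln(v_i)) / sum(w_i)) for positive values."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equal in length")
    if any(v <= 0 for v in values):
        raise ValueError("values must be strictly positive")
    total_weight = sum(weights)
    if any(w < 0 for w in weights) or total_weight == 0:
        raise ValueError("weights must be non-negative with a positive sum")
    log_mean = sum(w * math.log(v) for v, w in zip(values, weights)) / total_weight
    return math.exp(log_mean)


def scaling_factor(reported, reference, log_variances):
    """Estimate the factor that rescales one array's reported levels.

    Assumed model (an illustration, not the paper's stated model):
        reported_i = m * reference_i * exp(eps_i),  eps_i ~ Normal(0, var_i)
    The ML estimate of ln(m) is the inverse-variance weighted mean of
    ln(reported_i / reference_i), so the normalizing factor 1/m is a
    weighted geometric mean of reference_i / reported_i.
    """
    ratios = [ref / rep for rep, ref in zip(reported, reference)]
    weights = [1.0 / var for var in log_variances]  # assumed inverse-variance weights
    return weighted_geometric_mean(ratios, weights)


# Made-up numbers: every control is reported at twice its reference level.
factor = scaling_factor(
    reported=[200.0, 400.0, 800.0],
    reference=[100.0, 200.0, 400.0],
    log_variances=[0.1, 0.2, 0.4],
)
normalized = [factor * level for level in [200.0, 400.0, 800.0]]
```

Because every control in this made-up example is off by the same factor of two, the estimate is approximately 0.5 regardless of the weights; the weights only matter when the controls disagree, in which case lower-variance controls pull the estimate toward their own ratio.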