Considerations when using the significance analysis of microarrays (SAM) algorithm

Cited by: 102
Authors
Larsson, O [1]
Wahlestedt, C [1]
Timmons, JA [1]
Affiliations
[1] Karolinska Inst, Ctr Genom & Bioinformat, S-17177 Stockholm, Sweden
DOI
10.1186/1471-2105-6-129
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Users of microarray technology typically strive to use universally acceptable data analysis strategies to determine significant expression changes in their experiments. One of the most frequently utilised methods for gene expression data analysis is SAM (significance analysis of microarrays). The impact of selection thresholds on the output from SAM may critically alter the conclusions of a study, yet this consideration has not been systematically evaluated in any publication.
Results: We have examined the effect of discrete data selection criteria (qualification criteria for inclusion) and response thresholds (output filtering) on the number of significant genes reported by SAM. Reducing the data set by applying arbitrary restrictions on abundance calls (e.g. from D-chip), or applying the fold change (FC) option within SAM (named the FC hurdle hereafter), can substantially alter the significant gene list when running SAM in Microsoft Excel. We determined that, for a given final FC criterion (e.g. 1.5-fold change), the FC hurdle applied within Microsoft Excel SAM alters the number of reported genes above the final FC criterion. The reason is that the FC hurdle changes the composition of the control data set, such that a different significance level (q-value) is obtained for any given gene. This effect can be so large that it changes the interpretation of subsequent post hoc analyses, such as ontology overrepresentation analysis.
Conclusion: Our results argue for caution when using SAM. All data sets analysed with SAM could be reanalysed taking into account the potential impact of the use of arbitrary thresholds to trim data sets before significance testing.
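The dependence of a gene's q-value on which genes enter the test can be illustrated with a small sketch. This is not SAM and not the authors' procedure: SAM estimates its false discovery rate from permutations, whereas the sketch below swaps in the simpler Benjamini-Hochberg adjustment as a stand-in. The gene names, fold changes and p-values are hypothetical toy numbers, and the helper names `bh_qvalues` and `significant` are assumptions made for illustration only.

```python
# Hypothetical toy data: (gene, fold change, p-value). Not taken from the paper.
GENES = [
    ("g1", 2.00, 0.01),
    ("g2", 1.80, 0.03),
    ("g3", 1.60, 0.04),
    ("g4", 1.20, 0.20),
    ("g5", 1.10, 0.50),
    ("g6", 1.05, 0.90),
]


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order.

    Used here only as a simple stand-in for SAM's permutation-based q-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def significant(genes, fc_cutoff, q_cutoff, hurdle_first):
    """Names of genes passing both the fold-change and the q-value cutoff.

    hurdle_first=True trims the data set before testing (the "FC hurdle");
    hurdle_first=False tests all genes and applies the FC cutoff afterwards.
    """
    if hurdle_first:
        genes = [g for g in genes if g[1] >= fc_cutoff]
    qvalues = bh_qvalues([g[2] for g in genes])
    return [
        g[0]
        for g, q in zip(genes, qvalues)
        if g[1] >= fc_cutoff and q < q_cutoff
    ]


post_hoc = significant(GENES, fc_cutoff=1.5, q_cutoff=0.05, hurdle_first=False)
hurdle = significant(GENES, fc_cutoff=1.5, q_cutoff=0.05, hurdle_first=True)
print("FC filter after testing: ", post_hoc)
print("FC hurdle before testing:", hurdle)
```

With these toy numbers, the same final criterion (1.5-fold change) yields a different gene list depending on whether the trimming happens before or after significance testing, which is the kind of discrepancy the abstract warns about.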
Pages: 6