I/NI-calls for the exclusion of non-informative genes: a highly effective filtering tool for microarray data

Cited by: 86
Authors
Talloen, Willem [1]
Clevert, Djork-Arne
Hochreiter, Sepp
Amaratunga, Dhammika
Bijnens, Luc
Kass, Stefan
Goehlmann, Hinrich W. H.
Affiliations
[1] Janssen Pharmaceut NV, Johnson & Johnson Pharmaceut Res & Dev Div, Beerse, Belgium
[2] Johannes Kepler Univ Linz, Inst Bioinformat, A-4040 Linz, Austria
[3] Charite Univ Med Berlin, Dept Nephrol & Internal Intens Care, Berlin, Germany
[4] Johnson & Johnson Pharmaceut Res & Dev, Raritan, NJ USA
Keywords
DOI
10.1093/bioinformatics/btm478
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: DNA microarray technology typically generates many measurements, of which only a relatively small subset is informative for the interpretation of the experiment. To avoid false positive results, it is therefore critical to select the informative genes from the large, noisy data set before the actual analysis. Most currently available filtering techniques are supervised and therefore carry a risk of overfitting. Unsupervised filtering techniques, on the other hand, are either not very efficient or too stringent, as they may confuse signal with noise. We propose to use the multiple probes measuring the same target mRNA as repeated measures to quantify the signal-to-noise ratio of that specific probe set. A Bayesian factor analysis with specifically chosen prior settings, which models this probe-level information, provides an objective feature filtering technique named informative/non-informative calls (I/NI calls).
Results: Based on 30 real-life data sets (including various human, rat, mouse and Arabidopsis studies) and a spiked-in data set, I/NI calls is shown to be highly effective, with exclusion rates ranging from 70% to 99%. Consequently, it offers a critical solution to the curse of high dimensionality in the analysis of microarray data.
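The idea of treating the probes of one probe set as repeated measures can be illustrated with a minimal Python sketch. This is not the Bayesian factor analysis described in the abstract: it is a simplified stand-in that compares the variance of the probe-averaged profile across arrays (signal) with the scatter of individual probes around that profile (noise). The function names, the simulated data and the cutoff of 1.0 are all assumptions made for illustration.

```python
import random
from statistics import mean, pvariance


def probe_set_snr(intensities):
    """Crude signal-to-noise ratio for one probe set.

    intensities[i][j] is the log-intensity of probe i on array j.
    Simplified stand-in, NOT the paper's Bayesian factor analysis.
    """
    n_probes = len(intensities)
    n_arrays = len(intensities[0])
    # Center each probe across arrays to remove probe-specific offsets.
    centered = [[x - mean(row) for x in row] for row in intensities]
    # Consensus profile: average over probes on each array.
    consensus = [
        mean(centered[i][j] for i in range(n_probes)) for j in range(n_arrays)
    ]
    # Signal: how much the consensus varies from array to array.
    signal_var = pvariance(consensus)
    # Noise: mean squared deviation of single probes from the consensus.
    noise_var = mean(
        (centered[i][j] - consensus[j]) ** 2
        for i in range(n_probes)
        for j in range(n_arrays)
    )
    if noise_var == 0:
        return float("inf") if signal_var > 0 else 0.0
    return signal_var / noise_var


def is_informative(intensities, threshold=1.0):
    """Hypothetical call rule: keep the probe set if its SNR exceeds the cutoff."""
    return probe_set_snr(intensities) > threshold


def simulate_probe_set(rng, n_probes=11, n_arrays=20, signal_sd=1.0, noise_sd=0.3):
    """Simulated data: a shared per-array signal plus independent probe noise."""
    z = [rng.gauss(0, signal_sd) for _ in range(n_arrays)]
    return [
        [8.0 + z[j] + rng.gauss(0, noise_sd) for j in range(n_arrays)]
        for _ in range(n_probes)
    ]
```

With simulated data, a probe set whose probes share a common signal across arrays yields a ratio well above the cutoff, whereas a probe set with `signal_sd=0.0` (pure noise) yields a ratio close to 1/(number of probes - 1) and would be excluded. A real analysis would use the published probe-level model rather than this moment-based ratio.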
Pages: 2897-2902
Page count: 6