Mistaken identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics

Cited by: 58
Authors
Zeeberg, BR
Riss, J
Kane, DW
Bussey, KJ
Uchio, E
Linehan, WM
Barrett, JC
Weinstein, JN
Affiliations
[1] NCI, Genom & Bioinformat Grp, Mol Pharmacol Lab, Ctr Canc Res, NIH, Bethesda, MD 20892 USA
[2] CCR, Lab Biosyst & Canc, Bethesda, MD 20892 USA
[3] SRA Int, Fairfax, VA 22033 USA
[4] NIH, Urol Oncol Branch, Bethesda, MD 20892 USA
Keywords
Format Conversion; Default Date; Detective Work; Conversion Problem; Downstream Data Analysis;
DOI
10.1186/1471-2105-5-80
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: When processing microarray data sets, we recently noticed that some gene names were being changed inadvertently to non-gene names. Results: A little detective work traced the problem to default date format conversions and floating-point format conversions in the very useful Excel program package. The date conversions affect at least 30 gene names; the floating-point conversions affect at least 2,000 if Riken identifiers are included. These conversions are irreversible; the original gene names cannot be recovered. Conclusions: Users of Excel for analyses involving gene names should be aware of this problem, which can cause genes, including medically important ones, to be lost from view and which has contaminated even carefully curated public databases. We provide work-arounds and scripts for circumventing the problem.
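As a rough illustration of the two conversion types the abstract describes, the sketch below flags identifiers that a spreadsheet might reinterpret on import. It is not the authors' script. The function name `conversion_risk`, the month-prefix list, and the two regular expressions are assumptions made for this example, and the heuristics are deliberately simple: a month-like prefix followed by one or two digits is treated as date-prone, and a digits-E-digits pattern (the shape of many Riken identifiers) is treated as floating-point-prone.

```python
import re

# Assumed, non-exhaustive list of prefixes a spreadsheet may read as a month.
MONTH_PREFIXES = (
    "JAN", "FEB", "MARCH", "MAR", "APR", "MAY", "JUN",
    "JUL", "AUG", "SEPT", "SEP", "OCT", "NOV", "DEC",
)

# Month-like prefix, optional hyphen, then one or two digits (e.g. SEPT2, DEC1).
DATE_LIKE = re.compile(
    r"^(?:%s)-?\d{1,2}$" % "|".join(MONTH_PREFIXES), re.IGNORECASE
)

# Digits, the letter E, digits (e.g. 2310009E13), which can parse as a float.
FLOAT_LIKE = re.compile(r"^\d+E\d+$", re.IGNORECASE)


def conversion_risk(identifier):
    """Return 'date', 'float', or None for a single identifier (heuristic)."""
    name = identifier.strip()
    if DATE_LIKE.match(name):
        return "date"
    if FLOAT_LIKE.match(name):
        return "float"
    return None


def flag_identifiers(identifiers):
    """Map each at-risk identifier to the kind of conversion it may suffer."""
    flagged = {}
    for name in identifiers:
        kind = conversion_risk(name)
        if kind is not None:
            flagged[name] = kind
    return flagged
```

A check like this could be run on an identifier column before the file is opened in a spreadsheet; any flagged names would then need to be imported as text rather than with the default cell format. Because the conversions are irreversible, screening has to happen before the file is saved from the spreadsheet, not after.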
Pages: 6