nuID: a universal naming scheme of oligonucleotides for Illumina, Affymetrix, and other microarrays

Cited by: 64
Authors
Du, Pan [1]
Kibbe, Warren A. [1]
Lin, Simon M. [1]
Affiliation
[1] Northwestern University, Robert H. Lurie Comprehensive Cancer Center, Chicago, IL 60611, USA
DOI
10.1186/1745-6150-2-16
Chinese Library Classification
Q [Biological Sciences]
Discipline Classification Codes
07; 0710; 09
Abstract
Background: Oligonucleotide probes with identical sequences may have different identifiers across manufacturers, and even across versions of the same company's microarray. Conversely, the same identifier is sometimes reused for a completely different oligonucleotide. The result is ambiguity and potential misidentification of the genes hybridizing to that probe.

Results: We have devised a unique, non-degenerate encoding scheme that can serve as a universal representation for identifying an oligonucleotide across manufacturers. We have named the encoded representation 'nuID', for nucleotide universal identifier. Since the raw sequence of an oligonucleotide is the true definition of a probe's identity, our encoding algorithm transforms the sequence itself, uniquely and non-degenerately, into a compact identifier (a lossless compression). In addition, we added a redundancy check (checksum) to validate the integrity of the identifier. Together, these two steps, encoding plus checksum, produce a nuID: a unique, non-degenerate, permanent, robust and efficient representation of the probe sequence. For commercial applications that require the sequence to remain confidential, we also provide an encryption scheme for nuID. We demonstrate the utility of nuIDs for the annotation of Illumina microarrays, and we believe the approach has universal applicability as a source-independent naming convention for oligomers.

Reviewers: This article was reviewed by Itai Yanai, Rong Chen (nominated by Mark Gerstein), and Gregory Schuler (nominated by David Lipman).
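The abstract describes a nuID as an invertible (lossless) encoding of the probe sequence plus a checksum. The Python sketch below illustrates that idea only. Several choices in it are hypothetical and made for this example, not taken from the published nuID specification:

- the 64-symbol alphabet (RFC 4648 URL-safe Base64 characters);
- the two-character header, which holds the padding count and then the checksum;
- the position-weighted checksum.

Identifiers produced this way will therefore not match real nuIDs.

```python
import string

# Hypothetical 64-symbol alphabet: the RFC 4648 URL-safe Base64 characters.
# The published nuID may use a different alphabet, header and checksum.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # 2 bits per base
BASES = "ACGT"                          # inverse of CODE


def checksum(seq: str) -> int:
    """Toy position-weighted checksum over the 2-bit base codes, mod 64."""
    return sum((i + 1) * CODE[b] for i, b in enumerate(seq)) % 64


def encode(seq: str) -> str:
    """Losslessly pack a DNA/RNA sequence into a compact identifier."""
    seq = seq.upper().replace("U", "T")
    if not seq or any(b not in CODE for b in seq):
        raise ValueError("sequence must contain only A, C, G, T/U")
    pad = (-len(seq)) % 3              # bases needed to reach a multiple of 3
    padded = seq + "A" * pad
    body = []
    for i in range(0, len(padded), 3):
        a, b, c = (CODE[x] for x in padded[i:i + 3])
        body.append(ALPHABET[(a << 4) | (b << 2) | c])  # 3 bases -> one 6-bit symbol
    # Header: padding count, then checksum of the unpadded sequence.
    return ALPHABET[pad] + ALPHABET[checksum(seq)] + "".join(body)


def decode(ident: str) -> str:
    """Invert encode(), rejecting identifiers whose checksum does not match."""
    if len(ident) < 3:
        raise ValueError("identifier too short")
    pad = ALPHABET.index(ident[0])      # raises ValueError for characters outside ALPHABET
    expected = ALPHABET.index(ident[1])
    if pad > 2:
        raise ValueError("invalid padding field")
    bases = []
    for ch in ident[2:]:
        v = ALPHABET.index(ch)
        bases += [BASES[v >> 4], BASES[(v >> 2) & 3], BASES[v & 3]]
    seq = "".join(bases[:len(bases) - pad])  # drop the padding bases
    if checksum(seq) != expected:
        raise ValueError("checksum mismatch: identifier is corrupted")
    return seq


if __name__ == "__main__":
    ident = encode("ACGTTGCAACGTAGC")
    print(ident, decode(ident))
```

Packing three bases into each 6-bit symbol makes the identifier about one third the length of the sequence, plus a fixed two-character header. Storing the padding count keeps sequences such as `A` and `AAA` distinct, so `decode(encode(s))` recovers every valid `s`. The sketch handles only A, C, G and T/U, not ambiguity codes. Its toy checksum catches many corruptions but is not guaranteed to detect every single-character error.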
Pages: 7