Selection of long oligonucleotides for gene expression microarrays using weighted rank-sum strategy

Cited by: 45
Authors
Hu, Guangan
Llinas, Manuel
Li, Jingguang
Preiser, Peter Rainer
Bozdech, Zbynek
Affiliations
[1] Nanyang Technol Univ, Sch Biol Sci, Singapore 637551, Singapore
[2] Princeton Univ, Lewis Sigler Inst Integrat Genom, Dept Mol Biol, Carl Icahn Lab, Princeton, NJ 08544 USA
[3] Tan Tock Seng Hosp, Dept Pathol & Lab Med, Singapore 308433, Singapore
DOI
10.1186/1471-2105-8-350
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: The design of long oligonucleotides for spotted DNA microarrays requires careful attention to ensure optimal performance in the hybridization process. The main challenge is to select, for each genetic locus/gene in the genome, an optimal oligonucleotide element that is unique, devoid of internal structures and repetitive sequences, and has a melting temperature (Tm) uniform with all other elements on the microarray. Currently, all of the publicly available programs for long oligonucleotide microarray selection use various combinations of cutoffs, in which each parameter (uniqueness, Tm, and secondary structure) is evaluated and filtered individually. The use of cutoffs can, however, lead to information loss and to the selection of suboptimal oligonucleotides, especially for genomes with an extreme distribution of GC content, a large proportion of repetitive sequences, or large gene families with highly homologous members.
Results: Here we present the program OligoRankPick, which uses a weighted rank-based strategy to select microarray oligonucleotide elements via an integer weighted linear function. This approach optimizes the selection criteria (weight score) for each gene individually, accommodating variable properties of the DNA sequence along the genome. The algorithm was tested using three microbial genomes: Escherichia coli, Saccharomyces cerevisiae, and the human malaria parasite Plasmodium falciparum. In comparison to other published algorithms, OligoRankPick provides significant improvements in oligonucleotide design for all three genomes, with the most significant improvements observed in the microarray design for P. falciparum, whose genome is characterized by large fluctuations of GC content and abundant gene duplications.
Conclusion: OligoRankPick is an efficient tool for the design of long oligonucleotide DNA microarrays that does not rely on direct oligonucleotide exclusion by parameter cutoffs but instead optimizes all parameters in the context of each other. The weighted rank-sum strategy used by this algorithm provides high flexibility in oligonucleotide selection, which accommodates the extreme variability of DNA sequence properties along the genomes of many organisms.
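The rank-based idea described in the abstract can be illustrated with a minimal sketch. This is not the OligoRankPick implementation: the abstract does not give the actual weights, the scoring metrics, or how ties are handled, so the `Candidate` fields, the `ranks` helper, the function name `pick_by_weighted_rank_sum`, and the integer weights `(3, 2, 1)` are all hypothetical. The sketch ranks the candidate oligonucleotides of one gene on each parameter separately, combines the ranks with an integer weighted linear function, and picks the candidate with the lowest score, so that no candidate is discarded by a hard cutoff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One candidate oligonucleotide for a gene (all fields are illustrative)."""
    name: str
    cross_hyb: float  # similarity to the best non-target hit; lower is better
    tm_dev: float     # absolute deviation from the target Tm; lower is better
    self_fold: float  # secondary-structure propensity; lower is better


def ranks(values):
    """Return 1-based ascending ranks; tied values share the lowest rank."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def pick_by_weighted_rank_sum(candidates, weights=(3, 2, 1)):
    """Pick the candidate with the lowest weighted sum of per-parameter ranks.

    The weights are assumed integers, one per parameter, in the order
    (uniqueness, Tm, secondary structure).
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    rank_columns = [
        ranks([c.cross_hyb for c in candidates]),
        ranks([c.tm_dev for c in candidates]),
        ranks([c.self_fold for c in candidates]),
    ]
    scores = [
        sum(w * column[i] for w, column in zip(weights, rank_columns))
        for i in range(len(candidates))
    ]
    best_index = min(range(len(candidates)), key=scores.__getitem__)
    return candidates[best_index], scores


# Made-up values for three candidates of a single gene.
example = [
    Candidate("A", cross_hyb=0.90, tm_dev=0.5, self_fold=2.0),
    Candidate("B", cross_hyb=0.40, tm_dev=3.0, self_fold=1.0),
    Candidate("C", cross_hyb=0.45, tm_dev=1.0, self_fold=1.5),
]
best, scores = pick_by_weighted_rank_sum(example)
print(best.name, scores)  # B [14, 10, 12]
```

In this made-up example, candidate B has the worst Tm deviation and would be dropped by a strict Tm cutoff, yet it wins overall because it ranks best on the two other parameters. Because ranks are computed within each gene, the selection adapts to the candidates that gene actually offers.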
Pages: 13