Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt

Cited by: 2525
Authors
Durinck, Steffen [1]
Spellman, Paul T. [1]
Birney, Ewan [2]
Huber, Wolfgang [2]
Affiliations
[1] Lawrence Berkeley Natl Lab, Berkeley, CA USA
[2] European Bioinformat Inst, Cambridge, England
Keywords
BIOCONDUCTOR; DATABASE;
DOI
10.1038/nprot.2009.97
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Genomic experiments produce multiple views of biological systems, among them DNA sequence and copy number variation, and mRNA and protein abundance. Understanding these systems requires integrated bioinformatic analysis. Public databases such as Ensembl provide relationships and mappings between the relevant sets of probe and target molecules. However, the relationships can be biologically complex and the content of the databases is dynamic. We demonstrate how to use the computational environment R to integrate and jointly analyze experimental datasets, employing BioMart web services to provide the molecule mappings. We also discuss typical problems encountered in making gene-to-transcript-to-protein mappings. The approach provides a flexible, programmable and reproducible basis for state-of-the-art bioinformatic data integration.
Pages: 1184-1191
Page count: 8