Ontology-driven indexing of public datasets for translational bioinformatics

Times cited: 48
Authors
Shah, Nigam H. [1]
Jonquet, Clement [1]
Chiang, Annie P. [1]
Butte, Atul J. [1]
Chen, Rong [1]
Musen, Mark A. [1]
Affiliation
[1] Stanford Univ, Sch Med, Ctr Biomed Informat, Stanford, CA 94305 USA
Source
BMC Bioinformatics | 2009 / Vol. 10
Keywords
Tissue microarray; Text
DOI
10.1186/1471-2105-10-S2-S1
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
The volume of publicly available genomic-scale data is increasing. Genomic datasets in public repositories are annotated with free-text fields that describe the pathological state of the studied sample. Because these annotations are not mapped to concepts in any ontology, it is difficult to integrate the datasets across repositories. We have previously developed methods to map the text annotations of tissue microarrays to concepts in the NCI Thesaurus and SNOMED-CT. In this work we generalize those methods to map the text annotations of gene expression datasets to concepts in the UMLS. We demonstrate their utility by processing the annotations of datasets in the Gene Expression Omnibus (GEO). The resulting mappings enable ontology-based querying and integration of tissue microarray and gene expression microarray data, including the identification of datasets on specific diseases across both repositories. Our approach provides a basis for ontology-driven data integration in translational research on gene and protein expression data. Building on this work, we have developed a prototype system for ontology-based annotation and indexing of biomedical data. The system processes the text metadata of diverse resources, such as gene expression datasets, descriptions of radiology images, clinical-trial reports, and PubMed abstracts, and annotates and indexes them with concepts from appropriate ontologies. Its key function is to let users locate biomedical data resources related to particular ontology concepts.
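The abstract's central idea is to map free-text annotations to ontology concepts and then index resources by those concepts so they can be retrieved by concept. It can be illustrated with a small sketch under stated assumptions. The sketch uses a toy lexicon (`LEXICON`) and a toy is-a hierarchy (`PARENTS`). Every concept ID (e.g. `C_CARCINOMA`), record ID, and function name is hypothetical. None of them are UMLS identifiers or part of the authors' system. Simple longest-match dictionary lookup stands in for the authors' actual annotator, which this record does not describe.

```python
import re
from collections import defaultdict

# HYPOTHETICAL toy lexicon: surface term -> concept ID. A real system would
# draw terms and identifiers from UMLS or another ontology.
LEXICON = {
    "breast carcinoma": "C_BREAST_CARCINOMA",
    "breast cancer": "C_BREAST_CARCINOMA",
    "carcinoma": "C_CARCINOMA",
    "lymphoma": "C_LYMPHOMA",
    "neoplasm": "C_NEOPLASM",
}

# HYPOTHETICAL is-a edges: child concept -> set of parent concepts.
PARENTS = {
    "C_BREAST_CARCINOMA": {"C_CARCINOMA"},
    "C_CARCINOMA": {"C_NEOPLASM"},
    "C_LYMPHOMA": {"C_NEOPLASM"},
}

MAX_TERM_LEN = max(len(term.split()) for term in LEXICON)


def annotate(text):
    """Return the concept IDs whose terms occur in text, preferring longest matches."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    concepts, i = set(), 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at token i first.
        for n in range(min(MAX_TERM_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in LEXICON:
                concepts.add(LEXICON[phrase])
                i += n
                break
        else:  # no lexicon term starts at token i
            i += 1
    return concepts


def ancestors(concept):
    """Return all is-a ancestors of concept (transitive closure, excluding itself)."""
    seen, stack = set(), [concept]
    while stack:
        for parent in PARENTS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def build_index(records):
    """Map each annotated concept and its ancestors to the IDs of matching records."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for concept in annotate(text):
            for c in {concept} | ancestors(concept):
                index[c].add(record_id)
    return index


def query(index, concept):
    """Return the sorted IDs of records indexed under concept."""
    return sorted(index.get(concept, set()))


# HYPOTHETICAL records standing in for tissue-microarray and GEO annotations.
records = {
    "tma:core1": "Invasive breast carcinoma, grade 2",
    "geo:ds1": "Diffuse large B-cell lymphoma samples",
    "geo:ds2": "Normal liver tissue",
}
index = build_index(records)
print(query(index, "C_CARCINOMA"))  # ['tma:core1']
print(query(index, "C_NEOPLASM"))   # ['geo:ds1', 'tma:core1']
```

This sketch indexes each record under the ancestors of its annotated concepts. As a result, a query for a general concept such as `C_NEOPLASM` also retrieves records that carry only more specific terms. Expanding the query along the hierarchy at search time would be an equally reasonable design.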
Pages: 10