CoPub Mapper: mining MEDLINE based on search term co-publication

Cited by: 55
Authors
Alako, BTF
Veldhoven, A
van Baal, S
Jelier, R
Verhoeven, S
Rullmann, T
Polman, J
Jenster, G
Affiliations
[1] Erasmus MC, Dept Urol, NL-3000 DR Rotterdam, Netherlands
[2] NV Organon, Dept Mol Design & Informat, NL-5340 BH Oss, Netherlands
[3] Erasmus MC, Dept Med Informat, Rotterdam, Netherlands
[4] Erasmus MC, Dept Genet, Rotterdam, Netherlands
DOI
10.1186/1471-2105-6-51
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: High-throughput microarray analyses yield many differentially expressed genes that are potentially responsible for the biological process of interest. To identify biological similarities between genes, publications in MEDLINE were identified in which pairs of gene names, or combinations of a gene name with specific keywords, were co-mentioned.
Results: MEDLINE search strings for 15,621 known genes and 3,731 keywords were generated and validated. PubMed IDs were retrieved from MEDLINE, and the relative probability of co-occurrence was determined for all gene-gene and gene-keyword pairs. To assess gene clustering according to literature co-publication, 150 genes comprising 8 sets with known connections (same pathway, same protein complex, same cellular localization, etc.) were run through the program. Receiver operating characteristic (ROC) analyses showed that most gene sets clustered much better than expected by chance. To test the grouping of genes from real microarray data, 221 differentially expressed genes from a microarray experiment were analyzed with CoPub Mapper, which produced several relevant clusters of genes with biological process and disease keywords. In addition, all genes versus keywords were hierarchically clustered to reveal a complete grouping of published genes based on co-occurrence.
Conclusion: The CoPub Mapper program allows quick and versatile querying of co-published genes and keywords and can be used successfully to cluster predefined groups of genes and microarray data.
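The co-occurrence step described in the abstract can be illustrated with a minimal sketch. The abstract says only that a "relative probability of co-occurrences" was determined from retrieved PubMed IDs; the ratio used here, P(A,B) / (P(A) P(B)), is an assumed form and not necessarily the authors' exact formula. The function name `relative_cooccurrence`, the toy PubMed ID sets, and the total document count are all hypothetical.

```python
from itertools import combinations


def relative_cooccurrence(pmids_by_term, total_docs):
    """Score every term pair by how often the two terms are co-mentioned.

    ASSUMPTION: the score is P(A,B) / (P(A) * P(B)), which simplifies to
    (n_ab * total_docs) / (n_a * n_b). The abstract does not give a formula.
    Values above 1 mean the pair co-occurs more often than independence
    would predict.
    """
    scores = {}
    for a, b in combinations(sorted(pmids_by_term), 2):
        n_a = len(pmids_by_term[a])
        n_b = len(pmids_by_term[b])
        if n_a == 0 or n_b == 0:
            continue  # a term with no publications cannot be scored
        n_ab = len(pmids_by_term[a] & pmids_by_term[b])
        scores[(a, b)] = (n_ab * total_docs) / (n_a * n_b)
    return scores


# Hypothetical toy data: each term maps to the set of PubMed IDs that mention it.
toy = {
    "GENE_A": {1, 2, 3, 4},
    "GENE_B": {3, 4, 5},
    "apoptosis": {4, 5, 6, 7},
}
scores = relative_cooccurrence(toy, total_docs=10)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

Storing the PubMed IDs as sets keeps the co-mention count a single intersection per pair, and the same function covers gene-gene and gene-keyword pairs because both are just terms with ID sets.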
Pages: 15