Extending the mutual information measure to rank inferred literature relationships

Cited by: 73
Authors
Wren, JD [1]
Affiliation
[1] Univ Oklahoma, Dept Bot & Microbiol, Adv Ctr Genome Technol, Norman, OK 73019 USA
DOI
10.1186/1471-2105-5-145
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Within the peer-reviewed literature, associations between two things are not always recognized until commonalities between them become apparent. These commonalities can justify inferring a new relationship where none was previously known, and they are the basis of most observation-based hypothesis formation. It has been shown that the crux of the problem is not finding inferable associations, which are extraordinarily abundant given the scale-free networks that arise from literature-based associations, but determining which ones are informative. The Mutual Information Measure (MIM) is a well-established method for measuring how informative an association is, but it is limited to direct (i.e., observable) associations.
Results: Herein, we attempt to extend the calculation of mutual information to indirect (i.e., inferable) associations by using the MIM of shared associations. Objects of general research interest (e.g., genes, diseases, phenotypes, drugs, ontology categories) found within MEDLINE are used to create a network of associations for evaluation.
Conclusions: Mutual information calculations can be effectively extended to implied relationships, and a significance cutoff can be estimated from analysis of random word networks. Of the models tested, the shared minimum MIM (MMIM) model is found to correlate best with the observed strength and frequency of known associations. In three test cases, the MMIM method tends to rank more specific relationships higher than does counting the number of shared relationships within a network.
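The idea of scoring an indirect association through the MIM of its shared associations can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything here is an assumption: the direct score uses the common pointwise form log2(P(A,B) / (P(A) P(B))) estimated from record-level co-occurrence, and the indirect score sums, over each shared intermediate B, the smaller of MIM(A,B) and MIM(B,C). The function names (`cooccurrence_counts`, `mim`, `shared_min_mim`) and the summing step are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(records):
    """Count how many records mention each term and each unordered term pair."""
    term_count = defaultdict(int)
    pair_count = defaultdict(int)
    for terms in records:
        unique = sorted(set(terms))
        for t in unique:
            term_count[t] += 1
        for a, b in combinations(unique, 2):  # pairs are stored as (a, b) with a < b
            pair_count[(a, b)] += 1
    return dict(term_count), dict(pair_count)


def mim(a, b, term_count, pair_count, n_records):
    """Assumed pointwise mutual information of a direct association.

    Returns None when the two terms never co-occur (no direct association).
    """
    key = (a, b) if a < b else (b, a)
    n_ab = pair_count.get(key, 0)
    if n_ab == 0:
        return None
    p_ab = n_ab / n_records
    p_a = term_count[a] / n_records
    p_b = term_count[b] / n_records
    return math.log2(p_ab / (p_a * p_b))


def shared_min_mim(a, c, term_count, pair_count, n_records):
    """Assumed indirect score: sum over shared intermediates b of
    min(MIM(a, b), MIM(b, c)). The aggregation by summing is an assumption."""
    score = 0.0
    for b in term_count:
        if b == a or b == c:
            continue
        ab = mim(a, b, term_count, pair_count, n_records)
        bc = mim(b, c, term_count, pair_count, n_records)
        if ab is None or bc is None:
            continue  # b is not shared by both a and c
        score += min(ab, bc)
    return score
```

Taking the minimum along each two-step path means a path is only as informative as its weaker link, so a hub term that co-occurs with nearly everything contributes little. That is one plausible reading of why such a score would favor specific relationships over a raw count of shared neighbors.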
Pages: 13