Extracting unrecognized gene relationships from the biomedical literature via matrix factorizations

Times cited: 9
Authors
Kim, Hyunsoo [1]
Park, Haesun [1]
Drake, Barry L. [1]
Affiliation
[1] Georgia Inst Technol, Coll Comp, Atlanta, GA 30332 USA
DOI
10.1186/1471-2105-8-S9-S6
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: The construction of literature-based networks of gene-gene interactions is one of the most important applications of text mining in bioinformatics. Extracting potential gene relationships from the biomedical literature may help in building biological hypotheses that can then be explored experimentally. Recently, latent semantic indexing based on the singular value decomposition (LSI/SVD) has been applied to gene retrieval. However, determining the number of factors k used in the reduced-rank matrix remains an open problem.
Results: In this paper, we introduce a way to incorporate a priori knowledge of gene relationships into LSI/SVD to determine the number of factors. We also explore the utility of non-negative matrix factorization (NMF) for extracting unrecognized gene relationships from the biomedical literature by taking advantage of known gene relationships. A gene retrieval method based on NMF (GR/NMF) showed performance comparable to that of LSI/SVD.
Conclusion: Using the known relationships of a given gene, we can determine the number of factors used in the reduced-rank matrix and retrieve unrecognized genes related to the given gene by LSI/SVD or GR/NMF.
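The abstract describes three ideas: ranking genes by similarity in a reduced-rank space (LSI/SVD), choosing the number of factors k with the help of a gene's known relationships, and an NMF-based alternative (GR/NMF). The sketch below illustrates them on a made-up term-by-gene matrix. It is not the authors' implementation: cosine similarity as the ranking score, average precision over known related genes as the criterion for choosing k, and Lee-Seung multiplicative updates as the NMF algorithm are all assumptions, and every function name (`lsi_gene_vectors`, `choose_k`, `nmf`, and so on) is hypothetical.

```python
import numpy as np


def cosine_scores(vectors: np.ndarray, query: int) -> np.ndarray:
    """Cosine similarity between gene `query` and every gene (columns of `vectors`)."""
    norms = np.linalg.norm(vectors, axis=0)
    norms = np.where(norms == 0, 1.0, norms)  # guard genes whose vector is all zeros
    unit = vectors / norms
    return unit.T @ unit[:, query]


def rank_genes(vectors: np.ndarray, query: int) -> list[int]:
    """Other genes ordered from most to least similar to the query gene."""
    order = np.argsort(-cosine_scores(vectors, query), kind="stable")
    return [int(g) for g in order if g != query]


def lsi_gene_vectors(A: np.ndarray, k: int) -> np.ndarray:
    """Rank-k LSI coordinates of the gene documents: the columns of Sigma_k V_k^T."""
    _, s, vt = np.linalg.svd(A, full_matrices=False)
    return s[:k, None] * vt[:k, :]


def nmf(A: np.ndarray, k: int, iters: int = 500, seed: int = 0, eps: float = 1e-9):
    """A ~= W H with W, H >= 0 via Lee-Seung multiplicative updates (Frobenius loss).

    Assumption: the paper's NMF variant may differ from these updates.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], k)) + eps
    H = rng.random((k, A.shape[1])) + eps
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H


def average_precision(ranking, relevant) -> float:
    """Mean of precision@i over the positions i where a known related gene appears."""
    hits, total = 0, 0.0
    for i, gene in enumerate(ranking, start=1):
        if gene in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0


def choose_k(A: np.ndarray, query: int, known: set, ks) -> int:
    """Hypothetical criterion: the k whose LSI ranking places known partners highest.

    Ties go to the first k in `ks` (max returns the first maximal element).
    """
    return max(
        ks,
        key=lambda k: average_precision(rank_genes(lsi_gene_vectors(A, k), query), known),
    )


# Toy term-by-gene matrix (rows: terms, columns: gene documents); values are made up.
A = np.array([
    [2, 1, 1, 0, 0],
    [1, 2, 0, 0, 0],
    [0, 1, 2, 0, 0],
    [0, 0, 0, 2, 1],
    [0, 0, 0, 1, 2],
], dtype=float)

print("chosen k:", choose_k(A, query=0, known={1}, ks=[1, 2, 3]))
print("LSI/SVD ranking for gene 0:", rank_genes(lsi_gene_vectors(A, 2), 0))
W, H = nmf(A, 2)
print("NMF-based ranking for gene 0:", rank_genes(H, 0))  # genes compared as columns of H
```

In this toy matrix, genes 0 to 2 share one vocabulary and genes 3 and 4 share another, so the rank-2 LSI ranking for gene 0 places genes 1 and 2 first. In practice the matrix would be built from the literature associated with each gene, usually with term weighting, which this sketch omits.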
Pages: 11