Using citation data to improve retrieval from MEDLINE

Cited by: 40
Authors
Bernstam, EV
Herskovic, JR
Aphinyanaphongs, Y
Aliferis, CF
Sriram, MG
Hersh, WR
Affiliations
[1] Univ Texas, Hlth Sci Ctr, Sch Hlth Informat Sci, Houston, TX 77030 USA
[2] Vanderbilt Univ, Dept Biomed Informat, Nashville, TN USA
[3] Oregon Hlth & Sci Univ, Dept Med Informat & Clin Epidemiol, Portland, OR USA
DOI
10.1197/jamia.M1909
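Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812 [Computer science and technology]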
Abstract
Objective: To determine whether algorithms developed for the World Wide Web can be applied to the biomedical literature in order to identify articles that are important as well as relevant. Design and Measurements: A direct comparison of eight algorithms: simple PubMed queries, clinical queries (sensitive and specific versions), vector cosine comparison, citation count, journal impact factor, PageRank, and machine learning based on polynomial support vector machines. The objective was to prioritize important articles, defined as being included in a pre-existing bibliography of important literature in surgical oncology. Results: Citation-based algorithms were more effective than noncitation-based algorithms at identifying important articles. The most effective strategies were simple citation count and PageRank, which on average identified over six important articles in the first 100 results compared to 0.85 for the best noncitation-based algorithm (p < 0.001). The authors saw similar differences between citation-based and noncitation-based algorithms at 10, 20, 50, 200, 500, and 1,000 results (p < 0.001). Citation lag affects performance of PageRank more than simple citation count. However, in spite of citation lag, citation-based algorithms remain more effective than noncitation-based algorithms. Conclusion: Algorithms that have proved successful on the World Wide Web can be applied to biomedical information retrieval. Citation-based algorithms can help identify important articles within large sets of relevant results. Further studies are needed to determine whether citation-based algorithms can effectively meet actual user information needs.
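As a rough illustration of the two citation-based strategies the abstract reports as most effective, the sketch below ranks a toy citation graph by simple citation count and by PageRank, then counts how many "important" articles land in the top k results. The graph, the article IDs, the damping factor of 0.85, the fixed iteration count, and all function names are assumptions made for illustration; this is not the authors' implementation or data.

```python
def citation_counts(citations):
    """Count incoming citations for every article in the graph.

    `citations` maps each citing article to the list of articles it cites.
    """
    counts = {article: 0 for article in citations}
    for cited_list in citations.values():
        for cited in cited_list:
            counts[cited] = counts.get(cited, 0) + 1
    return counts


def pagerank(citations, damping=0.85, iterations=50):
    """Basic PageRank by power iteration (hypothetical parameter defaults)."""
    nodes = set(citations)
    for cited_list in citations.values():
        nodes.update(cited_list)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every article starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            outgoing = citations.get(node, [])
            if outgoing:
                # Split this article's rank among the articles it cites.
                share = damping * rank[node] / len(outgoing)
                for cited in outgoing:
                    new_rank[cited] += share
            else:
                # Articles that cite nothing spread their rank evenly.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


def important_in_top_k(scores, important, k):
    """Number of important articles among the k highest-scoring ones."""
    ranked = sorted(scores, key=lambda article: (-scores[article], article))
    return sum(1 for article in ranked[:k] if article in important)


# Toy citation graph (made up): each key cites the articles in its list.
toy_citations = {
    "A": [],
    "B": ["A"],
    "C": ["A", "B"],
    "D": ["A", "B"],
    "E": ["C"],
}
toy_important = {"A", "B"}  # stand-in for a curated bibliography

counts = citation_counts(toy_citations)
ranks = pagerank(toy_citations)

print("citation count hits in top 2:", important_in_top_k(counts, toy_important, 2))
print("PageRank hits in top 2:", important_in_top_k(ranks, toy_important, 2))
```

Citation count only looks at how many articles cite a paper, while PageRank also weighs how highly ranked those citing articles are. That dependence on the wider graph is one plausible reason the abstract reports PageRank as more sensitive to citation lag.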
Pages: 96-105
Page count: 10