Integrating query of relational and textual data in clinical databases: A case study

Cited by: 21
Authors
Fisk, JM
Mutalik, P
Levin, FW
Erdos, J
Taylor, C
Nadkarni, P
Affiliations
[1] Yale Univ, Sch Med, Ctr Med Informat, New Haven, CT 06520 USA
[2] Vet Adm Med Ctr, Informat Technol Off, West Haven, CT 06516 USA
[3] Vet Adm Med Ctr, Dept Radiol, West Haven, CT 06516 USA
DOI
10.1197/jamia.M1133
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Objectives: The authors designed and implemented a clinical data mart composed of an integrated information retrieval (IR) and relational database management system (RDBMS).
Design: Using commodity software, which supports interactive, attribute-centric text and relational searches, the mart houses 2.8 million documents that span a five-year period and supports basic IR features such as Boolean searches, stemming, and proximity and fuzzy searching.
Measurements: Results are relevance-ranked using either "total documents per patient" or "report type weighting."
Results: Non-curated medical text has a significant degree of malformation with respect to spelling and punctuation, which creates difficulties for text indexing and searching. Presently, the IR facilities of RDBMS packages lack the features necessary to handle such malformed text adequately.
Conclusion: A robust IR+RDBMS system can be developed, but it requires integrating RDBMSs with third-party IR software. RDBMS vendors need to make their IR offerings more accessible to non-programmers.
Pages: 21-38
Page count: 18