Text mining neuroscience journal articles to populate neuroscience databases

Cited by: 12
Authors
Crasto, CJ [1]
Marenco, LN
Migliore, M
Mao, BQ
Nadkarni, PM
Miller, P
Shepherd, GM
Affiliations
[1] Yale Univ, Ctr Med Informat, New Haven, CT 06520 USA
[2] Yale Univ, Dept Neurobiol, New Haven, CT 06520 USA
[3] Yale Univ, Dept Anesthesiol, New Haven, CT 06520 USA
[4] Yale Univ, Dept Mol Cellular & Dev Biol, New Haven, CT 06520 USA
[5] CNR, Inst Biophys, Palermo, Italy
Keywords
text mining; natural language processing; neuroscience; databases; supervised and unsupervised learning
DOI
10.1385/NI:1:3:215
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
We have developed a program, NeuroText, to populate the neuroscience databases in SenseLab (http://senselab.med.yale.edu/senselab) by mining the natural language text of neuroscience articles. NeuroText uses a two-step approach to identify relevant articles. The first step (pre-processing), aimed at 100% sensitivity, identifies abstracts containing database keywords. In the second step, potentially relevant abstracts identified in the first step are processed for specificity dictated by database architecture and by neuroscience, lexical, and semantic contexts. NeuroText results were presented to the experts for validation using a dynamically generated interface that also allows expert-validated articles to be automatically deposited into the databases. Of the test set of 912 articles, 735 were rejected at the pre-processing step. For the remaining articles, the accuracy of predicting database-relevant articles was 85%. Twenty-two articles were erroneously identified. NeuroText deferred decisions on 29 articles to the expert. A comparison of NeuroText results with the experts' analyses revealed that the program failed to correctly identify articles' relevance because of concepts that did not yet exist in the knowledgebase or because of vaguely presented information in the abstracts. NeuroText uses two "evolution" techniques (supervised and unsupervised) that play an important role in the continual improvement of the retrieval results. Software that uses the NeuroText approach can facilitate the creation of curated, special-interest bibliography databases.
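The two-step approach described in the abstract can be illustrated with a minimal Python sketch. This is not NeuroText's implementation, which the abstract does not detail: the keyword list, the context terms, the score thresholds, and all names (`Abstract`, `preprocess`, `classify`, `run_pipeline`) are assumptions made for illustration. Step 1 keeps any abstract that mentions a database keyword (favoring sensitivity); step 2 applies a stricter, context-based check and either accepts, rejects, or defers the article to an expert.

```python
from dataclasses import dataclass

# Hypothetical keyword and context lists; the actual NeuroText
# knowledge base is not given in the abstract.
DATABASE_KEYWORDS = {"mitral cell", "sodium channel", "dentate gyrus"}
CONTEXT_TERMS = {"current", "conductance", "recorded", "expressed"}


@dataclass
class Abstract:
    article_id: str
    text: str


def preprocess(abstracts):
    """Step 1 (sensitivity): keep every abstract mentioning any keyword."""
    kept = []
    for abstract in abstracts:
        lowered = abstract.text.lower()
        if any(keyword in lowered for keyword in DATABASE_KEYWORDS):
            kept.append(abstract)
    return kept


def classify(abstract, accept_at=2, defer_at=1):
    """Step 2 (specificity): score by context terms; thresholds are illustrative."""
    lowered = abstract.text.lower()
    score = sum(term in lowered for term in CONTEXT_TERMS)
    if score >= accept_at:
        return "accept"
    if score >= defer_at:
        return "defer"  # left for the expert to decide
    return "reject"


def run_pipeline(abstracts):
    """Run both steps; abstracts dropped in step 1 do not appear in the result."""
    return {a.article_id: classify(a) for a in preprocess(abstracts)}


if __name__ == "__main__":
    sample = [
        Abstract("a1", "Sodium channel current was recorded in mitral cells."),
        Abstract("a2", "A historical review of the dentate gyrus."),
        Abstract("a3", "Unrelated cardiology study with recorded current."),
        Abstract("a4", "Sodium channel conductance in neurons."),
    ]
    print(run_pipeline(sample))
```

In the reported test set, the analogous first step rejected 735 of 912 articles, leaving 912 - 735 = 177 for the second step. A simple substring match stands in here for whatever lexical and semantic analysis the real system performs.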
Pages: 215-237
Page count: 23