Extracting discourse elements and annotating scientific documents using the SciAnnotDoc model: a use case in gender documents

Cited by: 6
Authors
de Ribaupierre, Helene [1, 3]
Falquet, Gilles [2, 4]
Affiliations
[1] Cardiff Univ, Sch Comp Sci & Informat, 0000 0001 0807 5670, Cardiff grid56003, Wales
[2] Univ Geneva, 0000 0001 2322 4988, CUI, CH-grid8591 Geneva, Switzerland
[3] Cardiff Univ, Sch Comp Sci & Informat, Cardiff, S Glam, Wales
[4] Univ Geneva, CUI, Geneva, Switzerland
Keywords
SciAnnotDoc model; Information retrieval; Knowledge management; Ontologies; Semantic publishing;
DOI
10.1007/s00799-017-0227-5
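Chinese Library Classification number
G25 [Library science, librarianship]; G35 [Information science, information work];
Discipline classification code
1205 ; 120501 ;
Abstract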
When scientists are searching for information, they generally have a precise objective in mind. Instead of looking for documents "about a topic T", they try to answer specific questions such as finding the definition of a concept, finding results for a particular problem, checking whether an idea has already been tested, or comparing the scientific conclusions of two articles. Answering these precise or complex queries on a corpus of scientific documents requires precise modelling of the full content of the documents. In particular, each document element must be characterised by its discourse type (hypothesis, definition, result, method, etc.). In this paper, we present a scientific document model (SciAnnotDoc ontology), developed from an empirical study conducted with scientists, that models the discourse types. We developed an automated process that analyses documents, effectively identifying the discourse type of each element. Using syntactic rules (patterns), we evaluated the process output in terms of precision and recall using a previously annotated corpus in Gender Studies. We chose to annotate documents in Humanities, as these documents are well known to be less formalised than those in "hard science". The process output has been used to create a SciAnnotDoc representation of the corpus on top of which we built a faceted search interface. Experiments with users show that searches using this interface clearly outperform standard keyword searches for precise or complex queries.
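The pattern-based annotation and its precision/recall evaluation described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the regular expressions, the sample sentences, the gold labels, and the function names `tag_sentence` and `precision_recall` are hypothetical and are not the syntactic rules, corpus, or tooling used by the authors. Only the four discourse type names come from the abstract.

```python
import re

# Hypothetical surface patterns, one per discourse type named in the abstract.
# These are NOT the syntactic rules of the paper; they only illustrate the idea.
PATTERNS = {
    "definition": re.compile(r"\b(is defined as|refers to|we define)\b", re.IGNORECASE),
    "hypothesis": re.compile(r"\b(we hypothesi[sz]e|we expect that)\b", re.IGNORECASE),
    "result": re.compile(r"\b(results? (show|indicate|suggest)|we found)\b", re.IGNORECASE),
    "method": re.compile(r"\b(we (used|conducted|collected))\b", re.IGNORECASE),
}


def tag_sentence(sentence):
    """Return the set of discourse types whose pattern matches the sentence."""
    return {label for label, pattern in PATTERNS.items() if pattern.search(sentence)}


def precision_recall(predicted, gold, label):
    """Precision and recall for one label over parallel lists of label sets."""
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if label in p and label in g)
    fp = sum(1 for p, g in pairs if label in p and label not in g)
    fn = sum(1 for p, g in pairs if label not in p and label in g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented sentences and invented manual annotations (gold standard).
SENTENCES = [
    "Gender is defined as a social construct.",
    "We hypothesize that women report less free time.",
    "The results show that time poverty is higher among women.",
    "We conducted interviews with forty participants.",
    "Women may therefore have less free time than men.",
]
GOLD = [
    {"definition"},
    {"hypothesis"},
    {"result"},
    {"method"},
    {"hypothesis"},  # a hypothesis with no cue phrase, so the patterns miss it
]

predicted = [tag_sentence(s) for s in SENTENCES]
for label in PATTERNS:
    p, r = precision_recall(predicted, GOLD, label)
    print(f"{label}: precision={p:.2f} recall={r:.2f}")
```

The last sentence is included deliberately: it carries no cue phrase, so recall for the hypothesis type drops while precision is unaffected. This is the kind of trade-off that evaluation against an annotated corpus makes visible.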
Pages: 271-286
Page count: 16