Investigating Prior Knowledge for Challenging Chinese Machine Reading Comprehension

Cited by: 39
Authors
Sun, Kai [1 ]
Yu, Dian [2 ]
Yu, Dong [2 ]
Cardie, Claire [1 ]
Affiliations
[1] Cornell Univ, Ithaca, NY 14850 USA
[2] Tencent AI Lab, Bellevue, WA USA
DOI
10.1162/tacl_a_00305
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Machine reading comprehension tasks require a machine reader to answer questions relevant to the given document. In this paper, we present the first free-form multiple-Choice Chinese machine reading Comprehension dataset (C³), containing 13,369 documents (dialogues or more formally written mixed-genre texts) and their associated 19,577 multiple-choice free-form questions collected from Chinese-as-a-second-language examinations. We present a comprehensive analysis of the prior knowledge (i.e., linguistic, domain-specific, and general world knowledge) needed for these real-world problems. We implement rule-based and popular neural methods and find that there is still a significant performance gap between the best-performing model (68.5%) and human readers (96.0%), especially on problems that require prior knowledge. We further study the effects of distractor plausibility and data augmentation based on translated relevant datasets for English on model performance. We expect C³ to present great challenges to existing systems as answering 86.8% of questions requires both knowledge within and beyond the accompanying document, and we hope that C³ can serve as a platform to study how to leverage various kinds of prior knowledge to better understand a given written or orally oriented text. C³ is available at https://dataset.org/c3/.
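Since the benchmark is scored by plain multiple-choice accuracy (68.5% for the best model vs. 96.0% for human readers), a minimal sketch of loading a C³ split and scoring predictions is given below. The JSON layout and the load_split/accuracy helpers are illustrative assumptions, not the confirmed schema of the release at https://dataset.org/c3/.

import json

def load_split(path):
    # Load one C³ data split. Each record is assumed (hypothetically) to
    # bundle a document (dialogue or mixed-genre text) with its free-form
    # multiple-choice questions; consult the release for the actual schema.
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def accuracy(predicted, gold):
    # Fraction of questions answered correctly: the metric behind the
    # reported 68.5% (best model) vs. 96.0% (human) gap.
    assert len(predicted) == len(gold) and gold
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

# With four answer options per question, uniform random guessing scores
# about 25%, far below both reported figures.
print(accuracy(["A", "B", "C"], ["A", "B", "D"]))  # -> 0.666...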
Pages: 141-155
Number of pages: 15