Perspectives on crowdsourcing annotations for natural language processing

Cited by: 55
Authors
Wang, Aobo [2]
Cong Duy Vu Hoang [1]
Kan, Min-Yen [2]
Affiliations
[1] ASTAR, Inst Infocomm Res I2R, Human Language Technol Dept, Singapore 138632, Singapore
[2] Natl Univ Singapore, Singapore 117417, Singapore
Funding
National Research Foundation, Singapore
Keywords
Human computation; Crowdsourcing; NLP; Wikipedia; Mechanical Turk; Games with a purpose; Annotation; GAMES
DOI
10.1007/s10579-012-9176-1
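Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
080201 [Mechanical manufacturing and automation]
Abstract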
Crowdsourcing has emerged as a new method for obtaining annotations for training models for machine learning. While many variants of this process exist, they largely differ in their methods of motivating subjects to contribute and the scale of their applications. To date, there has yet to be a study that helps the practitioner to decide what form an annotation application should take to best reach its objectives within the constraints of a project. To fill this gap, we provide a faceted analysis of crowdsourcing from a practitioner's perspective, and show how our facets apply to existing published crowdsourced annotation applications. We then summarize how the major crowdsourcing genres fill different parts of this multi-dimensional space, which leads to our recommendations on the potential opportunities crowdsourcing offers to future annotation efforts.
Pages: 9-31
Page count: 23