Transcriber: Development and use of a tool for assisting speech corpora production

Cited by: 134
Authors
Barras, C
Geoffrois, E
Wu, ZB
Liberman, M
Affiliations
[1] LIMSI, Spoken Language Proc Grp, CNRS, F-91403 Orsay, France
[2] GIP, CTA, DGA, F-94114 Arcueil, France
[3] LDC, Philadelphia, PA 19104 USA
Keywords
transcription tool; speech corpora; broadcast news; linguistic annotation formats
DOI
10.1016/S0167-6393(00)00067-4
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We present "Transcriber", a tool for assisting in the creation of speech corpora, and describe some aspects of its development and use. Transcriber was designed for the manual segmentation and transcription of long duration broadcast news recordings, including annotation of speech turns, topics and acoustic conditions. It is highly portable, relying on the scripting language Tcl/Tk with extensions such as Snack for advanced audio functions and tcLex for lexical analysis, and has been tested on various Unix systems and Windows. The data format follows the XML standard with Unicode support for multilingual transcriptions. Distributed as free software in order to encourage the production of corpora, ease their sharing, increase user feedback and motivate software contributions, Transcriber has been in use for over a year in several countries. As a result of this collective experience, new requirements arose to support additional data formats, video control, and a better management of conversational speech. Using the annotation graphs framework recently formalized, adaptation of the tool towards new tasks and support of different data formats will become easier. (C) 2001 Elsevier Science B.V. All rights reserved.
Pages: 5-22
Page count: 18
Related papers
30 in total
  • [1] [Anonymous], P INT C SPOK LANG PR
  • [2] [Anonymous], P INT C SPOK LANG PR
  • [3] [Anonymous], 1998, ICSLP 1998
  • [4] Barras C., 1998, P 1 INT C LANG RES E, P1373
  • [5] A formal framework for linguistic annotation
    Bird, S
    Liberman, M
    [J]. SPEECH COMMUNICATION, 2001, 33 (1-2) : 23 - 60
  • [6] BIRD S, 1999, LINGUISTIC ANNOTATIO
  • [7] BONNET F, 1998, TCLEX LEXICAL ANAL G
  • [8] Bray Tim, 1998, Extensible markup language
  • [9] BURGER S, 1999, 9 INT COCOSDA WORKSH
  • [10] Multi-level annotation in the Emu speech database management system
    Cassidy, S
    Harrington, J
    [J]. SPEECH COMMUNICATION, 2001, 33 (1-2) : 61 - 77