Bootstrapping word boundaries: A bottom-up corpus-based approach to speech segmentation

Cited by: 89
Authors
Cairns, P
Shillcock, R
Chater, N
Levy, J
Affiliations
[1] UNIV EDINBURGH, CTR COGNIT SCI, EDINBURGH EH8 9LW, MIDLOTHIAN, SCOTLAND
[2] UNIV WARWICK, DEPT PSYCHOL, COVENTRY CV4 7AL, W MIDLANDS, ENGLAND
[3] UNIV LONDON, BIRKBECK COLL, DEPT PSYCHOL, LONDON WC1H 0PP, ENGLAND
Funding
Economic and Social Research Council (UK);
Keywords
DOI
10.1006/cogp.1997.0649
Chinese Library Classification number
B84 [Psychology];
Discipline classification code
04 ; 0402 ;
Abstract
Speech is continuous, and isolating meaningful chunks for lexical access is a nontrivial problem. In this paper we use neural network models and more conventional statistics to study the use of sequential phonological probabilities in the segmentation of an idealized phonological transcription of the London-Lund Corpus; these speech data are representative of genuine conversational English. We demonstrate, first, that the distribution of phonetic segments in English is an important cue to segmentation, and, second, that the distributional information is such that it might allow the infant, beginning with only a sensitivity to the statistics of subsegmental primitives, to bootstrap into a series of increasingly sophisticated segmentation competences, ending with an adult competence. We discuss the relation between the behavior of the models and existing psycholinguistic studies of speech segmentation. In particular, we confirm the utility of the Metrical Segmentation Strategy (Cutler & Norris, 1988) and demonstrate a route by which this utility might be recognized by the infant, without requiring the prior specification of categories like "syllable" or "strong syllable." © 1997 Academic Press.
Pages: 111-153
Page count: 43