THE META-PI NETWORK - BUILDING DISTRIBUTED KNOWLEDGE REPRESENTATIONS FOR ROBUST MULTISOURCE PATTERN-RECOGNITION

Cited by: 50

Authors:

HAMPSHIRE, JB [1]

WAIBEL, A [1]

Affiliations:

[1] CARNEGIE MELLON UNIV,SCH COMP SCI,PITTSBURGH,PA 15213

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 1992 / Vol. 14 / No. 7

Keywords:

BAYESIAN DISCRIMINANT FUNCTION; CLASS-CONDITIONAL DENSITY; CONNECTIONISM; META-PI NETWORK; MIXTURE DENSITY; MULTISOURCE; PHONEME RECOGNITION; SPEECH RECOGNITION; TIME-DELAY NEURAL NETWORK (TDNN);

DOI:

10.1109/34.142911

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We present a multinetwork connectionist classifier that forms distributed low-level knowledge representations for robust pattern recognition, given random feature vectors generated by multiple statistically distinct sources. The architecture comprises a number of source-dependent modules (i.e., each module is trained to classify patterns from one particular source) that are linked by a combinational superstructure. The superstructure adapts to the source being processed, integrating source-dependent classifications based on its internal assessment of the source model or combination of source models most likely to classify the input signal correctly. To train this combinational network, we have developed a new form of multiplicative connection, which we call the "Meta-Pi" connection; its function is closely aligned with predecessors described in [3], [29], and [31]. We illustrate how the Meta-Pi paradigm implements an adaptive Bayesian maximum a posteriori (MAP) classifier. We demonstrate its performance in the context of multispeaker phoneme recognition. In this task, the Meta-Pi superstructure combines speaker-dependent time-delay neural network (TDNN) modules to perform multispeaker /b, d, g/ phoneme recognition with speaker-dependent error rates (2 %). Finally, we apply the Meta-Pi architecture to a limited source-independent recognition task, illustrating its discrimination of a novel source. We demonstrate that it can adapt to the novel source (speaker), given five adaptation examples of each of the three phonemes; the resulting error rate of 7 % is approximately three times that of a typical source-dependent classifier. Longer term adaptation yields discrimination that is comparable with a speaker-dependent classifier of the novel source. We conclude with an assessment of our experimental results and their implications for larger real-world multisource and source-independent pattern recognition systems.
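The combination step described in the abstract can be illustrated with a minimal sketch. It assumes that each source-dependent module outputs class posteriors P(class | x, source k) and that the superstructure outputs one raw score per source, normalized here with a softmax to stand in for P(source k | x). The abstract does not state the normalization or any of these function names; `softmax`, `combine`, `map_decision`, and all numeric values below are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Normalize real-valued scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine(module_posteriors, gate_scores):
    """Mix per-source class posteriors with input-dependent source weights.

    module_posteriors[k][c]: assumed estimate of P(class c | x, source k)
    gate_scores[k]: assumed raw superstructure output for source k
    """
    weights = softmax(gate_scores)  # stands in for P(source k | x)
    n_classes = len(module_posteriors[0])
    return [
        sum(w * p[c] for w, p in zip(weights, module_posteriors))
        for c in range(n_classes)
    ]


def map_decision(posteriors, labels):
    """Pick the label with the largest combined posterior (MAP rule)."""
    best = max(range(len(posteriors)), key=posteriors.__getitem__)
    return labels[best]


# Made-up outputs of two speaker-dependent modules for one input.
labels = ["b", "d", "g"]
module_posteriors = [
    [0.80, 0.15, 0.05],  # module trained on speaker 1
    [0.20, 0.70, 0.10],  # module trained on speaker 2
]
gate_scores = [2.0, 0.0]  # superstructure favors speaker 1 for this input

mixed = combine(module_posteriors, gate_scores)
decision = map_decision(mixed, labels)
```

Because the weights sum to 1 and each module's posteriors sum to 1, the combined vector is itself a valid distribution over the three phonemes, which is what lets the final arg-max be read as a MAP decision under these assumptions.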

Citation

Pages: 751-769

Page count: 19

References (43 in total)

[1] BARRON A, 1988, S INTERFACE STATE CO

[2] LINKS BETWEEN MARKOV-MODELS AND MULTILAYER PERCEPTRONS [J]. BOURLARD, H; WELLEKENS, CJ. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1990, 12 (12): 1167-1178

[3] AUTOMATIC PATTERN-RECOGNITION - A STUDY OF THE PROBABILITY OF ERROR [J]. DEVROYE, L. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1988, 10 (04): 530-543

[4] Duda R. O., 1973, PATTERN CLASSIFICATI, V3

[5] GISH H, 1990, 1990 IEEE P INT C AS, V3, P1361

[6] HAMPSHIRE J, 1991, 1990 P CONN MOD SUMM, P159

[7] Hampshire J B, 1990, IEEE Trans Neural Netw, V1, P216, DOI 10.1109/72.80233

[8] HAMPSHIRE JB, 1989, CMUCS89166R CARN MEL

[9] ESTIMATION OF PARAMETERS FOR A MIXTURE OF NORMAL DISTRIBUTIONS [J]. HASSELBLAD, V. TECHNOMETRICS, 1966, 8 (03): 431-+

[10] DESIGN AND ANALYSIS OF PATTERN RECOGNITION EXPERIMENTS [J]. HIGHLEYMAN, WH. BELL SYSTEM TECHNICAL JOURNAL, 1962, 41 (02): 723-+
