IDENTIFICATION OF A SPEAKER BY SPEECH SPECTROGRAMS

Times cited: 20
Authors
BOLT, RH
COOPER, FS
DAVID, EE
DENES, PB
PICKETT, JM
STEVENS, KN
Affiliations
[1] Bolt Beranek and Newman Inc., Cambridge, MA 02138
[2] Haskins Laboratories, New York, NY 10017
[3] Bell Telephone Laboratories, Murray Hill
[4] Gallaudet College, Washington
[5] Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge
DOI
10.1126/science.166.3903.338
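Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]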
Discipline classification codes
07; 0710; 09
Abstract
1) Speech carries many simultaneous messages interwoven in a complex of words and phrases, moods, and individual voice characteristics. In their acoustic realization as speech, these messages are highly interdependent and thus difficult to disentangle. However, human observers can, to a limited extent, identify voices by ear or by visual examination of the acoustic patterns of speech.
2) The acoustic speech signal can be analyzed in frequency, energy, and time and recorded graphically to produce a spectrogram. Neither the spectrogram nor any other known process can directly display an individual's voice traits, because these traits are intermixed with the features that characterize words and phrases. At present, a human observer must examine the patterns of spectrograms and decide subjectively about the identities of talkers.
3) Similarities and differences among spectrographic patterns are ambiguous and may be misleading. Prominent similarities usually indicate that similar sounds were spoken, but do not necessarily imply that they were spoken by the same person; differences in pattern, when the words are the same, may reflect differences of speaker or only normal variations in the utterances of a single speaker.
4) Speech spectrograms, when used for voice identification, are not analogous to fingerprints, primarily because of fundamental differences in the sources of the patterns and consequent differences in their interpretation. For example, fingerprint patterns are a direct representation of anatomical traits. Vocal anatomy, on the contrary, is not represented in any direct way in voice spectrograms. In the interpretation of fingerprints, all points of similarity imply a match, although some more strongly than others; this simple relationship does not hold for the interpretation of voice patterns.
5) Experimental studies of voice identification based on visual interpretation of spectrograms by human observers indicate false identification rates ranging from zero to as high as 63 percent, depending on the type of task set for the observer, his training, and other factors. Reliable machine methods for voice identification have not yet been established.
6) Experience in applying spectrographic voice identification in law enforcement has led proponents of the method to express confidence in its reliability. The basis for this confidence is not, however, accessible to objective assessment.
7) Experimental studies to assess the reliability of voice identification under practical conditions, whether by experts or by explicit procedures, have not yet been made, but the requirements for such studies have been outlined.
We find, in brief, that spectrographic voice identification has inherent difficulties and uncertainties. Anecdotal evidence given in support of the method is not scientifically convincing. The controlled experiments that have been reported give conflicting results. Furthermore, the experiments reported thus far do not provide a direct test of the practical task of determining whether two spoken passages were uttered by the same speaker or by two different speakers, one of whom may be a person unknown. We conclude that the available results are inadequate to establish the reliability of voice identification by spectrograms. We believe this conclusion is shared by most scientists who are knowledgeable about speech; hence, many of them are deeply concerned about the use of spectrographic evidence in the courts. Procedures exist, as we have suggested, by which the reliability of voice identification methods can be evaluated. We believe that such validation is urgently required.
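Point 2 describes the spectrogram as the speech signal analyzed in frequency, energy, and time. A minimal sketch of that analysis follows. The instruments of the period were analog sound spectrographs; this example instead swaps in a digital short-time Fourier transform (STFT), and the function names, the Hann window, the 64-sample frame, the 32-sample hop, and the 8000 Hz sampling rate are all illustrative assumptions, not details taken from this record. Each output row is one time slice, each column one frequency band, and each value the energy in that band in decibels.

```python
import cmath
import math

# Hypothetical helper names; a digital STFT stands in for the analog
# sound spectrograph discussed in the abstract.


def hann_window(n):
    """Symmetric Hann window of length n (assumed window shape)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..n/2 (the frequency axis)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(signal, frame_len=64, hop=32):
    """Return one row per time frame and one column per frequency bin, in dB.

    frame_len and hop are illustrative defaults, not values from the paper.
    """
    window = hann_window(frame_len)
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):  # time axis
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        mags = magnitude_spectrum(frame)
        # Energy axis: convert magnitude to decibels, guarding against log(0).
        rows.append([20 * math.log10(max(m, 1e-12)) for m in mags])
    return rows


if __name__ == "__main__":
    sample_rate = 8000  # assumed sampling rate in Hz
    # A pure 1000 Hz tone, 0.1 s long, as a stand-in for a speech signal.
    tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(800)]
    rows = spectrogram(tone)
    bin_hz = sample_rate / 64  # width of one frequency bin
    peak = max(range(len(rows[0])), key=rows[0].__getitem__)
    print(len(rows), "frames; peak near", peak * bin_hz, "Hz")
```

As the abstract stresses, a display like this shows only the acoustic pattern of what was said; nothing in the computation separates a speaker's voice traits from the features of the words themselves.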
Pages: 338-&