Machine translation of cortical activity to text with an encoder-decoder framework

Cited by: 172
Authors
Makin, Joseph G. [1, 2]
Moses, David A. [1, 2]
Chang, Edward F. [1, 2]
Affiliations
[1] UCSF, Ctr Integrat Neurosci, San Francisco, CA 94143 USA
[2] UCSF, Dept Neurol Surg, San Francisco, CA 94143 USA
Keywords
HUMAN SENSORIMOTOR CORTEX
DOI
10.1038/s41593-020-0608-8
Chinese Library Classification
Q189 [Neuroscience]
Discipline classification code
071006
Abstract
A decade after speech was first decoded from human brain signals, accuracy and speed remain far below those of natural speech. Here we show how to decode the electrocorticogram with high accuracy and at natural-speech rates. Taking a cue from recent advances in machine translation, we train a recurrent neural network to encode each sentence-length sequence of neural activity into an abstract representation, and then to decode this representation, word by word, into an English sentence. For each participant, data consist of several spoken repeats of a set of 30-50 sentences, along with the contemporaneous signals from approximately 250 electrodes distributed over peri-Sylvian cortices. Average word error rates across a held-out repeat set are as low as 3%. Finally, we show how decoding with limited data can be improved with transfer learning, by training certain layers of the network under multiple participants' data.
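The abstract reports results as a word error rate (WER). As a rough illustration of that metric, and not the authors' evaluation code, the sketch below computes WER in its usual form: the word-level edit distance between a reference sentence and a decoded sentence, divided by the number of reference words. The function name `word_error_rate` and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative sketch only; the function name and the whitespace
    tokenization are assumptions, not taken from the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


# Hypothetical sentences: one substituted word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
```

Under this definition, a WER of 3% corresponds to roughly three word-level errors (substitutions, deletions, or insertions) per hundred reference words.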
Pages: 575 / +
Page count: 12