Speech synthesis from neural decoding of spoken sentences

Cited by: 508
Authors
Anumanchipalli, Gopala K. [1 ,2 ]
Chartier, Josh [1 ,2 ,3 ]
Chang, Edward F. [1 ,2 ,3 ]
Affiliations
[1] Univ Calif San Francisco, Dept Neurol Surg, San Francisco, CA 94143 USA
[2] Univ Calif San Francisco, Weill Inst Neurosci, San Francisco, CA 94143 USA
[3] Univ Calif Berkeley & Univ Calif San Francisco Joint Graduate Program in Bioengineering, Berkeley, CA 94720 USA
Keywords
CORTEX;
DOI
10.1038/s41586-019-1119-1
Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
Technology that translates neural activity into speech would be transformative for people who are unable to communicate as a result of neurological impairments. Decoding speech from neural activity is challenging because speaking requires very precise and rapid multi-dimensional control of vocal tract articulators. Here we designed a neural decoder that explicitly leverages kinematic and sound representations encoded in human cortical activity to synthesize audible speech. Recurrent neural networks first decoded directly recorded cortical activity into representations of articulatory movement, and then transformed these representations into speech acoustics. In closed vocabulary tests, listeners could readily identify and transcribe speech synthesized from cortical activity. Intermediate articulatory dynamics enhanced performance even with limited data. Decoded articulatory representations were highly conserved across speakers, enabling a component of the decoder to be transferrable across participants. Furthermore, the decoder could synthesize speech when a participant silently mimed sentences. These findings advance the clinical viability of using speech neuroprosthetic technology to restore spoken communication.
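The abstract describes a two-stage decoding pipeline: recurrent networks first map recorded cortical activity to articulatory-kinematic representations, and a second stage maps those kinematics to speech acoustics. The sketch below illustrates that staged structure with tiny untrained Elman RNNs in NumPy; all dimensions, names, and weights are illustrative assumptions, not the paper's actual (trained, bidirectional LSTM-based) decoder.

```python
import numpy as np

# Hypothetical feature dimensions (illustrative only, not from the paper):
# 16 cortical (ECoG) channels -> 8 articulatory kinematic features -> 12 acoustic features.
N_ECOG, N_KIN, N_AC, HID = 16, 8, 12, 32

rng = np.random.default_rng(0)

def init_rnn(n_in, n_hid, n_out, rng):
    """Randomly initialized Elman RNN parameters (untrained stand-in)."""
    s = 0.1
    return {
        "W_xh": rng.normal(0, s, (n_hid, n_in)),
        "W_hh": rng.normal(0, s, (n_hid, n_hid)),
        "W_hy": rng.normal(0, s, (n_out, n_hid)),
        "b_h": np.zeros(n_hid),
        "b_y": np.zeros(n_out),
    }

def run_rnn(p, xs):
    """h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h); y_t = W_hy h_t + b_y."""
    h = np.zeros_like(p["b_h"])
    ys = []
    for x in xs:
        h = np.tanh(p["W_xh"] @ x + p["W_hh"] @ h + p["b_h"])
        ys.append(p["W_hy"] @ h + p["b_y"])
    return np.stack(ys)

# Stage 1: cortical activity -> articulatory kinematics.
stage1 = init_rnn(N_ECOG, HID, N_KIN, rng)
# Stage 2: articulatory kinematics -> acoustic features.
stage2 = init_rnn(N_KIN, HID, N_AC, rng)

ecog = rng.normal(size=(100, N_ECOG))   # 100 time steps of simulated neural data
kinematics = run_rnn(stage1, ecog)      # intermediate articulatory representation
acoustics = run_rnn(stage2, kinematics) # final acoustic representation
print(kinematics.shape, acoustics.shape)  # (100, 8) (100, 12)
```

The point of the intermediate stage, per the abstract, is that the articulatory representation is highly conserved across speakers, so a decoder component operating on it can transfer across participants.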
Pages: 493 / +
Page count: 18