Efficient memory transformer-based acoustic model for low-latency streaming speech recognition
Abstract:
In one embodiment, a method includes accessing a machine-learning model configured to generate an encoding for an utterance by using a module to process data associated with each segment of the utterance in a series of iterations. During an n-th iteration, the module performs operations associated with an i-th segment. The module receives an input comprising input contextual embeddings generated for the i-th segment in a preceding iteration and a memory bank storing memory vectors generated in the preceding iteration for segments preceding the i-th segment. It generates attention outputs and a memory vector based on keys, values, and queries generated using the input, and generates output contextual embeddings for the i-th segment based on the attention outputs. The memory vector is provided to the module for performing operations associated with the i-th segment in a next iteration. The method further includes performing speech recognition by decoding the encoding of the utterance.
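A minimal sketch, assuming PyTorch, of the per-segment step the abstract describes; the class name SegmentStep, the pooled summary query, and the tensor shapes are illustrative assumptions, not the patented design. One call consumes a segment's input contextual embeddings plus the memory bank from the preceding iteration, attends over keys/values/queries built from that input, and returns output contextual embeddings together with a new memory vector carried into the next iteration.

```python
import torch
import torch.nn as nn

class SegmentStep(nn.Module):
    """One iteration of the module for a single utterance segment (sketch)."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.ReLU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, seg_emb: torch.Tensor, mem_bank: torch.Tensor):
        # seg_emb:  (batch, seg_len, d_model) input contextual embeddings for
        #           the i-th segment from the preceding iteration.
        # mem_bank: (batch, n_mem, d_model) memory vectors generated in the
        #           preceding iteration for segments preceding the i-th segment.
        # Keys/values come from the memory bank concatenated with the segment;
        # queries are the segment embeddings plus one pooled summary query
        # (an assumption here) whose attention output becomes the new memory
        # vector.
        kv = torch.cat([mem_bank, seg_emb], dim=1)
        summary_q = seg_emb.mean(dim=1, keepdim=True)      # (batch, 1, d_model)
        q = torch.cat([seg_emb, summary_q], dim=1)
        attn_out, _ = self.attn(q, kv, kv)
        seg_attn, new_mem = attn_out[:, :-1, :], attn_out[:, -1:, :]
        # Output contextual embeddings based on the attention outputs
        # (standard residual + feed-forward sublayers).
        x = self.norm1(seg_emb + seg_attn)
        out = self.norm2(x + self.ffn(x))
        return out, new_mem  # new_mem feeds the module in the next iteration


# Streaming over an utterance split into fixed-length segments: the memory
# bank grows by one vector per processed segment.
d_model, n_heads = 256, 4
step = SegmentStep(d_model, n_heads)
mem_bank = torch.zeros(1, 0, d_model)                  # empty bank initially
outputs = []
for seg in torch.randn(1, 80, d_model).split(20, dim=1):
    out, new_mem = step(seg, mem_bank)
    outputs.append(out)
    mem_bank = torch.cat([mem_bank, new_mem], dim=1)
encoding = torch.cat(outputs, dim=1)                   # passed to the decoder
```

In this sketch the memory bank grows without bound; a latency-constrained streaming implementation would typically cap it to the most recent few memory vectors so per-segment attention cost stays fixed.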