Multi-encoder end-to-end automatic speech recognition (ASR) for joint modeling of multiple input devices

Invention Grant

US11978433B2 Multi-encoder end-to-end automatic speech recognition (ASR) for joint modeling of multiple input devices (in force)

Patent Title: Multi-encoder end-to-end automatic speech recognition (ASR) for joint modeling of multiple input devices
Application No.: US17354480

Application Date: 2021-06-22
Publication No.: US11978433B2

Publication Date: 2024-05-07
Inventor: Felix Weninger, Marco Gaudesi, Ralf Leibold, Puming Zhan
Applicant: Microsoft Technology Licensing, LLC
Applicant Address: US WA Redmond
Assignee: Microsoft Technology Licensing, LLC.
Current Assignee: Microsoft Technology Licensing, LLC.
Current Assignee Address: US WA Redmond
Agency: Barta Jones, PLLC
Main IPC: G10L19/02
IPC: G10L19/02; G10L15/04; G10L21/0208; G10L25/24

Abstract:

An end-to-end automatic speech recognition (ASR) system includes: a first encoder configured for close-talk input captured by a close-talk input mechanism; a second encoder configured for far-talk input captured by a far-talk input mechanism; and an encoder selection layer configured to select at least one of the first and second encoders for use in producing ASR output. The selection is made based on at least one of short-time Fourier transform (STFT), Mel-frequency Cepstral Coefficient (MFCC) and filter bank derived from at least one of the close-talk input and the far-talk input. If signals from both the close-talk input mechanism and the far-talk input mechanism are present for a speech segment, the encoder selection layer dynamically selects between the close-talk encoder and the far-talk encoder to select the encoder that better recognizes the speech segment. An encoder-decoder model is used to produce the ASR output.
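The selection behavior described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the toy encoders (a single random linear layer with tanh), the feature and output sizes, the mean-pooling of frames, the softmax scoring, and all names (`make_encoder`, `selection_weights`, `encode_segment`, `W_SELECT`). The input frames stand in for filter-bank features, the decoder is omitted, and the sketch makes a hard choice of one encoder even though the abstract allows selecting "at least one".

```python
import math
import random

FEAT_DIM = 4  # assumed number of features per frame (e.g. filter-bank energies)
ENC_DIM = 3   # assumed encoder output size


def random_matrix(rows, cols, seed):
    """Fixed pseudo-random weights so the sketch is reproducible."""
    rnd = random.Random(seed)
    return [[rnd.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def make_encoder(seed):
    """Toy encoder (hypothetical): one linear layer plus tanh, applied per frame."""
    w = random_matrix(FEAT_DIM, ENC_DIM, seed)

    def encode(frames):
        return [
            [math.tanh(sum(f[i] * w[i][j] for i in range(FEAT_DIM)))
             for j in range(ENC_DIM)]
            for f in frames
        ]

    return encode


close_talk_encoder = make_encoder(1)
far_talk_encoder = make_encoder(2)

# Hypothetical selection-layer weights: one score per encoder.
W_SELECT = random_matrix(2 * FEAT_DIM, 2, 3)


def mean_frame(frames):
    """Average the feature vectors of a segment over time."""
    return [sum(f[i] for f in frames) / len(frames) for i in range(FEAT_DIM)]


def selection_weights(close_frames, far_frames):
    """Softmax over two scores computed from the pooled features of both inputs."""
    pooled = mean_frame(close_frames) + mean_frame(far_frames)
    scores = [
        sum(pooled[i] * W_SELECT[i][k] for i in range(2 * FEAT_DIM))
        for k in range(2)
    ]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def encode_segment(close_frames=None, far_frames=None):
    """Return (encoder name, encoder output) for one speech segment."""
    if close_frames is None and far_frames is None:
        raise ValueError("at least one input is required")
    if far_frames is None:
        return "close", close_talk_encoder(close_frames)
    if close_frames is None:
        return "far", far_talk_encoder(far_frames)
    # Both signals present: choose dynamically, per segment.
    w_close, w_far = selection_weights(close_frames, far_frames)
    if w_close >= w_far:
        return "close", close_talk_encoder(close_frames)
    return "far", far_talk_encoder(far_frames)
```

In a trained system the selection weights would be learned jointly with the encoders and decoder; here they are random, so which encoder wins for a given segment carries no meaning beyond showing the control flow.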

Public/Granted literature

US20220406295A1 MULTI-ENCODER END-TO-END AUTOMATIC SPEECH RECOGNITION (ASR) FOR JOINT MODELING OF MULTIPLE INPUT DEVICES Public/Granted day: 2022-12-22

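IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L19/00	Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis (in musical instruments, see G10H)
G10L19/02	.using spectral analysis, e.g. transform vocoders or subband vocoders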