Multimodal speech recognition for real-time video audio-based display indicia application

Invention Grant

US09959872B2 Multimodal speech recognition for real-time video audio-based display indicia application (Active)

Patent Title: Multimodal speech recognition for real-time video audio-based display indicia application
Application No.: US14967726

Application Date: 2015-12-14
Publication No.: US09959872B2

Publication Date: 2018-05-01
Inventor: Priscilla Barreira Avegliano, Carlos Henrique Cardonha, Stefany Mazon, Julio Nogima
Applicant: International Business Machines Corporation
Applicant Address: US NY Armonk
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Current Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Current Assignee Address: US NY Armonk
Agency: Cantor Colburn LLP
Main IPC: G10L15/00
IPC: G10L15/00; G10L15/26; G10L15/32; G10L21/10; G10L21/18; H04N21/488; H04N21/44; H04N21/439; H04N21/84; H04N21/845; G10L21/06

Abstract:

Aspects relate to computer-implemented methods, systems, and processes to automatically generate audio-based display indicia of media content. These include: receiving, by a processor, a plurality of media content categories, each including at least one feature; receiving a plurality of categorized speech recognition algorithms, each speech recognition algorithm being associated with a respective one or more of the plurality of media content categories; determining a media content category of a current media content based on at least one feature of the current media content; selecting one speech recognition algorithm from the plurality of categorized speech recognition algorithms based on the determined media content category of the current media content; and applying the selected speech recognition algorithm to the current media content.
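
The flow described in the abstract (categorize the current content by its features, pick the recognizer tied to that category, then apply it) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the names `Category`, `determine_category`, `select_recognizer`, and `generate_indicia`, the feature labels, the feature-overlap scoring rule, and the stub recognizers are all hypothetical, and the display indicia are assumed here to be caption-like text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, List

# Hypothetical type: a recognizer takes raw audio and returns text.
Recognizer = Callable[[bytes], str]


@dataclass(frozen=True)
class Category:
    """A media content category described by at least one feature."""
    name: str
    features: FrozenSet[str]


def determine_category(content_features: Iterable[str],
                       categories: List[Category]) -> Category:
    """Pick the category sharing the most features with the content.

    Feature overlap is an assumed scoring rule; the abstract only says
    the category is determined from at least one feature.
    """
    observed = frozenset(content_features)
    best = max(categories, key=lambda c: len(c.features & observed))
    if not best.features & observed:
        raise ValueError("no category matches the observed features")
    return best


def select_recognizer(category: Category,
                      recognizers: Dict[str, Recognizer]) -> Recognizer:
    """Look up the recognizer associated with the category name.

    One recognizer may be registered under several category names.
    """
    return recognizers[category.name]


def generate_indicia(audio: bytes,
                     content_features: Iterable[str],
                     categories: List[Category],
                     recognizers: Dict[str, Recognizer]) -> str:
    """Categorize the content, select a recognizer, and apply it."""
    category = determine_category(content_features, categories)
    recognizer = select_recognizer(category, recognizers)
    return recognizer(audio)


# Stub data for illustration only; real recognizers would run ASR models.
CATEGORIES = [
    Category("news", frozenset({"studio", "single_speaker"})),
    Category("sports", frozenset({"crowd_noise", "fast_speech"})),
]
RECOGNIZERS: Dict[str, Recognizer] = {
    "news": lambda audio: "[news model] transcript",
    "sports": lambda audio: "[sports model] transcript",
}
```

With these stubs, calling `generate_indicia(b"", {"crowd_noise"}, CATEGORIES, RECOGNIZERS)` routes the audio to the recognizer registered under "sports". Keeping the category-to-recognizer mapping in a plain dictionary is a design choice of this sketch; it allows one recognizer to serve more than one category, which matches the abstract's "respective one or more" wording.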

Public/Granted literature

US20170169827A1 MULTIMODAL SPEECH RECOGNITION FOR REAL-TIME VIDEO AUDIO-BASED DISPLAY INDICIA APPLICATION Publication date: 2017-06-15

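IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)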