Statistical-analysis-based reset of recurrent neural networks for automatic speech recognition

    Publication No.: US10255909B2

    Publication date: 2019-04-09

    Application No.: US15637559

    Application date: 2017-06-29

    Abstract: Techniques are provided for calculating reset parameters for recurrent neural networks (RNN). A methodology implementing the techniques according to an embodiment includes generating a sequence of statistics. The calculation of each statistic is based on outputs of an RNN that is periodically re-initialized at a selected RNN reset time such that each of the calculated statistics is associated with a unique RNN reset time selected from a pre-determined range of reset times. The method further includes analyzing the sequence to identify a maximum interval during which the sequence remains relatively constant. The method further includes selecting a reset time parameter and reset context duration parameter, for re-initialization of the RNN during operation. The reset time parameter is based on the duration of the identified maximum interval and the reset context duration parameter is based on a time associated with the starting point of the identified maximum interval.
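
    The selection step in this abstract can be illustrated with a minimal sketch. It assumes that "relatively constant" means the statistics in an interval differ by no more than a tolerance, and that the two parameters are read directly off the interval's length and start. The function names, the tolerance test, and the sample values are illustrative assumptions, not taken from the patent.

```python
from typing import Sequence, Tuple


def longest_flat_interval(stats: Sequence[float], tolerance: float) -> Tuple[int, int]:
    """Return inclusive (start, end) indices of the longest run of statistics
    whose spread (max - min) stays within `tolerance`."""
    if not stats:
        raise ValueError("stats must not be empty")
    best = (0, 0)
    start = 0
    for end in range(len(stats)):
        # Shrink the window from the left until its spread is within tolerance.
        while max(stats[start:end + 1]) - min(stats[start:end + 1]) > tolerance:
            start += 1
        if end - start > best[1] - best[0]:
            best = (start, end)
    return best


def select_reset_parameters(reset_times: Sequence[int],
                            stats: Sequence[float],
                            tolerance: float) -> Tuple[int, int]:
    """Pick (reset_time, reset_context_duration) from the flattest interval.

    Assumption: reset_times[i] is the RNN reset time used to compute stats[i].
    """
    start, end = longest_flat_interval(stats, tolerance)
    reset_time = reset_times[end] - reset_times[start]  # duration of the interval
    reset_context = reset_times[start]                  # time at the interval start
    return reset_time, reset_context


# Hypothetical statistics measured at six candidate reset times.
times = [10, 20, 30, 40, 50, 60]
values = [0.50, 0.30, 0.21, 0.20, 0.22, 0.35]
params = select_reset_parameters(times, values, tolerance=0.03)
```

    In this made-up data the statistic settles between reset times 30 and 50, so the sketch yields a reset time of 20 and a reset context duration of 30.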

    Score trend analysis for reduced latency automatic speech recognition

    Publication No.: US10657952B2

    Publication date: 2020-05-19

    Application No.: US15892510

    Application date: 2018-02-09

    Abstract: Techniques are provided for reducing the latency of automatic speech recognition using hypothesis score trend analysis. A methodology implementing the techniques according to an embodiment includes generating complete-phrase hypotheses and partial-phrase hypotheses, along with associated likelihood scores, based on a segment of speech. The method also includes selecting the complete-phrase hypothesis associated with the highest of the complete-phrase hypotheses likelihood scores, and selecting the partial-phrase hypothesis associated with the highest of the partial-phrase hypotheses likelihood scores. The method further includes calculating a relative likelihood score based on a ratio of the likelihood score associated with the selected complete-phrase hypothesis to the likelihood score associated with the selected partial-phrase hypothesis. The method further includes calculating a trend of the relative likelihood score as a function of time and identifying an endpoint of the speech based on a determination that the trend does not decrease over a selected time period.
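
    The endpoint rule described in this abstract can be sketched as follows. The sketch assumes likelihood scores are positive linear-domain values, that the trend is measured as the frame-to-frame change in the relative score, and that "does not decrease over a selected time period" means a fixed number of consecutive non-decreasing steps. All names and sample values are hypothetical.

```python
from typing import Optional, Sequence


def relative_score(complete_scores: Sequence[float],
                   partial_scores: Sequence[float]) -> float:
    """Ratio of the best complete-phrase score to the best partial-phrase score."""
    return max(complete_scores) / max(partial_scores)


def find_endpoint(relative_scores: Sequence[float], window: int) -> Optional[int]:
    """Return the first frame index at which the relative score has not
    decreased for `window` consecutive steps, or None if that never happens."""
    run = 0
    for t in range(1, len(relative_scores)):
        if relative_scores[t] >= relative_scores[t - 1]:
            run += 1   # trend is flat or rising
        else:
            run = 0    # trend decreased, start counting again
        if run >= window:
            return t
    return None


# Hypothetical per-frame relative scores for one utterance.
scores = [0.9, 0.7, 0.5, 0.6, 0.6, 0.8, 0.9]
endpoint = find_endpoint(scores, window=3)
```

    With these sample values the score stops falling at frame 2, and the third consecutive non-decreasing step occurs at frame 5, which the sketch reports as the endpoint.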

    Context-aware query recognition for electronic devices

    Publication No.: US10147423B2

    Publication date: 2018-12-04

    Application No.: US15280809

    Application date: 2016-09-29

    Abstract: A method for context-aware query recognition in an electronic device includes receiving user speech from an input device. A speech signal is generated from the user speech. It is determined if the speech signal includes an action to be performed and if the electronic device is the intended recipient of the user speech. If the recognized speech signal includes the action and the intended recipient of the user speech is the electronic device, a command is generated for the electronic device to perform the action.

    SPOKEN LANGUAGE UNDERSTANDING BASED ON BUFFERED KEYWORD SPOTTING AND SPEECH RECOGNITION

    Publication No.: US20180293974A1

    Publication date: 2018-10-11

    Application No.: US15483421

    Application date: 2017-04-10

    Abstract: Techniques are provided for spoken language understanding based on keyword spotting and speech recognition. A methodology implementing the techniques according to an embodiment includes detecting a user spoken keyword or key-phrase embedded in an initial segment of a received audio signal, which is stored in a buffer. The method further includes triggering an automatic speech recognition (ASR) processor in response to the key-phrase detection. The method further includes performing automatic speech recognition, by the ASR processor, on a combination of the buffered initial segment and one or more additional received segments of the audio signal which include further speech from the user. The method still further includes performing natural language understanding on the recognized speech to determine a user request. The key-phrase is user selectable and serves to wake the ASR processor from a sleeping or idle lower power consumption state, into an active higher power consumption recognition state.
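
    The buffering idea in this abstract can be illustrated with a small sketch: recent audio segments are kept in a bounded buffer, and once the key-phrase is spotted, the buffered audio plus the segments that follow are handed to the recognizer together. The segment representation (plain strings), the `is_key_phrase` callback, and the fixed count of follow-up segments are assumptions made for illustration. A real system would work on audio frames and use its own detector and end-of-speech logic.

```python
from collections import deque
from itertools import islice
from typing import Callable, Iterable, List, Optional


def collect_asr_input(segments: Iterable[str],
                      is_key_phrase: Callable[[str], bool],
                      buffer_size: int,
                      extra_segments: int) -> Optional[List[str]]:
    """Return the segments to pass to ASR once the key-phrase is detected,
    or None if the key-phrase never appears."""
    buffer = deque(maxlen=buffer_size)  # keeps only the most recent segments
    stream = iter(segments)
    for segment in stream:
        buffer.append(segment)
        if is_key_phrase(segment):
            # Combine the buffered audio with the segments that follow it.
            asr_input = list(buffer)
            asr_input.extend(islice(stream, extra_segments))
            return asr_input
    return None


# Hypothetical segment stream, with text standing in for audio.
audio = ["noise", "hey device", "turn on", "the lights", "please"]
to_recognize = collect_asr_input(audio, lambda s: "hey device" in s,
                                 buffer_size=1, extra_segments=2)
```

    Because the key-phrase segment is still in the buffer when the recognizer is woken, the recognizer sees the whole utterance rather than only the speech that arrives after the wake-up.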

    DYNAMIC ENROLLMENT OF USER-DEFINED WAKE-UP KEY-PHRASE FOR SPEECH ENABLED COMPUTER SYSTEM

    Publication No.: US20190043481A1

    Publication date: 2019-02-07

    Application No.: US15855379

    Application date: 2017-12-27

    Abstract: Techniques are provided for wake-on-voice (WOV) key-phrase enrollment. A methodology implementing the techniques according to an embodiment includes generating a WOV key-phrase model based on identification of the sequence of sub-phonetic units of a user-provided key-phrase. The WOV key-phrase model is employed by a WOV processor for detection of the user spoken key-phrase and triggering operation of an automatic speech recognition (ASR) processor in response to the detection. The method further includes updating an ASR language model based on the user-provided key-phrase. The update includes one of embedding the WOV key-phrase model into the ASR language model, converting sub-phonetic units of the WOV key-phrase model and embedding the converted WOV key-phrase model into the ASR language model, or generating an ASR key-phrase model by applying a phoneme-syllable based statistical language model to the user-provided key-phrase and embedding the generated ASR key-phrase model into the ASR language model.

    STATISTICAL-ANALYSIS-BASED RESET OF RECURRENT NEURAL NETWORKS FOR AUTOMATIC SPEECH RECOGNITION

    Publication No.: US20190005945A1

    Publication date: 2019-01-03

    Application No.: US15637559

    Application date: 2017-06-29

    Abstract: Techniques are provided for calculating reset parameters for recurrent neural networks (RNN). A methodology implementing the techniques according to an embodiment includes generating a sequence of statistics. The calculation of each statistic is based on outputs of an RNN that is periodically re-initialized at a selected RNN reset time such that each of the calculated statistics is associated with a unique RNN reset time selected from a pre-determined range of reset times. The method further includes analyzing the sequence to identify a maximum interval during which the sequence remains relatively constant. The method further includes selecting a reset time parameter and reset context duration parameter, for re-initialization of the RNN during operation. The reset time parameter is based on the duration of the identified maximum interval and the reset context duration parameter is based on a time associated with the starting point of the identified maximum interval.

    Dynamic enrollment of user-defined wake-up key-phrase for speech enabled computer system

    Publication No.: US10672380B2

    Publication date: 2020-06-02

    Application No.: US15855379

    Application date: 2017-12-27

    Abstract: Techniques are provided for wake-on-voice (WOV) key-phrase enrollment. A methodology implementing the techniques according to an embodiment includes generating a WOV key-phrase model based on identification of the sequence of sub-phonetic units of a user-provided key-phrase. The WOV key-phrase model is employed by a WOV processor for detection of the user spoken key-phrase and triggering operation of an automatic speech recognition (ASR) processor in response to the detection. The method further includes updating an ASR language model based on the user-provided key-phrase. The update includes one of embedding the WOV key-phrase model into the ASR language model, converting sub-phonetic units of the WOV key-phrase model and embedding the converted WOV key-phrase model into the ASR language model, or generating an ASR key-phrase model by applying a phoneme-syllable based statistical language model to the user-provided key-phrase and embedding the generated ASR key-phrase model into the ASR language model.

    QUERY REJECTION FOR LANGUAGE UNDERSTANDING

    Publication No.: US20180349794A1

    Publication date: 2018-12-06

    Application No.: US15611104

    Application date: 2017-06-01

    Abstract: Techniques are provided for rejecting out-of-domain (OD) queries in a language understanding system. A methodology implementing the techniques according to an embodiment includes generating a plurality of in-domain (ID) utterances based on variations of provided ID sentences, and generating a plurality of OD utterances based on variations of provided OD sentences. The method may further include training an ID language model based on the generated ID utterances and training an OD language model based on the generated OD utterances. The ID language model is configured to generate an ID dataset based on calculated probabilities associated with the generated ID utterances. The OD language model is configured to generate an OD dataset based on calculated probabilities associated with the generated OD utterances. The method further includes training a classifier to detect OD queries from a plurality of received queries, the training based on the ID dataset and the OD dataset.
