Adaptive multi-microphone beamforming

    Publication number: US10366701B1

    Publication date: 2019-07-30

    Application number: US15681395

    Application date: 2017-08-20

    Applicant: Huan-Yu Su

    Inventor: Huan-Yu Su

    Abstract: Provided is a method and computer program product for producing an enhanced audio signal for an output device from audio signals received by two or more microphones in close proximity to each other. For example, one embodiment of the present invention comprises the steps of receiving a first input audio signal from the first microphone, digitizing the first input audio signal to produce a first digitized audio input signal, receiving a second input audio signal from the second microphone, digitizing the second input audio signal to produce a second digitized audio input signal, using the first digitized audio input signal as a reference signal to an adaptive prediction filter, using the second digitized audio input signal as input to said adaptive prediction filter, and finally adding a prediction result signal from the adaptive prediction filter to the first digitized audio input signal to produce the enhanced audio signal. In other embodiments, any number of microphones can be used, and in all embodiments there is no requirement to detect or locate the source or direction of arrival of the input audio signals.
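
The two-microphone embodiment can be sketched as follows. This is a minimal illustration, not the patented implementation: the abstract does not say how the prediction filter adapts, so a normalized LMS (NLMS) update is assumed here, and the function name `enhance` and the parameters `taps`, `mu`, and `eps` are hypothetical.

```python
def enhance(ref, inp, taps=8, mu=0.5, eps=1e-8):
    """Predict the first microphone signal (ref) from the second (inp)
    and add the prediction to ref.

    Assumption: the filter adapts with NLMS; the abstract only says
    "adaptive prediction filter".
    """
    w = [0.0] * taps      # filter coefficients
    buf = [0.0] * taps    # most recent input samples, newest first
    out = []
    for d, x in zip(ref, inp):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))   # prediction of d
        e = d - y                                    # prediction error
        norm = sum(b * b for b in buf) + eps
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        out.append(d + y)                            # enhanced sample
    return out
```

Because the filter learns whatever delay and gain relate the two microphone signals, the sketch never estimates a direction of arrival, which matches the last sentence of the abstract.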

    Detecting and reporting a loss of connection by a telephone
    2.
    Granted patent (in force)

    Publication number: US07796623B2

    Publication date: 2010-09-14

    Application number: US12384019

    Application date: 2009-03-30

    Abstract: There is provided a method of detecting and reporting poor voice quality for use by a gateway device. The method comprises facilitating a connection between a telephone and a remote telephone via a network, and detecting a poor voice quality indicator during the connection. The method further comprises capturing, for a pre-determined period of time, telephone voice data being exchanged between the gateway and the telephone, network voice data being exchanged between the gateway and the network, and gateway parameters. The method also comprises packetizing the telephone voice data, the network voice data and the gateway parameters into a plurality of packets having a network address of a network storage, and transmitting the plurality of packets destined for the network storage via the network. In one aspect, the poor voice quality indicator may be generated by a user of the telephone in response to a poor voice quality of the connection.

    Pitch determination for speech processing
    3.
    Patent application (under examination, published)

    Publication number: US20080147384A1

    Publication date: 2008-06-19

    Application number: US12069973

    Application date: 2008-02-14

    Inventor: Huan-Yu Su, Yang Gao

    Abstract: There is provided a method of selecting a pitch lag value for a portion of a speech signal, the method comprising: computing a weighted correlation function of the portion of the speech signal for a range of delay times, wherein the weighting of the correlation function depends on both the delay time and a characteristic of one or more previous portions of the speech signal; and selecting the pitch lag value based on a delay time from the range of delay times that maximizes the weighted correlation function.
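
A minimal sketch of such a weighted search is shown below. The abstract does not give the weighting function, so the one used here is an assumption: a linear penalty that grows with the delay, multiplied by a boost when the delay is close to the previous frame's pitch lag. The name `select_pitch_lag` and the parameters `slope`, `boost`, and `radius` are hypothetical.

```python
import math

def select_pitch_lag(x, min_lag, max_lag, prev_lag=None,
                     slope=0.1, boost=1.2, radius=3):
    """Return the delay in [min_lag, max_lag] that maximizes a weighted,
    normalized correlation of the frame x (assumes min_lag < max_lag)."""
    best_lag, best_score = min_lag, -math.inf
    for k in range(min_lag, max_lag + 1):
        idx = range(k, len(x))
        num = sum(x[n] * x[n - k] for n in idx)
        den = math.sqrt(sum(x[n] ** 2 for n in idx) *
                        sum(x[n - k] ** 2 for n in idx)) or 1.0
        # Weight depends on the delay itself (assumed linear penalty) ...
        w = 1.0 - slope * (k - min_lag) / (max_lag - min_lag)
        # ... and on a characteristic of previous portions: here, the
        # previous pitch lag (assumed boost near it).
        if prev_lag is not None and abs(k - prev_lag) <= radius:
            w *= boost
        score = w * num / den
        if score > best_score:
            best_lag, best_score = k, score
    return best_lag
```

With no history, the delay penalty favors the shortest delay among equally good multiples of the period; with a previous lag supplied, the search tends to stay near it.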

    Pitch determination based on weighting of pitch lag candidates
    4.
    Granted patent (in force)

    Publication number: US07266493B2

    Publication date: 2007-09-04

    Application number: US11251179

    Application date: 2005-10-13

    Inventor: Huan-Yu Su, Yang Gao

    Abstract: There is provided a method of selecting a pitch lag value from a plurality of pitch lag candidates for coding a speech signal. The method comprises identifying the plurality of pitch lag candidates from a frame of the speech signal using correlation; classifying the speech signal to obtain a voice classification; determining whether one or more of the plurality of pitch lag candidates are in a temporal neighborhood of one or more previous pitch lag values; favoring the one or more of the plurality of pitch lag candidates determined to be in the temporal neighborhood of the one or more previous pitch lag values, by adaptive weighting, over other ones of the plurality of pitch lag candidates; and selecting the pitch lag value based on the voice classification and the one or more of the plurality of pitch lag candidates favored by the adaptive weighting.
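
The selection step can be sketched as follows, taking the correlation-based candidates and the voice classification as given inputs. The specific weighting is an assumption: candidates near a previous pitch lag get a multiplicative bonus, and the bonus is larger for voiced frames. The name `choose_lag` and the parameters `radius` and `bonus` are hypothetical.

```python
def choose_lag(candidates, prev_lags, voiced, radius=5, bonus=0.15):
    """Pick a pitch lag from (lag, normalized_correlation) candidates.

    Candidates within `radius` samples of any previous lag are favored.
    Assumption: the favoring weight adapts to the voice classification,
    with a full bonus for voiced frames and a third of it otherwise.
    """
    weight = 1.0 + (bonus if voiced else bonus / 3)
    best_lag, best_score = None, float("-inf")
    for lag, corr in candidates:
        near = any(abs(lag - p) <= radius for p in prev_lags)
        score = corr * weight if near else corr
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

For example, with candidates `[(40, 0.80), (80, 0.85)]` and previous lags near 40, a voiced frame keeps lag 40 even though lag 80 correlates slightly better, while an unvoiced frame switches to 80.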

    Complexity resource manager for multi-channel speech processing
    5.
    Granted patent (in force)

    Publication number: US07080010B2

    Publication date: 2006-07-18

    Application number: US10911118

    Application date: 2004-08-03

    CPC classification number: G10L15/285

    Abstract: A multi-channel speech processor for encoding speech in a packet network environment is disclosed. In one illustrative aspect, a complexity resource manager (CRM) is executed by a controller or processor. The CRM manages the level of encoding complexity used by a signal processing unit (SPU) to convert the speech signal into packet data. In general, the CRM determines the level of encoding complexity based on a calculated complexity budget, where the complexity budget is determined based on the time required to process prior speech signal channels and the time available to process the remaining channels. In this way, the CRM is able to control the overall complexity of the speech processor by signaling the SPU, under certain conditions, to encode the speech signal in a reduced-complexity mode based on the calculated complexity budget.
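
The budget calculation can be illustrated with a short sketch. The abstract does not specify the formula, so an even split of the remaining frame time across the remaining channels is assumed, and the per-channel costs are treated as known in advance for simplicity. The names `complexity_budget` and `plan_frame` are hypothetical.

```python
def complexity_budget(frame_ms, elapsed_ms, channels_left):
    """Time available per remaining channel (assumed even split)."""
    return (frame_ms - elapsed_ms) / channels_left

def plan_frame(full_cost_ms, reduced_cost_ms, frame_ms):
    """Choose an encoding mode per channel within one frame period.

    full_cost_ms / reduced_cost_ms: per-channel processing times for the
    two modes (assumed known here; a real CRM would measure them).
    """
    elapsed = 0.0
    modes = []
    n = len(full_cost_ms)
    for ch in range(n):
        budget = complexity_budget(frame_ms, elapsed, n - ch)
        if full_cost_ms[ch] <= budget:
            modes.append("full")
            elapsed += full_cost_ms[ch]
        else:
            modes.append("reduced")   # signal the SPU to reduce complexity
            elapsed += reduced_cost_ms[ch]
    return modes, elapsed
```

With four channels that each cost 3 ms in full mode and 1 ms in reduced mode, and a 10 ms frame, only the first channel is reduced; the time it frees lets the other three run in full mode.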

    Using signal to noise ratio of a speech signal to adjust thresholds for extracting speech parameters for coding the speech signal
    6.
    Granted patent (in force)

    Publication number: US06898566B1

    Publication date: 2005-05-24

    Application number: US09640841

    Application date: 2000-08-16

    CPC classification number: G10L19/22, G10L19/09, G10L2025/783

    Abstract: There are provided speech coding methods and systems for estimating a plurality of speech parameters of a speech signal for coding the speech signal using one of a plurality of speech coding algorithms. The plurality of speech parameters includes pitch information and is calculated using a plurality of thresholds. An example method includes estimating a background noise level in the speech signal to determine a signal to noise ratio (SNR) for the speech signal, adjusting one or more of the plurality of thresholds based on the SNR to generate one or more SNR adjusted thresholds, analyzing the speech signal to extract the pitch information using the one or more SNR adjusted thresholds, and repeating the estimating, the adjusting and the analyzing to code the speech signal using one of the plurality of speech coding algorithms.
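
The estimate-then-adjust loop can be sketched as below. None of the rules here come from the patent: the noise tracker (follow the minimum frame energy with a slow upward drift) and the linear mapping from SNR to threshold are assumptions, and the names `update_noise`, `snr_db`, and `adjust_threshold` are hypothetical.

```python
import math

def update_noise(noise_energy, frame_energy, rise=1.05):
    """Track background noise as a slowly rising minimum of frame energy."""
    return min(frame_energy, noise_energy * rise)

def snr_db(signal_energy, noise_energy):
    """Signal to noise ratio in decibels."""
    return 10.0 * math.log10(signal_energy / noise_energy)

def adjust_threshold(base, snr, lo_db=10.0, hi_db=30.0, max_raise=0.2):
    """Raise a decision threshold as the SNR drops.

    Assumption: no change at or above hi_db, the full max_raise at or
    below lo_db, and linear interpolation in between.
    """
    frac = (hi_db - snr) / (hi_db - lo_db)
    frac = max(0.0, min(1.0, frac))
    return base + max_raise * frac
```

A pitch analyzer would then compare, for example, its normalized correlation against `adjust_threshold(0.5, snr)` instead of a fixed 0.5, and repeat the three steps for every frame.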

    Flexible variable rate vocoder for wireless communication systems
    7.
    Granted patent (in force)

    Publication number: US06856954B1

    Publication date: 2005-02-15

    Application number: US09627375

    Application date: 2000-07-28

    Applicant: Huan-Yu Su

    Inventor: Huan-Yu Su

    CPC classification number: H04L1/0014

    Abstract: A flexible variable rate vocoder and related method of operation. The vocoder selects a target average data rate responsive to at least one network parameter and at least one external parameter.

    Intelligent discontinuous transmission and comfort noise generation scheme for pulse code modulation speech coders
    8.
    Granted patent (in force)

    Publication number: US06510409B1

    Publication date: 2003-01-21

    Application number: US09484731

    Application date: 2000-01-18

    Applicant: Huan-Yu Su

    Inventor: Huan-Yu Su

    CPC classification number: G10L19/012, G10L25/78

    Abstract: A fully backward compatible intelligent discontinuous transmission (DTX) and comfort noise generation (CNG) scheme that is operable in pulse code modulation (PCM) speech coding systems. The scheme, for example, provides a speech encoder comprising speech signal analysis circuitry configured to calculate a predetermined plurality of parameters from the speech signal, a voice activity detector configured to determine voice activity in the speech signal, where the speech encoder enters a discontinuous transmission mode if the voice activity detector does not detect voice activity, and a transmitter configured to transmit one or more speech samples of the speech signal after the speech encoder enters the discontinuous transmission mode, where the one or more speech samples are capable of use by a remote speech decoder to extract a parameter from the one or more speech samples in order to generate background noise based on the parameter.
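
The encoder and decoder roles can be sketched as follows. This is an illustration under stated assumptions rather than the patented scheme: the voice activity detector is a plain energy threshold, the parameter the decoder extracts is the RMS level of the received samples, and the comfort noise is Gaussian. The names `dtx_encode`, `cng_decode`, and the packet labels are hypothetical.

```python
import math
import random

def dtx_encode(frames, vad_threshold=1e-3, sample_frames=1):
    """Send speech frames normally; after speech stops, send a few frames
    of ordinary PCM samples, then nothing (assumed energy-based VAD)."""
    packets, silent_run = [], 0
    for f in frames:
        energy = sum(s * s for s in f) / len(f)
        if energy >= vad_threshold:
            silent_run = 0
            packets.append(("speech", f))
        else:
            silent_run += 1
            kind = "noise_samples" if silent_run <= sample_frames else "none"
            packets.append((kind, f if kind == "noise_samples" else None))
    return packets

def cng_decode(packets, frame_len, seed=0):
    """Play received samples as-is; during gaps, generate noise whose level
    is extracted from the last received background samples."""
    rng = random.Random(seed)
    level, out = 0.0, []
    for kind, f in packets:
        if kind == "none":
            out.append([rng.gauss(0.0, level) for _ in range(frame_len)])
            continue
        if kind == "noise_samples":
            level = math.sqrt(sum(s * s for s in f) / len(f))  # RMS level
        out.append(list(f))
    return out
```

Because the background information travels as plain PCM samples, a decoder that knows nothing about the scheme can still play every packet it receives, which is one way to read the abstract's backward compatibility claim.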

    Adaptive tilt compensation for synthesized speech residual
    9.
    Granted patent (in force)

    Publication number: US06385573B1

    Publication date: 2002-05-07

    Application number: US09156826

    Application date: 1998-09-18

    Inventor: Yang Gao, Huan-Yu Su

    Abstract: A multi-rate speech codec supports a plurality of encoding bit rate modes by adaptively selecting encoding bit rate modes to match communication channel restrictions. In higher bit rate encoding modes, an accurate representation of speech through CELP (code excited linear prediction) and other associated modeling parameters is generated for higher quality decoding and reproduction. To achieve high quality in lower bit rate encoding modes, the speech encoder departs from the strict waveform matching criteria of regular CELP coders and strives to identify significant perceptual features of the input signal. To support lower bit rate encoding modes, a variety of techniques are applied, many of which involve the classification of the input signal. For each bit rate mode selected, pluralities of fixed or innovation subcodebooks are selected for use in generating innovation vectors. At lower encoding bit rates, a decoder utilizes adaptive compensation to attempt to correct for spectral variations in the weighted synthesized residual. Although many approaches are possible, a long asymmetric window is applied to the synthesized residual to generate a reflection coefficient that is smoothed, scaled and used in a first order filter. Because the content of the window varies over time, the coefficient, and therefore the filter, varies (or adapts) to remove at least a portion of the spectral tilt. As a result, the synthesized speech signal sounds brighter without having introduced significant coding noise.
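
The tilt compensation chain (window, reflection coefficient, smoothing, scaling, first order filter) can be sketched as below. The window shape, the smoothing and scaling constants, and the filter form y[n] = x[n] - mu * x[n-1] are assumptions, since the abstract names the steps but not their details. All function and parameter names are hypothetical.

```python
import math

def asymmetric_window(n, rise_frac=0.8):
    """Long raised-cosine rise followed by a short fall (assumed shape).
    Assumes n is large enough that both parts are non-empty."""
    rise = int(n * rise_frac)
    fall = n - rise
    up = [0.5 - 0.5 * math.cos(math.pi * (i + 0.5) / rise) for i in range(rise)]
    down = [0.5 + 0.5 * math.cos(math.pi * (i + 0.5) / fall) for i in range(fall)]
    return up + down

def reflection_coeff(x):
    """First reflection coefficient, computed as r(1) / r(0)."""
    r0 = sum(v * v for v in x)
    r1 = sum(x[i] * x[i - 1] for i in range(1, len(x)))
    return r1 / r0 if r0 > 0 else 0.0

def tilt_compensate(residual, prev_k=0.0, smooth=0.75, scale=0.5, state=0.0):
    """Return (filtered residual, smoothed coefficient for the next call)."""
    w = asymmetric_window(len(residual))
    k = reflection_coeff([a * b for a, b in zip(residual, w)])
    k = smooth * prev_k + (1.0 - smooth) * k     # smooth across frames
    mu = scale * k                               # scale before filtering
    out, prev = [], state
    for v in residual:
        out.append(v - mu * prev)                # first order filter
        prev = v
    return out, k
```

A residual dominated by low frequencies yields a positive coefficient, so the filter attenuates the low end; a residual dominated by high frequencies yields a negative coefficient and the opposite correction.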

    Comb codebook structure
    10.
    Granted patent (in force)

    Publication number: US06330531B1

    Publication date: 2001-12-11

    Application number: US09156649

    Application date: 1998-09-18

    Applicant: Huan-Yu Su

    Inventor: Huan-Yu Su

    Abstract: A speech encoding comb codebook structure for providing good quality reproduced low bit-rate speech signals in a speech encoding system. The codebook structure requires minimal training, if any, and allows for reduced complexity and memory requirements. The codebook includes a first sub-codebook and at least one additional sub-codebook, each having a plurality of code-vectors. The codebook may be randomly populated. All even elements may be set to zero in a first codebook, and all odd elements may be set to zero in a second codebook. The resulting comb codebook includes code-vector combinations of the code-vectors from the sub-codebooks. In certain embodiments, the code-vectors of the sub-codebooks may contain zero valued elements. In other embodiments where the code-vectors of the sub-codebooks contain only non-zero elements, zero valued elements may be inserted in between the non-zero elements of the sub-codebooks during the forming of the resultant comb codebook. In such an embodiment, the memory requirements would be further reduced in that the zero valued elements need not be stored.
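
The memory-saving embodiment, in which only the non-zero elements are stored and the zeros are inserted when the comb code-vector is formed, can be sketched as follows. Element positions are counted from zero here, which is an assumption, and the names `expand` and `comb_vector` are hypothetical.

```python
def expand(compact, offset, length):
    """Place stored non-zero elements at positions offset, offset + 2, ...
    and leave zeros in between (length must fit the stored elements)."""
    v = [0.0] * length
    v[offset::2] = compact
    return v

def comb_vector(cb1, cb2, i, j, length):
    """Combine code-vector i of the first sub-codebook (even positions
    zero) with code-vector j of the second (odd positions zero)."""
    a = expand(cb1[i], 1, length)
    b = expand(cb2[j], 0, length)
    return [p + q for p, q in zip(a, b)]
```

For instance, with stored vectors `[1, 2, 3, 4]` and `[5, 6, 7, 8]` and a length of 8, the combined code-vector interleaves them as 5, 1, 6, 2, 7, 3, 8, 4. Two sub-codebooks of N stored vectors each give N × N combinations while storing only half of each vector's elements.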
