[Repost] [Information Technology] [2018.02] 稳

This is a PhD thesis from the University of Sheffield, UK (author: Erfan Loweimi), 304 pages in total.

Fourier analysis plays a key role in speech signal processing. Being complex-valued, the Fourier transform can be expressed in polar form using the magnitude and phase spectra. The magnitude spectrum is widely used in almost every corner of speech processing. However, the phase spectrum is not an obviously appealing starting point for processing the speech signal. In contrast to the magnitude spectrum, whose fine and coarse structures have a clear relation to speech perception, the phase spectrum is difficult to interpret and manipulate. In fact, it shows no meaningful trend or extrema that might facilitate the modelling process. Nonetheless, the speech phase spectrum has recently gained renewed attention. An expanding body of work is showing that it can be usefully employed in a multitude of speech processing applications. Now that the potential of phase-based speech processing has been established, there is a need for a fundamental model to help understand the way in which phase encodes speech information.

In this thesis, a novel phase-domain source-filter model is proposed that allows for deconvolution of the speech vocal tract (filter) and excitation (source) components through phase processing. This model utilises the Hilbert transform, shows how the excitation and vocal tract elements mix in the phase domain and provides a framework for efficiently segregating the source and filter components through phase manipulation. To investigate the efficacy of the suggested approach, a set of features is extracted from the filter part of the phase for automatic speech recognition (ASR), and the source part of the phase is utilised for fundamental frequency estimation. Accuracy and robustness in both cases are illustrated and discussed.
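
A minimal sketch of how a phase-domain source-filter separation of this kind could look. It assumes we work on the minimum-phase component of a frame and split it with a simple cepstral lifter, which is one common way to separate slowly and rapidly varying spectral structure. It is not the thesis's exact procedure. For a minimum-phase signal the phase is tied to the log-magnitude through the Hilbert transform; the code realises that relation by folding the real cepstrum onto non-negative quefrencies. The function names, `n_fft`, and the `cutoff` of 30 bins are illustrative assumptions.

```python
import numpy as np

def minimum_phase_cepstrum(frame, n_fft=1024, eps=1e-10):
    """Causal cepstrum of the minimum-phase component of a frame.

    Folding the real cepstrum onto non-negative quefrencies is the
    cepstral form of the Hilbert-transform link between log-magnitude
    and phase for minimum-phase signals.
    """
    log_mag = np.log(np.abs(np.fft.fft(frame, n_fft)) + eps)
    cep = np.fft.ifft(log_mag).real           # real cepstrum (even sequence)
    fold = np.zeros(n_fft)
    fold[0] = 1.0
    fold[1:n_fft // 2] = 2.0
    fold[n_fft // 2] = 1.0
    return cep * fold

def split_phase(frame, cutoff=30, n_fft=1024):
    """Split the minimum-phase phase spectrum into a smooth part
    (assumed vocal tract / filter) and a fast-varying part
    (assumed excitation / source). `cutoff` is a hypothetical choice."""
    cep_min = minimum_phase_cepstrum(frame, n_fft)
    lifter = np.zeros(n_fft)
    lifter[:cutoff] = 1.0                     # keep low quefrencies only
    phase_filter = np.fft.fft(cep_min * lifter).imag
    phase_source = np.fft.fft(cep_min * (1.0 - lifter)).imag
    return phase_filter, phase_source
```

Because the Fourier transform is linear, the two returned phase spectra sum to the full minimum-phase phase spectrum, which mirrors the additive mixing of source and filter in the phase domain described above.
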
In addition, the proposed approach is improved by replacing the log with the generalised logarithmic function in the Hilbert transform and also by computing the group delay via a regression filter.

Furthermore, the statistical distribution of the phase spectrum and its representations along the feature extraction pipeline are studied. It is illustrated that the phase spectrum has a bell-shaped distribution. Some statistical normalisation methods such as mean-variance normalisation, Laplacianisation, Gaussianisation and histogram equalisation are successfully applied to the phase-based features and lead to a significant robustness improvement.
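
To make two of the normalisation methods named above concrete, here is a small sketch of mean-variance normalisation and a rank-based Gaussianisation applied per feature dimension. The function names and the `(frames, dims)` array layout are assumptions for illustration; the thesis may compute the statistics differently (for example per utterance or per speaker).

```python
import numpy as np
from statistics import NormalDist

def mean_variance_normalise(features, eps=1e-8):
    """Shift and scale each dimension of a (frames, dims) array
    to zero mean and unit variance."""
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    return (features - mu) / (sigma + eps)

def gaussianise(features):
    """Rank-based Gaussianisation: map the empirical CDF of each
    dimension onto the quantiles of a standard normal."""
    n_frames = features.shape[0]
    # Double argsort gives each value's rank (0..n_frames-1) per column.
    ranks = features.argsort(axis=0).argsort(axis=0)
    cdf = (ranks + 0.5) / n_frames            # strictly inside (0, 1)
    inv_cdf = np.vectorize(NormalDist().inv_cdf)
    return inv_cdf(cdf)
```

Mean-variance normalisation only matches the first two moments, whereas Gaussianisation reshapes the whole distribution while preserving the ordering of the values in each dimension.
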

The robustness gain achieved through using statistical normalisation and the generalised logarithmic function encouraged the use of more advanced model-based statistical techniques such as vector Taylor series (VTS). VTS in its original formulation assumes that the log function is used for compression. In order to simultaneously take advantage of VTS and the generalised logarithmic function, a new formulation is first developed to merge both into a unified framework called generalised VTS (gVTS). Also, in order to leverage the gVTS framework, a novel channel noise estimation method is developed. The extensions of the gVTS framework and the proposed channel estimation to the group delay domain are then explored. The problems this presents are analysed and discussed, some solutions are proposed and finally the corresponding formulae are derived. Moreover, the effects of additive noise and channel distortion in the phase and group delay domains are scrutinised and the results are utilised in deriving the gVTS equations. Experimental results on the Aurora-4 ASR task in an HMM/GMM setup along with a DNN-based bottleneck system in the clean and multi-style training modes confirmed the efficacy of the proposed approach in dealing with both additive and channel noise.
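
The sketch below illustrates why the choice of compression function matters for VTS. It assumes the generalised logarithm takes the Box-Cox form `(x**alpha - 1) / alpha`, which tends to `log(x)` as `alpha` approaches 0, and it assumes the usual linear-domain environment model `Y = X * H + N` (clean speech times channel plus additive noise). Both are assumptions for illustration; the thesis's own gVTS derivation is not reproduced here, and all function names are hypothetical.

```python
import numpy as np

def gen_log(x, alpha):
    """Generalised logarithm (assumed Box-Cox form); alpha = 0 gives log."""
    x = np.asarray(x, dtype=float)
    if alpha == 0.0:
        return np.log(x)
    return (x ** alpha - 1.0) / alpha

def gen_exp(y, alpha):
    """Inverse of gen_log for the same alpha."""
    y = np.asarray(y, dtype=float)
    if alpha == 0.0:
        return np.exp(y)
    return (alpha * y + 1.0) ** (1.0 / alpha)

def noisy_feature(x, h, n, alpha):
    """Compressed noisy feature for compressed clean speech x,
    channel h and additive noise n, under Y = X * H + N."""
    X, H, N = gen_exp(x, alpha), gen_exp(h, alpha), gen_exp(n, alpha)
    return gen_log(X * H + N, alpha)
```

With `alpha = 0` this reduces to the familiar log-domain mismatch function `y = x + h + log(1 + exp(n - x - h))`, which is the nonlinearity that standard VTS linearises. For nonzero `alpha` the relation takes a different form, which is why a reformulated expansion is needed.
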

 

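1. Introduction
2. Background and Related Work
3. Phase Information
4. Source-Filter Separation in the Phase Domain
5. Generalised VTS in the Phase/Group Delay Domain for Robust ASR
6. Conclusion and Future Work
Appendix: Hilbert Transform
Appendix: Generalised Vector Taylor Series (gVTS) Approach for Robust ASR
Appendix: Channel Noise Estimation Based on the Generalised Vector Taylor Series
Appendix: Deep Neural Networks for ASR
Appendix: Description of the Databases Used
Appendix: Review of Feature Extraction Techniques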

