OU Shifeng, SONG Peng, GAO Ying. Laplacian Speech Model and Soft Decision Based MMSE Estimator for Noise Power Spectral Density in Speech Enhancement[J]. Chinese Journal of Electronics, 2018, 27(6): 1214-1220. doi: 10.1049/cje.2018.09.009

Laplacian Speech Model and Soft Decision Based MMSE Estimator for Noise Power Spectral Density in Speech Enhancement

doi: 10.1049/cje.2018.09.009
Funds: This work is supported by the National Natural Science Foundation of China (No. 61703360, No. 61005021, No. 61201457) and the Natural Science Foundation of Shandong Province (No. ZR2017MF008, No. ZR2017MF019).
More Information
  • Corresponding author: GAO Ying was born in Liaoning Province, China, in 1978. She received the Ph.D. degree in geodetection and information technology from Jilin University, China, in 2008. She is currently an associate professor at Yantai University, China. Her research interests are in signal and information processing. (Email: claragaoying@126.com)
  • Received Date: 2016-01-13
  • Revision Received Date: 2016-06-09
  • Publish Date: 2018-11-10
  • Estimating the noise power spectral density (PSD) is crucial for speech enhancement because it strongly affects the quality and intelligibility of the enhanced speech. Most existing noise PSD estimators assume Gaussian speech priors, which have been shown to be inconsistent with real speech. We derive a solution to the problem of estimating the noise PSD in the minimum mean square error (MMSE) sense when the speech component is modeled by a Laplacian distribution. In addition, a soft decision technique, rather than hard voice activity detection (VAD), is incorporated into the algorithm, which automatically makes the estimate unbiased without requiring bias compensation. The performance of the proposed method is tested with several objective and subjective measures under various stationary and nonstationary noise environments. The results confirm that the method achieves good performance for all the tested noise conditions and signal-to-noise ratio (SNR) settings.
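The soft decision idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the Laplacian estimator derived in the paper, whose closed form is not given on this page. As a stand-in it assumes the common complex Gaussian model for both speech and noise, in which the posterior speech presence probability weights the noisy periodogram against the previous noise PSD estimate. The function name `soft_decision_noise_psd` and the parameters `xi_h1_db`, `p_h1`, and `alpha` are hypothetical and chosen only for illustration.

```python
import numpy as np


def soft_decision_noise_psd(periodogram, noise_psd_prev,
                            xi_h1_db=15.0, p_h1=0.5, alpha=0.8):
    """One frame of a soft-decision noise PSD update (Gaussian assumption).

    periodogram    : |Y(k)|^2 of the current noisy frame, one value per bin
    noise_psd_prev : noise PSD estimate from the previous frame
    xi_h1_db       : assumed fixed a priori SNR under speech presence (dB)
    p_h1           : assumed prior probability of speech presence
    alpha          : recursive smoothing factor in [0, 1)
    """
    periodogram = np.asarray(periodogram, dtype=float)
    noise_psd_prev = np.asarray(noise_psd_prev, dtype=float)

    xi = 10.0 ** (xi_h1_db / 10.0)
    # A posteriori SNR, guarded against division by zero.
    post_snr = periodogram / np.maximum(noise_psd_prev, 1e-12)

    # Posterior probability of speech presence from the likelihood ratio
    # of two zero-mean complex Gaussians (speech absent vs. speech present).
    prior_ratio = (1.0 - p_h1) / p_h1
    p_h1_post = 1.0 / (1.0 + prior_ratio * (1.0 + xi)
                       * np.exp(-post_snr * xi / (1.0 + xi)))

    # Soft decision: trust the periodogram when speech is unlikely,
    # fall back to the previous estimate when speech is likely.
    noise_periodogram = ((1.0 - p_h1_post) * periodogram
                         + p_h1_post * noise_psd_prev)

    # First-order recursive smoothing over time.
    return alpha * noise_psd_prev + (1.0 - alpha) * noise_periodogram


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_bins, n_frames, true_psd = 4, 200, 2.0
    estimate = np.ones(n_bins)
    for _ in range(n_frames):
        # Synthetic noise-only frames: complex Gaussian with PSD true_psd.
        y = np.sqrt(true_psd / 2.0) * (rng.standard_normal(n_bins)
                                       + 1j * rng.standard_normal(n_bins))
        estimate = soft_decision_noise_psd(np.abs(y) ** 2, estimate)
    print(estimate)
```

In contrast to a hard VAD, no frame is classified outright as speech or noise: each frequency bin contributes to the update in proportion to its estimated probability of being noise only. Replacing the Gaussian likelihoods with the Laplacian speech model would change the posterior computation but not this overall structure.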
