LI Jie, YOU Datao. Enhanced Speech Based Jointly Statistical Probability Distribution Function for Voice Activity Detection[J]. Chinese Journal of Electronics, 2017, 26(2): 325-330. doi: 10.1049/cje.2017.01.001
Enhanced Speech Based Jointly Statistical Probability Distribution Function for Voice Activity Detection

doi: 10.1049/cje.2017.01.001
Funds: This work is supported by the National Natural Science Foundation of China (No.F020809) and the Key Scientific Research Project in University of Henan Province Subsidy Scheme (No.16A520003).
  • Corresponding author: YOU Datao was born in 1981. He received the B.E. degree in network engineering from the PLA Information Engineering University in 2005, the M.S. degree in computer software and theory from Zhengzhou University in 2008, and the Ph.D. degree in computer science from Harbin Institute of Technology in 2013. He is now a lecturer at Henan University. His research interests mainly include speech signal processing and machine learning.
  • Received Date: 2014-08-08
  • Rev Recd Date: 2015-03-18
  • Publish Date: 2017-03-10
  • Most Voice activity detection (VAD) methods are based on statistical models. These methods typically assume that the noise signal follows a Gaussian distribution. This assumption does not always hold in practice, so such methods fail to distinguish speech from noise at low Signal-to-noise ratio (SNR) levels under non-stationary noise conditions. To further improve the robustness of VAD, an enhanced-speech-based method is proposed. In the proposed method, the Laplacian distribution is used to model the residual noise, since we find that the residual noise in enhanced speech follows a Laplacian distribution; in addition, a Gaussian mixture model is used to characterize the Discrete Fourier transform (DFT) coefficients of the reconstructed speech in the enhanced speech. Experimental results show that the proposed method performs better than the baseline method, especially under low-SNR and non-stationary noise conditions.
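
The statistical idea in the abstract can be illustrated with a minimal sketch of a frame-level log-likelihood ratio test. This is not the authors' algorithm: it assumes real-valued DFT coefficients, a zero-mean Laplacian for the residual-noise hypothesis, and a one-dimensional Gaussian mixture for the speech hypothesis. The function names, the decision threshold, and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def laplace_logpdf(x, scale):
    """Log-density of a zero-mean Laplacian with scale b: -log(2b) - |x|/b."""
    return -math.log(2.0 * scale) - abs(x) / scale


def gmm_logpdf(x, weights, means, variances):
    """Log-density of a 1-D Gaussian mixture, computed with log-sum-exp."""
    terms = [
        math.log(w) - 0.5 * math.log(2.0 * math.pi * v) - (x - m) ** 2 / (2.0 * v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def frame_llr(coeffs, noise_scale, weights, means, variances):
    """Mean log-likelihood ratio (speech GMM vs. Laplacian noise) over one frame."""
    if not coeffs:
        raise ValueError("frame must contain at least one coefficient")
    total = sum(
        gmm_logpdf(c, weights, means, variances) - laplace_logpdf(c, noise_scale)
        for c in coeffs
    )
    return total / len(coeffs)


def is_speech(coeffs, noise_scale, weights, means, variances, threshold=0.0):
    """Declare speech when the mean log-likelihood ratio exceeds the threshold."""
    return frame_llr(coeffs, noise_scale, weights, means, variances) > threshold


# Hypothetical model parameters, not taken from the paper.
NOISE_SCALE = 0.1
GMM_WEIGHTS = [0.5, 0.5]
GMM_MEANS = [0.0, 0.0]
GMM_VARIANCES = [1.0, 4.0]

loud_frame = [2.0, -1.5, 3.0]     # large coefficients: poorly explained by the noise model
quiet_frame = [0.01, -0.02, 0.0]  # small coefficients: well explained by the noise model
```

With these illustrative parameters, a frame with large coefficients is classified as speech because the narrow Laplacian assigns it very low likelihood, while a frame with near-zero coefficients is classified as noise. In practice the model parameters would be estimated from data rather than fixed by hand.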
