Enhanced Speech Based Jointly Statistical Probability Distribution Function for Voice Activity Detection

LI Jie; YOU Datao

doi:10.1049/cje.2017.01.001

Volume 26 Issue 2



LI Jie and YOU Datao, “Enhanced Speech Based Jointly Statistical Probability Distribution Function for Voice Activity Detection,” Chinese Journal of Electronics, vol. 26, no. 2, pp. 325-330, 2017, doi: 10.1049/cje.2017.01.001


College of Software Engineering, Henan University, Kaifeng 475001, China

Funds: This work is supported by the National Natural Science Foundation of China (No. F020809) and the Key Scientific Research Project in University of Henan Province Subsidy Scheme (No. 16A520003).


Corresponding author: YOU Datao was born in 1981. He received the B.E. degree in network engineering from the PLA Information Engineering University in 2005, the M.S. degree in computer software and theory from Zhengzhou University in 2008, and the Ph.D. degree in computer science from Harbin Institute of Technology in 2013. He is now a lecturer at Henan University. His research interests mainly include speech signal processing and machine learning. (Email: youdatao@163.com)
Received Date: 2014-08-08
Rev Recd Date: 2015-03-18
Publish Date: 2017-03-10

Abstract

Most voice activity detection (VAD) methods are based on statistical models. These methods typically assume that the noise signal follows a Gaussian distribution. This assumption does not always hold in practice, so such methods fail to distinguish speech from noise at low signal-to-noise ratio (SNR) levels in non-stationary noise conditions. To further improve the robustness of VAD, an enhanced-speech-based method is proposed. In the proposed method, the Laplacian distribution is used to model the residual noise, since we find that the residual noise in enhanced speech follows a Laplacian distribution; in addition, a Gaussian mixture model is used to characterize the discrete Fourier transform (DFT) coefficients of the reconstructed speech in the enhanced speech. Experimental results show that the proposed method performs better than the baseline method, especially in low-SNR and non-stationary noise conditions.
Keywords:
- Statistical probability distribution
- Voice activity detection (VAD)
- Reconstructed speech
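
As a rough illustration of the idea in the abstract, the sketch below scores one frame of coefficients with a log-likelihood ratio: a zero-mean Laplacian density for the residual-noise hypothesis against a zero-mean Gaussian mixture for the speech hypothesis. It is not the authors' implementation. The function names, the use of real-valued coefficients instead of complex DFT coefficients, the zero-mean assumption, the decision threshold, and every parameter value are assumptions made only for illustration.

```python
import numpy as np


def laplacian_logpdf(x, scale):
    """Log-density of a zero-mean Laplacian: -log(2b) - |x| / b."""
    return -np.log(2.0 * scale) - np.abs(x) / scale


def gmm_logpdf(x, weights, sigmas):
    """Log-density of a zero-mean 1-D Gaussian mixture, per element of x."""
    x = np.asarray(x, dtype=float)[..., None]           # shape (..., 1)
    weights = np.asarray(weights, dtype=float)          # shape (K,)
    sigmas = np.asarray(sigmas, dtype=float)            # shape (K,)
    comp = (np.log(weights)
            - 0.5 * np.log(2.0 * np.pi * sigmas ** 2)
            - 0.5 * (x / sigmas) ** 2)                  # shape (..., K)
    # Log-sum-exp over the mixture components for numerical stability.
    m = comp.max(axis=-1, keepdims=True)
    return (m + np.log(np.exp(comp - m).sum(axis=-1, keepdims=True)))[..., 0]


def frame_llr(coeffs, noise_scale, weights, sigmas):
    """Mean of log p(x | speech) - log p(x | noise) over one frame."""
    coeffs = np.asarray(coeffs, dtype=float)
    llr = gmm_logpdf(coeffs, weights, sigmas) - laplacian_logpdf(coeffs, noise_scale)
    return float(np.mean(llr))


def is_speech(coeffs, noise_scale, weights, sigmas, threshold=0.0):
    """Declare speech when the frame's mean log-likelihood ratio exceeds the threshold."""
    return frame_llr(coeffs, noise_scale, weights, sigmas) > threshold


# Hypothetical model parameters, chosen only for this demonstration.
NOISE_SCALE = 0.1
GMM_WEIGHTS = [0.6, 0.4]
GMM_SIGMAS = [0.5, 2.0]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise_frame = rng.laplace(0.0, NOISE_SCALE, size=256)   # synthetic residual noise
    speech_frame = rng.normal(0.0, 1.0, size=256)           # synthetic stand-in for speech
    for name, frame in [("noise", noise_frame), ("speech", speech_frame)]:
        print(name, is_speech(frame, NOISE_SCALE, GMM_WEIGHTS, GMM_SIGMAS))
```

In practice the Laplacian scale and the mixture parameters would be estimated from data (for example with the EM algorithm for the mixture), and the threshold would be tuned on held-out recordings; the fixed values here only make the sketch runnable.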


