Speech Magnitude Spectrum Reconstruction from MFCCs Using Deep Neural Network

JIANG Wenbin; LIU Peilin; WEN Fei

doi:10.1049/cje.2017.09.018

Volume 27 Issue 2


JIANG Wenbin, LIU Peilin, WEN Fei, “Speech Magnitude Spectrum Reconstruction from MFCCs Using Deep Neural Network,” Chinese Journal of Electronics, vol. 27, no. 2, pp. 393-398, 2018, doi: 10.1049/cje.2017.09.018


1. Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China;
2. Air Control and Navigation Institution, Air Force Engineering University, Xi'an 710000, China

Funds: This work is supported by the National Natural Science Foundation of China (No.61401501).

Received Date: 2016-04-28
Rev Recd Date: 2017-01-06
Publish Date: 2018-03-10

Abstract

This work proposes a Deep neural network (DNN) based method for reconstructing the speech magnitude spectrum from Mel-frequency cepstral coefficients (MFCCs). We train a DNN with MFCC vectors as input and the corresponding speech magnitude spectra as the desired output. Owing to the strong modeling capability of DNNs, the proposed method can accurately estimate the speech magnitude spectrum even from truncated MFCC vectors. Experiments on the TIMIT corpus show that the proposed method performs significantly better than traditional methods.
Keywords:
- Deep neural network (DNN)
- Mel-frequency cepstral coefficients (MFCCs)
- Spectrum reconstruction
- Speech reconstruction
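
The mapping described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation, and every setting in it is an assumption: 16 kHz audio, a 256-point FFT, 23 triangular mel filters applied to the magnitude spectrum, an orthonormal DCT-II truncated to 13 cepstra, a network with two 128-unit ReLU hidden layers trained by full-batch gradient descent (back-propagation), and a log-magnitude training target. The helper `synthetic_frames` is hypothetical and produces toy spectra standing in for TIMIT frames. `baseline_inverse` is one simple non-learned inversion (zero-pad the truncated cepstra, inverse DCT, exponentiate, apply the filterbank pseudo-inverse), used only as an example of a traditional approach; it is not necessarily the baseline evaluated in the paper. `MLP`, `dnn_inverse` and `log_spectral_distance` (with its assumed magnitude floor) are likewise illustrative names.

```python
import numpy as np

# Assumed front-end settings (illustrative, not taken from the paper).
SR, N_FFT, N_MELS, N_CEPS = 16000, 256, 23, 13
N_BINS = N_FFT // 2 + 1
EPS = 1e-8

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels=N_MELS, n_bins=N_BINS, sr=SR):
    """Triangular mel filters, shape (n_mels, n_bins)."""
    hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    freqs = np.linspace(0.0, sr / 2.0, n_bins)
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        lo, c, hi = hz[i], hz[i + 1], hz[i + 2]
        fb[i] = np.maximum(0.0, np.minimum((freqs - lo) / (c - lo), (hi - freqs) / (hi - c)))
    return fb

def dct_matrix(n=N_MELS):
    """Orthonormal DCT-II matrix, so its inverse is its transpose."""
    k, j = np.arange(n)[:, None], np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * j + 1) / (2 * n))
    d[0] /= np.sqrt(2.0)
    return d

FB, DCT = mel_filterbank(), dct_matrix()

def mfcc(mag):
    """Magnitude frames (T, N_BINS) -> truncated MFCCs (T, N_CEPS)."""
    return np.log(mag @ FB.T + EPS) @ DCT[:N_CEPS].T

def baseline_inverse(c):
    """Assumed non-learned baseline: zero-pad, inverse DCT, exp, filterbank pseudo-inverse."""
    padded = np.zeros((c.shape[0], N_MELS))
    padded[:, :N_CEPS] = c
    mel_energy = np.exp(padded @ DCT)  # smoothed mel energies
    return np.maximum(mel_energy @ np.linalg.pinv(FB).T, 0.0)  # clip negative bins

def synthetic_frames(n, rng):
    """Hypothetical stand-in for TIMIT frames: spectral tilt times one formant-like peak."""
    f = np.linspace(0.0, 1.0, N_BINS)[None, :]
    tilt = np.exp(-rng.uniform(1.0, 4.0, (n, 1)) * f)
    centre, width = rng.uniform(0.1, 0.6, (n, 1)), rng.uniform(0.02, 0.08, (n, 1))
    return tilt * (1.0 + 4.0 * np.exp(-0.5 * ((f - centre) / width) ** 2))

class MLP:
    """Illustrative regressor: ReLU hidden layers, linear output, full-batch gradient descent on MSE."""

    def __init__(self, sizes, rng):
        self.W = [rng.normal(0.0, np.sqrt(2.0 / a), (a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
        self.b = [np.zeros(b) for b in sizes[1:]]

    def forward(self, x):
        acts = [x]
        for i, (W, b) in enumerate(zip(self.W, self.b)):
            z = acts[-1] @ W + b
            acts.append(z if i == len(self.W) - 1 else np.maximum(z, 0.0))
        return acts

    def step(self, x, y, lr):
        """One back-propagation update; returns the loss before the update."""
        acts = self.forward(x)
        delta = 2.0 * (acts[-1] - y) / y.size  # d(MSE)/d(output)
        for i in reversed(range(len(self.W))):
            grad_w, grad_b = acts[i].T @ delta, delta.sum(axis=0)
            if i > 0:  # back through W[i] (not yet updated) and the ReLU that produced acts[i]
                delta = (delta @ self.W[i].T) * (acts[i] > 0)
            self.W[i] -= lr * grad_w
            self.b[i] -= lr * grad_b
        return float(np.mean((acts[-1] - y) ** 2))

def log_spectral_distance(ref, est, floor=1e-3):
    """Mean over frames of the RMS dB difference; magnitudes floored at an assumed `floor`."""
    d = 20.0 * np.log10(np.maximum(ref, floor) / np.maximum(est, floor))
    return float(np.mean(np.sqrt(np.mean(d ** 2, axis=1))))

rng = np.random.default_rng(0)
mag_train, mag_test = synthetic_frames(2000, rng), synthetic_frames(200, rng)
x_train, x_test = mfcc(mag_train), mfcc(mag_test)
y_train = np.log(mag_train)  # regression target: log-magnitude (an assumption)
x_mu, x_sd = x_train.mean(axis=0), x_train.std(axis=0) + EPS
y_mu, y_sd = y_train.mean(axis=0), y_train.std(axis=0) + EPS

net = MLP([N_CEPS, 128, 128, N_BINS], rng)  # layer sizes and learning rate are assumed
xn, yn = (x_train - x_mu) / x_sd, (y_train - y_mu) / y_sd
losses = []
for _ in range(500):
    losses.append(net.step(xn, yn, lr=0.1))

def dnn_inverse(c):
    """Truncated MFCCs -> estimated magnitude spectrum from the trained network."""
    return np.exp(net.forward((c - x_mu) / x_sd)[-1] * y_sd + y_mu)

print("baseline LSD (dB):", log_spectral_distance(mag_test, baseline_inverse(x_test)))
print("DNN LSD (dB):", log_spectral_distance(mag_test, dnn_inverse(x_test)))
```

Truncating to `N_CEPS` coefficients discards the higher-order cepstral detail, so `baseline_inverse` can only return a smoothed envelope, and bins outside the filterbank's support (here the DC and Nyquist bins) come back as (numerically) zero. The learned mapping instead estimates, from training pairs, which spectra are typical for a given MFCC vector. How the two compare on this toy data depends on the assumed settings, so the printed distances are not a reproduction of the paper's results.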


