Citation: | JIANG Wenbin, LIU Peilin, WEN Fei, “Speech Magnitude Spectrum Reconstruction from MFCCs Using Deep Neural Network,” Chinese Journal of Electronics, vol. 27, no. 2, pp. 393-398, 2018, doi: 10.1049/cje.2017.09.018 |
G. Hinton, L. Deng, D. Yu, et al., "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups", IEEE Signal Processing Magazine, Vol.29, No.6, pp.82-97, 2012.
|
J. Xu, J. Pan and Y. Yan, "Agglutinative language speech recognition using automatic allophone deriving", Chinese Journal of Electronics, Vol.25, No.2, pp.328-333, 2016.
|
R. Togneri and D. Pullella, "An overview of speaker identification: Accuracy and robustness issues", IEEE Circuits & Systems Magazine, Vol.11, No.2, pp.23-61, 2011.
|
C. Liang, X. Zhang and Y. Yan, "Discriminative decision function based scoring method used in speaker verification", Chinese Journal of Electronics, Vol.21, No.4, pp.692-696, 2012.
|
T. Ramabadran, A. Sorin, M. McLaughlin, et al., "The ETSI extended distributed speech recognition (DSR) standards: Server-side speech reconstruction", Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, Montreal, Quebec, Canada, pp.129-132, 2004.
|
ETSI ES 202212:2005, Speech Processing, Transmission and Quality Aspects (STQ); Distributed Speech Recognition; Extended Advanced Front-end Feature Extraction Algorithm; Compression Algorithms; Back-end Speech Reconstruction Algorithm.
|
T. Ramabadran, J. Meunier, M. Jasiuk, et al., "Enhancing distributed speech recognition with back-end speech reconstruction", Proc. European Conference on Speech Communication and Technology, Scandinavia, pp.1859-1862, 2001.
|
B. Milner and X. Shao, "Speech reconstruction from Mel-frequency cepstral coefficients using a source-filter model", Proc. European Conference on Speech Communication and Technology, Denver, USA, pp.2421-2424, 2002.
|
B. Milner and X. Shao, "Clean speech reconstruction from MFCC vectors and fundamental frequency using an integrated front-end", Speech Communication, Vol.48, No.6, pp.697-715, 2006.
|
L. E. Boucheron, P. L. De Leon and S. Sandoval, "Low bit-rate speech coding through quantization of Mel-frequency cepstral coefficients", IEEE Transactions on Audio, Speech, and Language Processing, Vol.20, No.2, pp.610-619, 2012.
|
G. Hinton, S. Osindero and Y. Teh, "A fast learning algorithm for deep belief nets", Neural Computation, Vol.18, No.7, pp.1527-1554, 2006.
|
G. Hinton and R. Salakhutdinov, "Reducing the dimensionality of data with neural networks", Science, Vol.313, No.5786, pp.504-507, 2006.
|
Y. Xu, J. Du, L. Dai, et al., "A regression approach to speech enhancement based on deep neural networks", IEEE Transactions on Audio, Speech, and Language Processing, Vol.23, No.1, pp.7-19, 2015.
|
Y. Bengio, "Learning deep architectures for AI", Foundations and Trends in Machine Learning, Vol.2, No.1, pp.1-127, 2009.
|
S. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences", IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol.28, No.4, pp.357-366, 1980.
|
G. Hinton, "A practical guide to training restricted Boltzmann machines", Momentum, Vol.9, No.1, pp.3-17, 2010.
|
D. Rumelhart, G. Hinton and R. Williams, "Learning representations by back-propagating errors", Nature, Vol.323, No.6088, pp.533-536, 1986.
|
J. S. Garofolo, "Getting started with the DARPA TIMIT CD-ROM: An acoustic phonetic continuous speech database", National Institute of Standards and Technology, Gaithersburg, Page 107, 1998.
|
J. Bergstra, O. Breuleux, F. Bastien, et al., "Theano: A CPU and GPU math compiler in Python", Proc. of the Python for Scientific Computing Conference, Austin, TX, USA, pp.3-10, 2010.
|
P. C. Loizou, Speech Enhancement: Theory and Practice, CRC Press, USA, 2013.
|
D. W. Griffin and J. S. Lim, "Signal estimation from modified short-time Fourier transform", IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol.32, No.2, pp.236-243, 1984.
|
ITU-T Recommendation P.862:2001, Perceptual Evaluation of Speech Quality (PESQ): An Objective Method for End-to-end Speech Quality Assessment of Narrowband Telephone Networks and Speech Codecs.
|
D. Wang and X.W. Zhang, "THCHS-30: A free Chinese speech corpus", http://arxiv.org/abs/1512.01882, 2015-10-7.
|
W.B. Jiang, R.D. Ying and P.L. Liu, "Speech reconstruction for MFCC-based low bit-rate speech coding", Proc. of IEEE International Conference on Multimedia and Expo Workshops, Chengdu, China, pp.1-6, 2014.
|
W.B. Jiang, P.L. Liu, and F. Wen, "An improved vector quantization method using deep neural network", AEU-International Journal of Electronics and Communications, Vol.72, No.1, pp.178-183, 2017.
|