CHENG Gaofeng, LI Xin, YAN Yonghong. Using Highway Connections to Enable Deep Small-footprint LSTM-RNNs for Speech Recognition[J]. Chinese Journal of Electronics, 2019, 28(1): 107-112. doi: 10.1049/cje.2018.11.008

Using Highway Connections to Enable Deep Small-footprint LSTM-RNNs for Speech Recognition

doi: 10.1049/cje.2018.11.008
Funds: This work is supported by the National Key Research and Development Program (No.2016YFB0801203, No.2016YFB0801200) and the National Natural Science Foundation of China (No.11590774, No.11590770).
  • Received Date: 2016-11-11
  • Rev Recd Date: 2018-08-02
  • Publish Date: 2019-01-10
  • Long short-term memory RNNs (LSTM-RNNs) have shown great success in automatic speech recognition (ASR) and have become the state-of-the-art acoustic model for time-sequence modeling tasks. However, it is still difficult to train deep LSTM-RNNs while keeping the number of parameters small. We use highway connections between the memory cells of adjacent layers to train small-footprint highway LSTM-RNNs (HLSTM-RNNs), which are deeper and thinner than conventional LSTM-RNNs. Experiments on the Switchboard (SWBD) corpus indicate that we can train thinner and deeper HLSTM-RNNs that have fewer parameters than a conventional 3-layer LSTM-RNN and still achieve a lower word error rate (WER). Compared with their small-footprint LSTM-RNN counterparts, the small-footprint HLSTM-RNNs give a greater reduction in WER.
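The central mechanism in the abstract, a highway connection that carries each layer's memory cell directly into the cell of the layer above, can be illustrated with a small pure-Python sketch. It assumes the depth-gate formulation of the highway LSTM described by Zhang et al. (ICASSP 2016): the cell update of layer l becomes c_t^l = f_t^l ⊙ c_{t-1}^l + i_t^l ⊙ g_t^l + d_t^l ⊙ c_t^{l-1}, where the depth gate d_t^l is assumed to see the layer input, the layer's own previous cell, and the current cell of the layer below. Peepholes on the input, forget, and output gates, projection layers, and training are all omitted. The helper names (`init_layer`, `hlstm_step`, `hlstm_forward`) and the layer sizes are hypothetical, and the exact configuration used in this paper may differ.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def add(*vecs):
    """Element-wise sum of equally sized vectors."""
    return [sum(vals) for vals in zip(*vecs)]


def mul(a, b):
    """Element-wise (Hadamard) product."""
    return [x * y for x, y in zip(a, b)]


def init_layer(n_in, n_cell, rng):
    """Hypothetical small random initialisation for one HLSTM layer."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    def vec(n):
        return [rng.uniform(-0.1, 0.1) for _ in range(n)]

    p = {}
    for g in ("i", "f", "o", "g"):  # input, forget, output gates and cell candidate
        p["Wx" + g] = mat(n_cell, n_in)
        p["Wh" + g] = mat(n_cell, n_cell)
        p["b" + g] = [0.0] * n_cell
    # Depth (highway) gate: assumed input weights, bias, and two diagonal cell weights.
    p["Wxd"], p["bd"] = mat(n_cell, n_in), [0.0] * n_cell
    p["wcd"], p["wld"] = vec(n_cell), vec(n_cell)
    return p


def hlstm_step(p, x, h_prev, c_prev, c_lower):
    """One time step of one layer; c_lower is the lower layer's current cell (None for layer 1)."""
    def pre(g):
        return add(matvec(p["Wx" + g], x), matvec(p["Wh" + g], h_prev), p["b" + g])

    i = [sigmoid(z) for z in pre("i")]
    f = [sigmoid(z) for z in pre("f")]
    o = [sigmoid(z) for z in pre("o")]
    g = [math.tanh(z) for z in pre("g")]
    c = add(mul(f, c_prev), mul(i, g))  # conventional LSTM cell update
    if c_lower is not None:
        d = [sigmoid(z) for z in add(matvec(p["Wxd"], x), mul(p["wcd"], c_prev),
                                     mul(p["wld"], c_lower), p["bd"])]
        c = add(c, mul(d, c_lower))  # highway connection between adjacent memory cells
    h = [oo * math.tanh(cc) for oo, cc in zip(o, c)]
    return h, c


def hlstm_forward(layers, frames, n_cell):
    """Run a stack of HLSTM layers over feature frames; return top-layer outputs."""
    h = [[0.0] * n_cell for _ in layers]
    c = [[0.0] * n_cell for _ in layers]
    outputs = []
    for x in frames:
        inp, c_lower = x, None  # the first layer has no cell below it
        for l, p in enumerate(layers):
            h[l], c[l] = hlstm_step(p, inp, h[l], c[l], c_lower)
            inp, c_lower = h[l], c[l]  # feed output and cell upward at the same time step
        outputs.append(inp)
    return outputs


if __name__ == "__main__":
    rng = random.Random(0)
    n_feat, n_cell, depth = 4, 8, 5  # hypothetical thin (8 cells) but deep (5 layers) stack
    layers = [init_layer(n_feat if l == 0 else n_cell, n_cell, rng) for l in range(depth)]
    frames = [[rng.gauss(0.0, 1.0) for _ in range(n_feat)] for _ in range(10)]
    out = hlstm_forward(layers, frames, n_cell)
    print(len(out), len(out[0]))
```

In this assumed formulation the depth gate costs only one input matrix, one bias, and two diagonal weight vectors per layer, far fewer than the four gate blocks of the LSTM itself. That is why a highway path is a cheap way to keep gradients flowing through thin, deep stacks.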
