ZHANG Yike, ZHANG Pengyuan, YAN Yonghong. Language Model Score Regularization for Speech Recognition[J]. Chinese Journal of Electronics, 2019, 28(3): 604-609. doi: 10.1049/cje.2019.03.015

Language Model Score Regularization for Speech Recognition

doi: 10.1049/cje.2019.03.015
Funds: This work is supported by the National Natural Science Foundation of China (No. U1536117, No. 11590770-4), the National Key Research and Development Plan (No. 2016YFB0801203, No. 2016YFB0801200), the Key Science and Technology Project of the Xinjiang Uygur Autonomous Region (No. 2016A03007-1), and the Pre-research Project for Equipment of General Information System (No. JZX2017-0994/Y306).
  • Corresponding author: ZHANG Pengyuan was born in 1978. He received the Ph.D. degree in information and signal processing from the Institute of Acoustics, Chinese Academy of Sciences University, China, in 2007. He is now a researcher at the Key Laboratory of Speech Acoustics and Content Understanding, Chinese Academy of Sciences. His research interests include spontaneous speech recognition. (Email: zhangpengyuan@hccl.ioa.ac.cn)
  • Received Date: 2017-06-21
  • Publish Date: 2019-05-10
  • Abstract: Inspired by the significant effect that back-off and interpolated smoothing algorithms have on statistical language modeling, this paper proposes a sentence-level language model (LM) score regularization algorithm that improves the tolerance of LMs to recognition errors. The proposed algorithm applies to both count-based LMs and neural network LMs. Instead of predicting the occurrence of a word sequence under a fixed-order Markov assumption, we estimate its probability with a composite model that combines models of different orders, using either n-gram or skip-gram features. To simplify implementation, we derive a connection between bidirectional neural networks and the proposed algorithm. Experiments were carried out on the Switchboard corpus. N-best list rescoring results show that the proposed algorithm consistently reduces word error rate when applied to count-based LMs, feedforward neural network (FNN) LMs, and recurrent neural network (RNN) LMs.
