WANG Xuyang, ZHANG Pengyuan, NA Xingyu, PAN Jielin, YAN Yonghong. Handling OOV Words in Mandarin Spoken Term Detection with an Hierarchical n-Gram Language Model[J]. Chinese Journal of Electronics, 2017, 26(6): 1239-1244. doi: 10.1049/cje.2017.07.004

Handling OOV Words in Mandarin Spoken Term Detection with an Hierarchical n-Gram Language Model

doi: 10.1049/cje.2017.07.004
Funds: This work is supported by the National Natural Science Foundation of China (No.11461141004, No.61271426, No.11504406, No.11590770, No.11590771, No.11590772, No.11590773, No.11590774), the Strategic Priority Research Program of the Chinese Academy of Sciences (No.XDA06030100, No.XDA06030500, No.XDA06040603), the National 863 Program (No.2015AA016306), the National 973 Program (No.2013CB329302), and the Key Science and Technology Project of the Xinjiang Uygur Autonomous Region (No.201230118-3).
  • Corresponding author: ZHANG Pengyuan received the Ph.D. degree in information and signal processing from the Institute of Acoustics, Chinese Academy of Sciences, in 2007. From 2013 to 2014, he was a research scholar at the University of Sheffield.
  • Received Date: 2015-08-28
  • Revision Received Date: 2015-12-24
  • Publish Date: 2017-11-10
  • In this paper, a hierarchical n-gram Language model (LM) combining words and characters is explored to improve the detection of Out-of-vocabulary (OOV) words in Mandarin Spoken term detection (STD). The hierarchical LM is built on a word-level LM, with a character-level LM estimating the probabilities of OOV words in a class-based way. The region of the sentence being decoded that contains OOV words is detected with the help of the word-level LM, and the probabilities of the OOV words are derived from the character-level LM. The proposed approach is implemented in a dynamic decoder and evaluated in terms of Actual term weighted value (ATWV) on two Mandarin data sets. Experimental results show a relative improvement of more than 10% in OOV word detection on both sets, while the detection of In-vocabulary (IV) words is barely affected.
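    The class-based idea in the abstract can be illustrated with a minimal Python sketch. It assumes a common formulation in which the word-level LM predicts a class token for "some OOV word" and the character-level LM then scores the spelling of that word; the abstract does not give the exact equations, so this may differ from the authors' model. The token names (`<oov>`, `<w>`, `</w>`), the toy probabilities, and the function names are all hypothetical, and there is no smoothing or backoff.

```python
import math

# Toy word-level bigram LM (made-up probabilities, for illustration only).
# "<oov>" is an assumed class token standing for any out-of-vocabulary word.
WORD_BIGRAM = {
    ("<s>", "我"): 0.5,
    ("我", "喜欢"): 0.4,
    ("我", "<oov>"): 0.1,
}

# Toy character-level bigram LM used inside the <oov> class.
# "<w>" and "</w>" are assumed word-boundary markers.
CHAR_BIGRAM = {
    ("<w>", "北"): 0.2,
    ("北", "京"): 0.5,
    ("京", "</w>"): 0.4,
}

VOCAB = {"我", "喜欢"}  # in-vocabulary (IV) words


def char_logprob(word: str) -> float:
    """Log-probability of a word's character sequence under the character LM."""
    chars = ["<w>", *word, "</w>"]
    return sum(math.log(CHAR_BIGRAM[(a, b)]) for a, b in zip(chars, chars[1:]))


def word_logprob(prev: str, word: str) -> float:
    """Hierarchical score: IV words use the word LM directly; OOV words use
    P(<oov> | prev) from the word LM times the character-level probability."""
    if word in VOCAB:
        return math.log(WORD_BIGRAM[(prev, word)])
    return math.log(WORD_BIGRAM[(prev, "<oov>")]) + char_logprob(word)


iv_score = word_logprob("我", "喜欢")   # word-level LM only
oov_score = word_logprob("我", "北京")  # word-level class prob + character LM
```

    In this sketch the IV path never touches the character LM, which is one plausible reading of why IV detection would be barely affected; a real system would add smoothing and integrate both models into the decoder's search.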