QIAN Yanmin, XU Ji, LIU Jia. Multi-Stream Posterior Features and Combining Subspace GMMs for Low Resource LVCSR[J]. Chinese Journal of Electronics, 2013, 22(2): 291-295.

Multi-Stream Posterior Features and Combining Subspace GMMs for Low Resource LVCSR

Funds: This work is supported by the National High Technology Research and Development Program of China (863 Program) (No.2008AA040201), National Science and Technology Pillar Program of China (No.2009BAH41B01), NSFC (National Natural Science Foundation of China) (No.90920302), NSFC and RGC (No.60931160443).
  • Received Date: 2012-01-01
  • Rev Recd Date: 2012-04-01
  • Publish Date: 2013-04-25
  • Large vocabulary continuous speech recognition (LVCSR) is particularly difficult for low-resource languages. The scenario we focus on is one in which there is a very limited amount of acoustic training data in the target language, but more plentiful data in other languages. We investigate both feature-level and model-level approaches. The feature-level approach is based on the MLP framework, in which we train multiple streams individually, based on the automatic speech attribute transcription strategy and a data sampling method, and present a multilingual training mode that uses the non-target-language data to obtain more discriminative features. At the model level, we apply the recently proposed subspace Gaussian mixture model to obtain further improvement. Finally, combining these two strategies in a multilingual training mode, we obtain a large improvement of more than 13% absolute over a conventional baseline.

