XU Ji, PAN Jielin, YAN Yonghong. Agglutinative Language Speech Recognition Using Automatic Allophone Deriving[J]. Chinese Journal of Electronics, 2016, 25(2): 328-333. doi: 10.1049/cje.2016.03.020

Agglutinative Language Speech Recognition Using Automatic Allophone Deriving

doi: 10.1049/cje.2016.03.020
Funds: This work is supported by the National Natural Science Foundation of China (No.10925419, No.90920302, No.61072124, No.11074275, No.11161140319, No.91120001, No.61271426), the Strategic Priority Research Program of the Chinese Academy of Sciences (No.XDA06030100, No.XDA06030500), the National High Technology Research and Development Program of China (863 Program) (No.2012AA012503) and the CAS Priority Deployment Project (No.KGZD-EW-103-2).
  • Received Date: 2014-02-27
  • Revised Date: 2014-05-13
  • Published Date: 2016-03-10
  • Agglutinative languages make extensive use of agglutination, which causes significant pronunciation variations across contexts. As a result, phoneme sets converted directly from the written forms are problematic as basic units for acoustic modeling in large-vocabulary continuous speech recognition (LVCSR), because they cannot capture these pronunciation variations. This paper presents a novel approach, Automatic allophone deriving (AAD), that creates allophone candidates without any prior linguistic knowledge. Furthermore, an enhanced approach, AAD Long-time (AAD-LT), is proposed, in which long-time features are used within AAD. Experiments are conducted on three languages: two agglutinative languages and one analytic language. The results suggest that AAD-LT is effective for the agglutinative languages, for which it yields more than 10% relative character error rate (CER) reduction.
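The abstract reports its gain as a relative character error rate (CER) reduction. The following is a minimal sketch, not the authors' scoring tool, of how the two quantities are commonly computed. It assumes that CER is the character-level Levenshtein distance between reference and hypothesis transcripts divided by the number of reference characters, and that "relative" means the reduction measured against the baseline CER. The function names and example numbers are hypothetical. Whitespace and punctuation handling is also an assumption: every character of the input strings is counted.

```python
# Minimal sketch: corpus-level character error rate (CER) and relative CER reduction.
# Function names are hypothetical illustrations, not taken from the paper.


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: fewest substitutions, deletions and insertions turning ref into hyp."""
    # At the start of row i, prev[j] holds the distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete ref[i-1]
                curr[j - 1] + 1,     # insert hyp[j-1]
                prev[j - 1] + cost,  # match (cost 0) or substitute (cost 1)
            )
        prev = curr
    return prev[-1]


def cer(refs: list[str], hyps: list[str]) -> float:
    """Total character edits over all utterances divided by total reference characters.

    Every character of the given strings counts, including spaces; any text
    normalization (whitespace, punctuation) is assumed to happen beforehand.
    """
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must be aligned utterance by utterance")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference transcripts are empty")
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / total_chars


def relative_reduction(baseline_cer: float, new_cer: float) -> float:
    """Error reduction expressed as a fraction of the baseline error rate."""
    return (baseline_cer - new_cer) / baseline_cer


if __name__ == "__main__":
    # Illustrative numbers only; they are not results from the paper.
    print(cer(["abcd", "ef"], ["abd", "ef"]))  # 1 edit over 6 reference characters
    print(relative_reduction(0.200, 0.180))   # about 0.10, i.e. a 10% relative reduction
```

For instance, a hypothetical baseline CER of 20.0% lowered to 18.0% is a 2.0-point absolute reduction but a 10% relative reduction, since (20.0 - 18.0) / 20.0 = 0.10. Relative figures therefore depend on the baseline: the same 2.0-point drop from a 40.0% baseline would be only a 5% relative reduction.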