ZHANG Jian, YUAN Qingsheng, BAO Xiuguo, et al., “PLF Optimization for Target Language Detection,” Chinese Journal of Electronics, vol. 26, no. 1, pp. 118-121, 2017, doi: 10.1049/cje.2016.11.014

PLF Optimization for Target Language Detection

doi: 10.1049/cje.2016.11.014
Funds:  This work is supported by the National Natural Science Foundation of China (No.11161140319, No.91120001, No.61271426), the Strategic Priority Research Program of the Chinese Academy of Sciences (No.XDA06030100, No.XDA06030500), the National High Technology Research and Development Program of China (No.2012AA012503), and the Chinese Academy of Sciences Priority Deployment Project (No.KGZD-EW-103-2).
  • Corresponding author: ZHOU Ruohua received the B.S. degree from the Electronics Engineering Department, Beijing Institute of Technology, Beijing, China, in 1994, the M.S. degree in engineering in microelectronics and semiconductor devices from the Microelectronics R&D Center, CAS, Beijing, in 1997, and the Ph.D. degree from the Signal Processing Laboratory (LTS), Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland. He is currently a professor at the Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, CAS. (Email: zhouruohua@hccl.ioa.ac.cn)
  • Received Date: 2015-04-09
  • Rev Recd Date: 2015-06-24
  • Publish Date: 2017-01-10
  • The objective of traditional feature studies in Spoken language recognition (SLR) is to extract the linguistic information that discriminates between languages. However, security applications are usually interested in one particular language, which requires features that best reflect the differences between the target language and all other languages. To address this problem, the frame-level Phone log-posteriors feature (PLF), recently introduced as a novel and effective feature in SLR, is optimized for better performance on the Target language detection (TLD) task. F-Ratio analysis is used to measure the contribution of each dimension of the feature vector to TLD. In this work, frame-level phone posterior probabilities are estimated by a phone recognizer and converted to the log domain. The feature is then optimized by weighting each dimension according to its F-Ratio value. Finally, Principal component analysis (PCA) is used to decorrelate the feature and reduce the vector size. Experiments on the NIST LRE 2007 dataset show the effectiveness of the optimized feature, which yields significant relative improvements in terms of Equal error rate (EER) over the Gaussian mixture models-Support vector machines (GMM-SVM) system based on the original feature.
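
The log-posterior and F-Ratio weighting steps described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact F-Ratio formula or weighting function, so the code below assumes the common two-class definition (variance of the class means divided by the average within-class variance, per dimension) and a simple weight of F-Ratio divided by its maximum. All function names and the toy posteriors are hypothetical, and the PCA step is omitted.

```python
import math

def log_posteriors(posteriors, floor=1e-10):
    """Take the log of each frame's phone posteriors, flooring to avoid log(0)."""
    return [[math.log(max(p, floor)) for p in frame] for frame in posteriors]

def mean_and_var(frames):
    """Per-dimension mean and (population) variance over a list of frames."""
    n, dims = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    variances = [sum((f[d] - means[d]) ** 2 for f in frames) / n for d in range(dims)]
    return means, variances

def f_ratio(target, non_target, eps=1e-12):
    """Assumed two-class F-Ratio per dimension: between-class / within-class variance."""
    mt, vt = mean_and_var(target)
    mn, vn = mean_and_var(non_target)
    ratios = []
    for d in range(len(mt)):
        grand = 0.5 * (mt[d] + mn[d])
        between = 0.5 * ((mt[d] - grand) ** 2 + (mn[d] - grand) ** 2)
        within = 0.5 * (vt[d] + vn[d])
        ratios.append(between / (within + eps))
    return ratios

def weight_frames(frames, ratios):
    """Scale each dimension by its F-Ratio normalized to [0, 1] (assumed weighting)."""
    peak = max(ratios) or 1.0  # avoid dividing by zero if every ratio is 0
    weights = [r / peak for r in ratios]
    return [[x * w for x, w in zip(f, weights)] for f in frames]

# Toy 3-phone posteriors: the third phone carries no target/non-target information.
target = log_posteriors([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]])
non_target = log_posteriors([[0.2, 0.7, 0.1], [0.3, 0.6, 0.1]])

ratios = f_ratio(target, non_target)
weighted_target = weight_frames(target, ratios)
```

In this toy case the third dimension is identical for both classes, so its F-Ratio is zero and the weighting suppresses it, while the two discriminative dimensions keep most of their magnitude. In a full pipeline the weighted frames would then be passed to PCA for decorrelation and dimensionality reduction.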