XU Yunfei, YANG Hai, YANG Lin, ZHOU Ruohua, YAN Yonghong. A General Bayesian Model for Speaker Verification[J]. Chinese Journal of Electronics, 2016, 25(6): 1045-1051. doi: 10.1049/cje.2016.08.024

A General Bayesian Model for Speaker Verification

doi: 10.1049/cje.2016.08.024
Funds:  This work is partially supported by the National Natural Science Foundation of China (No.11161140319, No.91120001, No.61271426), the Strategic Priority Research Program of the Chinese Academy of Sciences (No.XDA06030100, No.XDA06030500), the National 863 Program (No.2012AA012503), and the CAS Priority Deployment Project (No.KGZD-EW-103-2).
  • Received Date: 2014-08-17
  • Revision Received Date: 2014-11-26
  • Publish Date: 2016-11-10
  • This paper presents a general Bayesian model for speaker verification. The model is a generative probability model whose simple analytical form allows a computationally efficient expectation-maximization (EM) algorithm to be derived for estimating its parameters. Verification decisions are made in a fully Bayesian way through a closed-form solution that accommodates enrollment sets of variable size. Factor analysis is employed to model the speaker-specific components, so that redundant information in the model is discarded. Results are evaluated with both the equal error rate (EER) and the minimum detection cost function (minDCF). The proposed approach shows promising results on the National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation (SRE) 2010 extended and 2012 core tasks. It yields a significant improvement over Gaussian probabilistic linear discriminant analysis, especially under phone-call conditions and mismatched train-test channel conditions. Comparative experimental results against other popular generative probability models are also presented.
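The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes each trial yields one scalar score and that a trial is accepted when its score is at or above a threshold. The function names and the default cost parameters (`p_target`, `c_miss`, `c_fa`) are hypothetical choices for illustration; the abstract does not state the operating point used in the paper.

```python
import numpy as np

def error_rates(target_scores, nontarget_scores):
    """Miss and false-alarm rates at every candidate threshold."""
    tar = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    # Every observed score is a candidate threshold; +inf rejects all trials.
    thresholds = np.sort(np.concatenate([tar, non, [np.inf]]))
    # A trial is accepted when score >= threshold.
    p_miss = np.array([(tar < t).mean() for t in thresholds])
    p_fa = np.array([(non >= t).mean() for t in thresholds])
    return p_miss, p_fa

def equal_error_rate(target_scores, nontarget_scores):
    """Error rate at the threshold where miss and false-alarm rates are closest."""
    p_miss, p_fa = error_rates(target_scores, nontarget_scores)
    i = np.argmin(np.abs(p_miss - p_fa))
    return 0.5 * (p_miss[i] + p_fa[i])

def min_dcf(target_scores, nontarget_scores,
            p_target=0.001, c_miss=1.0, c_fa=1.0):
    """Minimum normalized detection cost (cost parameters are assumptions)."""
    p_miss, p_fa = error_rates(target_scores, nontarget_scores)
    dcf = c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa
    # Normalize by the cost of the better trivial system
    # (always reject or always accept).
    return dcf.min() / min(c_miss * p_target, c_fa * (1.0 - p_target))

if __name__ == "__main__":
    # Toy scores, not data from the paper.
    targets = [1.0, 3.0]
    nontargets = [0.0, 2.0]
    print(equal_error_rate(targets, nontargets))
    print(min_dcf(targets, nontargets, p_target=0.5))
```

The threshold sweep is quadratic in the number of trials, which is acceptable for a sketch; a sorted cumulative count would be the usual choice for full evaluation sets.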