WANG Wenchao, XU Ji, YAN Yonghong, “Identity Vector Extraction Using Shared Mixture of PLDA for Short-Time Speaker Recognition,” Chinese Journal of Electronics, vol. 28, no. 2, pp. 357-363, 2019, doi: 10.1049/cje.2018.06.005

Identity Vector Extraction Using Shared Mixture of PLDA for Short-Time Speaker Recognition

doi: 10.1049/cje.2018.06.005
Funds: This work is partially supported by the National Natural Science Foundation of China (No.11590770-4, No.U1536117, No.11504406, No.11461141004), the National Key Research and Development Plan (No.2016YFB0801203, No.2016YFB0801200), the Key Science and Technology Project of the Xinjiang Uygur Autonomous Region (No.2016A03007-1), and the Pre-research Project for Equipment of General Information System (No.JZX2017-0994/Y306).
  • Received Date: 2017-07-10
  • Revision Received Date: 2018-03-20
  • Publish Date: 2019-03-10
  • State-of-the-art speaker recognition systems degrade rapidly when dealing with short-time utterances. It is well known that identity vectors (i-vectors) extracted from short utterances carry large uncertainties, and the standard probabilistic linear discriminant analysis (PLDA) method cannot exploit this uncertainty to reduce the effect of duration variation. In this work, we use a shared mixture of PLDA (SM-PLDA) to remodel the i-vectors utilizing their uncertainties. SM-PLDA is an improved generative model with a shared intrinsic factor, and this factor can be regarded as an identity vector containing speaker identification information. This identity vector can in turn be modeled by PLDA. Results are evaluated by both equal error rate and minimum detection cost function. Experiments conducted on the National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation (SRE) 2010 extended tasks show that the proposed method achieves significant improvements over i-vector/PLDA and some other advanced methods.