Citation: LI Xu, TU Ming, WANG Xiaofei, et al., "Single-Channel Speech Separation Based on Non-negative Matrix Factorization and Factorial Conditional Random Field", Chinese Journal of Electronics, Vol.27, No.5, pp.1063-1070, 2018, doi: 10.1049/cje.2018.06.016.
S.T. Roweis, “One microphone source separation”, International Conference on Neural Information Processing Systems (NIPS), Denver, USA, pp.763-769, 2000.
G.J. Jang and T.W. Lee, “A maximum likelihood approach to single-channel source separation”, Journal of Machine Learning Research, Vol.4, No.12, pp.1365-1392, 2003.
P.S. Huang, M. Kim and M. Hasegawa-Johnson, "Deep learning for monaural speech separation", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, pp.1562-1566, 2014.
Y. Tu, J. Du and Y. Xu, “Speech separation based on improved deep neural networks with dual outputs of speech features for both target and interfering speakers”, International Symposium on Chinese Spoken Language Processing (ISCSLP), Singapore, pp.250-254, 2014.
Y. Wang, A. Narayanan and D.L. Wang, "On training targets for supervised speech separation", IEEE Transactions on Audio, Speech, and Language Processing, Vol.22, No.12, pp.1849-1858, 2014.
F. Weninger, J.R. Hershey and J. Le Roux, “Discriminatively trained recurrent neural networks for single-channel speech separation”, IEEE Global Conference on Signal and Information Processing (GlobalSIP), Atlanta, USA, pp.577-581, 2014.
D.D. Lee and H.S. Seung, “Learning the parts of objects by non-negative matrix factorization”, Nature, Vol.401, No.6755, pp.788-791, 1999.
P. Smaragdis, B. Raj and M. Shashanka, "Supervised and semi-supervised separation of sounds from single-channel mixtures", International Conference on Independent Component Analysis and Signal Separation, London, UK, pp.414-421, 2007.
M.N. Schmidt and R.K. Olsson, “Single-channel speech separation using sparse non-negative matrix factorization”, ISCA International Conference on Spoken Language Processing (INTERSPEECH), Pittsburgh, USA, pp.2614-2617, 2006.
T. Virtanen, “Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria”, IEEE Transactions on Audio, Speech, and Language Processing, Vol.15, No.3, pp.1066-1074, 2007.
K.W. Wilson, B. Raj and P. Smaragdis, "Regularized non-negative matrix factorization with temporal dependencies for speech denoising", INTERSPEECH, Brisbane, Australia, pp.411-414, 2008.
K.W. Wilson, B. Raj, P. Smaragdis, et al., "Speech denoising using nonnegative matrix factorization with priors", ICASSP, Las Vegas, USA, pp.4029-4032, 2008.
C. Fevotte, “Majorization-minimization algorithm for smooth Itakura-Saito nonnegative matrix factorization”, ICASSP, Prague, Czech Republic, pp.1980-1983, 2011.
J. Nam, G.J. Mysore and P. Smaragdis, “Sound recognition in mixtures”, International Conference on Latent Variable Analysis and Signal Separation, Tel Aviv, Israel, pp.405-413, 2012.
C. Fevotte, J. Le Roux and J.R. Hershey, "Non-negative dynamical system with application to speech and audio", ICASSP, Vancouver, BC, Canada, pp.3158-3162, 2013.
N. Mohammadiha, P. Smaragdis and A. Leijon, “Prediction based filtering and smoothing to exploit temporal dependencies in NMF”, ICASSP, Vancouver, BC, Canada, pp.873-877, 2013.
G.J. Mysore, P. Smaragdis and B. Raj, "Non-negative hidden Markov modeling of audio with application to source separation", International Conference on Latent Variable Analysis and Signal Separation, St. Malo, France, pp.140-148, 2010.
G.J. Mysore and P. Smaragdis, “A non-negative approach to semi-supervised separation of speech from noise with the use of temporal dynamics”, ICASSP, Prague, Czech Republic, pp.17-20, 2011.
Y.T. Yeung, T. Lee and C.C. Leung, “Integrating multiple observations for model-based single-microphone speech separation with conditional random fields”, ICASSP, Kyoto, Japan, pp.257-260, 2012.
Y.T. Yeung, T. Lee and C.C. Leung, “Using dynamic conditional random field on single-microphone speech separation”, ICASSP, Vancouver, BC, Canada, pp.146-150, 2013.
J.F. Gemmeke and H. Van hamme, "An hierarchical exemplar-based sparse model of speech, with an application to ASR", IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), Hawaii, USA, pp.101-106, 2011.
D.D. Lee and H.S. Seung, “Algorithms for non-negative matrix factorization”, Advances in Neural Information Processing Systems, Vol.13, No.6, pp.556-562, 2001.
T. Virtanen, J.F. Gemmeke and B. Raj, "Active-set Newton algorithm for overcomplete non-negative representations of audio", IEEE Transactions on Audio, Speech, and Language Processing, Vol.21, No.11, pp.2277-2289, 2013.
T. Virtanen, J.F. Gemmeke, B. Raj, et al., “Compositional models for audio processing: Uncovering the structure of sound mixtures”, IEEE Signal Processing Magazine, Vol.32, No.2, pp.125-144, 2015.
C. Sutton, A. McCallum and K. Rohanimanesh, “Dynamic conditional random fields: Factorized probabilistic models for labeling and segmenting sequence data”, Journal of Machine Learning Research, Vol.8, No.3, pp.693-723, 2007.
M. Cooke, J. Barker, S. Cunningham, et al., “An audio-visual corpus for speech perception and automatic speech recognition”, The Journal of the Acoustical Society of America, Vol.120, No.5, pp.2421-2424, 2006.
E. Vincent, R. Gribonval and C. Fevotte, "Performance measurement in blind audio source separation", IEEE Transactions on Audio, Speech, and Language Processing, Vol.14, No.4, pp.1462-1469, 2006.
E.H. Rothauser, W.D. Chapman and N. Guttman, "IEEE recommended practice for speech quality measurements", IEEE Transactions on Audio and Electroacoustics, Vol.17, No.3, pp.225-246, 1969.