Citation: LIN Long and TAN Liang, “Multi-Distributed Speech Emotion Recognition Based on Mel Frequency Cepstogram and Parameter Transfer,” Chinese Journal of Electronics, vol.31, no.1, pp.155–167, 2022, doi: 10.1049/cje.2020.00.080
[1] WANG Haikun, PAN Jia, and LIU Cong, “Research development and forecast of automatic speech recognition technologies,” Telecommunications Science, vol.34, no.2, pp.1–11, 2018.
[2] Han WJ, Li HF, Ruan HB, et al., “Review on speech emotion recognition,” Journal of Software, vol.25, no.1, pp.37–50, 2014.
[3] SONG Peng, ZHENG Wenming, and ZHAO Li, “Cross-corpus speech emotion recognition based on a feature transfer learning method,” Journal of Tsinghua University (Science and Technology), vol.56, no.11, pp.1179–1183, 2016.
[4] TENG Z and JI W, “Speech emotion recognition with i-vector feature and RNN model,” in Proc. of the 2015 IEEE China Summit and International Conference on Signal and Information Processing (ChinaSIP), Chengdu, pp.524–528, 2015.
[5] Basu A, Chakraborty J, and Aftabuddin M, “Emotion recognition from speech using convolutional neural network with recurrent neural network architecture,” in Proc. of the 2017 2nd International Conference on Communication and Electronics Systems, Coimbatore, DOI: 10.1109/CESYS.2017.8321292, 2017.
[6] Sak H, Senior A, and Beaufays F, “Long short-term memory recurrent neural network architectures for large scale acoustic modeling,” in Proc. of the Annual Conference of the International Speech Communication Association, Singapore, pp.338–342, 2014.
[7] Badshah AM, Ahmad J, and Rahim N, “Speech emotion recognition from spectrograms with deep convolutional neural network,” in Proc. of the International Conference on Platform Technology and Service, Busan, DOI: 10.1109/PlatCon.2017.7883728, 2017.
[8] LU Guanming, YUAN Liang, YANG Wenjuan, et al., “Speech emotion recognition based on long-term and short-term memory and convolutional neural networks,” Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), vol.38, no.5, pp.63–69, 2018.
[9] Cowie R, Douglas-Cowie E, Savvidou S, et al., “FEELTRACE: An instrument for recording perceived emotion in real time,” in Proc. of the 2000 ISCA Workshop on Speech and Emotion: A Conceptual Framework for Research, Newcastle, pp.19–24, 2000.
[10] McGilloway S, Cowie R, Douglas-Cowie E, et al., “Approaching automatic recognition of emotion from speech: A rough benchmark,” in Proc. of the 2000 ISCA Workshop on Speech and Emotion: A Conceptual Framework for Research, Newcastle, pp.207–212, 2000.
[11] Burkhardt F, Paeschke A, Rolfes M, et al., “A database of German emotional speech,” in Proc. of the 2005 INTERSPEECH, Lisbon, pp.1517–1520, 2005.
[12] Steidl S, “Automatic classification of emotion-related user states in spontaneous children’s speech,” Ph.D. Thesis, University of Erlangen-Nuremberg, Erlangen, 2009.
[13] Grimm M, Kroschel K, and Narayanan S, “The Vera am Mittag German audio-visual emotional speech database,” in Proc. of the 2008 IEEE Int. Conf. on Multimedia and Expo (ICME), Hannover, pp.865–868, 2008.
[14] McKeown G, Valstar MF, Cowie R, et al., “The SEMAINE corpus of emotionally coloured character interactions,” in Proc. of the 2010 IEEE Int. Conf. on Multimedia and Expo (ICME), Singapore, pp.1079–1084, 2010.
[15] Schuller B, Valstar M, Eyben F, et al., “AVEC 2012: The continuous audio/visual emotion challenge,” in Proc. of the 2012 Int. Audio/Visual Emotion Challenge and Workshop (AVEC), Grand Challenge and Satellite of ACM ICMI 2012, Santa Monica, CA, available at: https://mediatum.ub.tum.de/doc/1137896/1137896.pdf, 2012.
[16] van Bezooijen R, Otto SA, and Heenan TA, “Recognition of vocal expressions of emotion: A three-nation study to identify universal characteristics,” Journal of Cross-Cultural Psychology, vol.14, no.4, pp.387–406, 1983. doi: 10.1177/0022002183014004001
[17] Tolkmitt FJ and Scherer KR, “Effect of experimentally induced stress on vocal parameters,” Journal of Experimental Psychology: Human Perception and Performance, vol.12, no.3, pp.302–313, 1986. doi: 10.1037/0096-1523.12.3.302
[18] Cahn JE, “The generation of affect in synthesized speech,” Journal of the American Speech Input/Output Society, vol.8, pp.1–19, 1990.
[19] Moriyama T and Ozawa S, “Emotion recognition and synthesis system on speech,” in Proc. of the 1999 IEEE Int. Conf. on Multimedia Computing and Systems (ICMCS), Florence, pp.840–844, 1999.
[20] Cowie R, Douglas-Cowie E, Savvidou S, et al., “FEELTRACE: An instrument for recording perceived emotion in real time,” in Proc. of the 2000 ISCA Workshop on Speech and Emotion: A Conceptual Framework for Research, Belfast, pp.19–24, 2000.
[21] Grimm M and Kroschel K, “Evaluation of natural emotions using self assessment manikins,” in Proc. of the 2005 IEEE Workshop on Automatic Speech Recognition and Understanding, Cancun, pp.381–385, 2005.
[22] Grimm M, Kroschel K, and Narayanan S, “Support vector regression for automatic recognition of spontaneous emotions in speech,” in Proc. of the 2007 IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), Honolulu, HI, pp.1085–1088, 2007.
[23] Giannakopoulos T, Pikrakis A, and Theodoridis S, “A dimensional approach to emotion recognition of speech from movies,” in Proc. of the 2009 IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), Taipei, pp.65–68, 2009.
[24] Wu DR, Parsons TD, Mower E, et al., “Speech emotion estimation in 3D space,” in Proc. of the 2010 IEEE Int. Conf. on Multimedia and Expo (ICME), Singapore, pp.737–742, 2010.
[25] Eyben F, Wollmer M, Graves A, et al., “On-line emotion recognition in a 3-D activation-valence-time continuum using acoustic and linguistic cues,” Journal on Multimodal User Interfaces, vol.3, no.1–2, pp.7–19, 2010. doi: 10.1007/s12193-009-0032-6
[26] Karadogan SG and Larsen J, “Combining semantic and acoustic features for valence and arousal recognition in speech,” in Proc. of the 2012 Int. Workshop on Cognitive Information Processing (CIP), Baiona, pp.1–6, 2012.
[27] Eyben F, Wollmer M, and Schuller B, “openSMILE: The Munich versatile and fast open-source audio feature extractor,” in Proc. of the 18th ACM International Conference on Multimedia, Firenze, pp.1459–1462, 2010.
[28] Schuller B, Valstar M, Eyben F, et al., “AVEC 2011: The first international audio/visual emotion challenge,” in Proc. of the 2011 International Conference on Affective Computing and Intelligent Interaction (ACII), Memphis, TN, pp.415–424, 2011.
[29] Yamada T, Hashimoto H, and Tosa N, “Pattern recognition of emotion with neural network,” in Proc. of the 1995 IEEE IECON 21st International Conference on Industrial Electronics, Control, and Instrumentation, Orlando, FL, pp.183–187, 1995.
[30] Shi B, Bai X, and Yao C, “An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.39, no.11, pp.2298–2304, 2017.
[31] Chen B, Yin Q, and Guo P, “A study of deep belief network based Chinese speech emotion recognition,” in Proc. of the 10th International Conference on Computational Intelligence and Security (CIS), Kunming, pp.180–184, 2014.
[32] Lozano-Diez A, Zazo CR, and Gonzalez DJ, “An end-to-end approach to language identification in short utterances using convolutional neural networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.30, no.10, pp.112–115, 2015.
[33] Zazo R, Lozano-Diez A, and Gonzalez DJ, “Language identification in short utterances using long short-term memory (LSTM) recurrent neural networks,” PLoS ONE, vol.11, no.1, 2016.
[34] Gelly G, Gauvain JL, Le V, et al., “A divide-and-conquer approach for language identification based on recurrent neural networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.29, no.5, pp.22–25, 2016.
[35] Xinran Z, Peng S, and Gchen Z, “Auditory attention model based on Chirplet for cross-corpus speech emotion recognition,” Journal of Southeast University, vol.32, no.4, pp.402–407, 2016.