Improved Cross-Corpus Speech Emotion Recognition Using Deep Local Domain Adaptation
-
Abstract
Because high-quality labeled speech emotion data are scarce, transfer learning is a natural choice for speech emotion recognition. The complexity and ambiguity of emotion, however, make transfer learning-based speech emotion recognition challenging. Domain adaptation based on maximum mean discrepancy aligns the marginal distributions of the source and target domains but disregards the class prior distributions in both domains, which reduces transfer efficiency. To address this problem, this study proposes a novel cross-corpus speech emotion recognition framework based on local domain adaptation, in which a category-grained discrepancy measures the distance between the two related domains. Experimental results show that the local adaptation method enhances the generalization ability of the model and that, compared with global adaptation and non-adaptive methods, it significantly improves cross-corpus speech emotion recognition performance.
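As a rough illustration of the difference between marginal (global) alignment and category-grained (local) alignment, the sketch below contrasts a global maximum mean discrepancy estimate with a per-class variant. It is not the framework proposed in this study: the Gaussian kernel, the bandwidth `sigma`, the function names, the use of hard class labels for the target domain (in practice these would typically be pseudo-labels), and the synthetic two-cluster data are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of x and the rows of y."""
    sq_dist = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma ** 2))


def global_mmd2(source, target, sigma=1.0):
    """Biased estimate of the squared MMD between the two marginal distributions."""
    k_ss = gaussian_kernel(source, source, sigma).mean()
    k_tt = gaussian_kernel(target, target, sigma).mean()
    k_st = gaussian_kernel(source, target, sigma).mean()
    return float(k_ss + k_tt - 2.0 * k_st)


def local_mmd2(source, source_labels, target, target_labels, num_classes, sigma=1.0):
    """Mean of per-class squared MMD values.

    Classes with no samples in either domain are skipped. The target labels
    are assumed to be given (hypothetically, as pseudo-labels).
    """
    source_labels = np.asarray(source_labels)
    target_labels = np.asarray(target_labels)
    per_class = []
    for c in range(num_classes):
        src_c = source[source_labels == c]
        tgt_c = target[target_labels == c]
        if len(src_c) == 0 or len(tgt_c) == 0:
            continue
        per_class.append(global_mmd2(src_c, tgt_c, sigma))
    return float(np.mean(per_class)) if per_class else 0.0


# Synthetic example: both domains contain the same two clusters, but the
# class-to-cluster assignment is swapped in the target domain.
rng = np.random.default_rng(0)
labels = np.array([0] * 50 + [1] * 50)
source = np.vstack([rng.normal(-2.0, 0.5, (50, 2)), rng.normal(2.0, 0.5, (50, 2))])
target = np.vstack([rng.normal(2.0, 0.5, (50, 2)), rng.normal(-2.0, 0.5, (50, 2))])

print("global MMD^2:", global_mmd2(source, target))
print("local  MMD^2:", local_mmd2(source, labels, target, labels, num_classes=2))
```

In this constructed case the marginal distributions nearly coincide, so the global estimate is close to zero, while the per-class estimate remains large because each class sits in a different cluster in the two domains. This is the kind of mismatch that marginal alignment alone cannot detect.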