National Natural Science Foundation of China (61976149); Zhejiang Provincial Natural Science Foundation of China (LZ20F020002)
Speech emotion recognition (SER) plays an important role in human-computer interaction (HCI) and has attracted considerable attention in recent years. At present, most SER methods are trained and tested on a single emotion corpus. In practical applications, however, the training set and the test set may come from different emotion corpora. Because the data distributions of different emotion corpora differ substantially, most SER methods achieve unsatisfactory cross-corpus recognition performance. To address this issue, many researchers have turned their attention to cross-corpus SER methods. This paper systematically reviews the research status and recent progress of cross-corpus SER methods, with a particular focus on analyzing and summarizing the application of newly developed deep learning techniques to cross-corpus SER. First, the emotion corpora commonly used in SER are introduced. Then, in combination with deep learning techniques, existing cross-corpus SER methods based on handcrafted features and deep features are summarized and compared from the perspectives of supervised, unsupervised, and semi-supervised learning. Finally, the challenges and opportunities currently facing the field of cross-corpus SER are discussed, and future directions are outlined.