This work was supported by the Key-Area Research and Development Program of the Guangzhou Science and Technology Plan (202007030005); the General Program of the Natural Science Foundation of Guangdong Province (2019A1515011375); and the Special Fund for Science and Technology Innovation Cultivation of Guangdong College Students ("Climbing Program" Special Fund) (pdjh2020a0145).
Continuous emotion recognition based on multimodal physiological data is useful in many fields. However, owing to the scarcity of subject data and the subjectivity of emotion, training emotion recognition models still requires more physiological data and depends on data from the same (homologous) subjects. In this study, we propose several continuous emotion recognition methods based on facial images and EEG. For the facial-image modality, to mitigate the over-fitting caused by small facial-image datasets, we propose a multi-task convolutional neural network trained with transfer learning. For the EEG modality, we propose two emotion recognition models. The first is a subject-dependent model based on a support vector machine, which achieves high accuracy when the test and training data come from the same subjects. The second is a cross-subject model designed to reduce the impact of the individual variability and non-stationarity of EEG signals; it is based on a long short-term memory (LSTM) network and performs stably even when the test and training data come from different subjects. To improve recognition accuracy on homologous data, we propose two methods for decision-level fusion of multimodal emotion predictions: weight enumeration and adaptive boosting (AdaBoost). Experiments show that, when the test and training data are homologous, the bimodal emotion recognition model achieves, in the best case, average accuracies of 74.23% and 80.30% in the arousal and valence dimensions, respectively; when the test and training data are heterologous, the LSTM cross-subject model achieves accuracies of 58.65% and 51.70% in the arousal and valence dimensions, respectively.
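The weight-enumeration fusion mentioned above can be illustrated with a minimal sketch: grid-search a single weight that linearly combines the per-modality positive-class probabilities and keep the weight that maximizes accuracy on validation data. This is only an assumed formulation for illustration, not the paper's actual implementation; the array names, binary-label setup, 0.5 decision threshold, and grid step are all assumptions.

```python
import numpy as np

def enumerate_fusion_weight(p_face, p_eeg, labels, step=0.01):
    """Grid-search a fusion weight w in [0, 1] maximizing the accuracy of
    the fused prediction w * p_face + (1 - w) * p_eeg on validation data.

    p_face, p_eeg : arrays of positive-class probabilities from each modality
    labels        : binary ground-truth labels (0 or 1)
    """
    best_w, best_acc = 0.0, -1.0
    for w in np.arange(0.0, 1.0 + step, step):
        fused = w * p_face + (1.0 - w) * p_eeg   # combined probability
        preds = (fused >= 0.5).astype(int)       # threshold into binary labels
        acc = (preds == labels).mean()
        if acc > best_acc:                       # keep the first best weight
            best_w, best_acc = w, acc
    return best_w, best_acc

# Toy validation data (illustrative only)
p_face = np.array([0.9, 0.2, 0.8, 0.3])
p_eeg = np.array([0.4, 0.6, 0.7, 0.1])
labels = np.array([1, 0, 1, 0])
w, acc = enumerate_fusion_weight(p_face, p_eeg, labels)
```

In practice the weight would be chosen on held-out validation data and then applied to the test set; a finer `step` trades search time for resolution.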