Computer Systems & Applications, 2024, 33(4): 13-25
Multimodal Sentiment Analysis Based on Dual Encoder Representation Learning
(School of Software, South China Normal University, Foshan 528225, China)
Received: October 15, 2023    Revised: November 15, 2023
Abstract: Multimodal sentiment analysis aims to assess users' sentiment from the videos they upload to social platforms. Current research primarily focuses on designing complex multimodal fusion networks to learn consistency information across modalities, which improves model performance to some extent, but most of it overlooks the complementary role of the difference information among modalities, biasing sentiment analysis. This study proposes DERL (dual encoder representation learning), a multimodal sentiment analysis model that learns modality-invariant and modality-specific representations through a dual encoder structure. Specifically, a cross-modal interaction encoder based on a hierarchical attention mechanism learns modality-invariant representations across all modalities to capture consistency information, while an intra-modal encoder based on a self-attention mechanism learns the modality-specific representations private to each modality to capture difference information. In addition, two gated network units enhance and filter the encoded features so that the modality-invariant and modality-specific representations can be better combined. Finally, during fusion, the model reduces the L2 distance between the different multimodal representations to capture the latent similar sentiment among them for sentiment prediction. Experimental results on two public datasets, CMU-MOSI and CMU-MOSEI, show that the model outperforms a range of baseline models.
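The abstract names three building blocks: attention-based encoders (cross-modal and intra-modal), sigmoid gating units that enhance and filter features, and an L2-distance term that pulls the multimodal representations together. The paper's own layer sizes and weights are not given here, so the following is only a minimal numpy sketch under assumed choices (scaled dot-product attention, elementwise sigmoid gates, mean pairwise squared L2 loss); all function names and shapes are illustrative, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention: (Lq, d) x (Lk, d) -> (Lq, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def self_attention(x):
    # intra-modal encoder step: queries, keys, values all come
    # from the same modality (projection weights omitted)
    return attention(x, x, x)

def cross_attention(x, y):
    # cross-modal interaction step: queries from modality x
    # attend over another modality y
    return attention(x, y, y)

def gate(h, g_logits):
    # gating unit: a sigmoid mask enhances/filters encoded features
    return h * (1.0 / (1.0 + np.exp(-g_logits)))

def l2_similarity_loss(reps):
    # mean pairwise squared L2 distance between multimodal
    # representations; minimizing it pulls them together
    n, loss = len(reps), 0.0
    for i in range(n):
        for j in range(i + 1, n):
            loss += np.sum((reps[i] - reps[j]) ** 2)
    return loss / (n * (n - 1) / 2)
```

As a usage illustration, one could run `self_attention` on each modality's feature sequence, `cross_attention` between pairs of modalities, gate the outputs, and add `l2_similarity_loss` over the fused representations to the prediction loss.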
Funding: National Natural Science Foundation of China (61070015)
Citation:
XIAN Guang-Ming, YANG Xian-Ping, ZHAO Zhi-Feng. Multimodal Sentiment Analysis Based on Dual Encoder Representation Learning. Computer Systems & Applications, 2024, 33(4): 13-25