Multimodal Emotion Recognition Based on Audiovisual Perception System

Authors: 龙英潮, 丁美荣, 林桂锦, 刘鸿业, 曾碧卿

Fund Projects: National Natural Science Foundation of China (61876067); Special Project in Key Fields of Artificial Intelligence for Regular Universities of Guangdong Province (2019KZDZX1033); Special Fund for the Construction of the Guangdong Provincial Key Laboratory of Cyber-Physical Systems (2020B1212060069)




Abstract:

As a popular area of human-computer interaction, emotion recognition has been applied in fields such as medicine, education, safe driving, and e-commerce. Emotions are expressed mainly through facial expressions, voice, and speech, and features such as facial muscle movements, tone, and intonation differ across emotions, so emotion judgments based on a single modal feature tend to be inaccurate. Since expressed emotions are perceived chiefly through vision and hearing, this study proposes a multimodal expression recognition algorithm based on an audiovisual perception system. Emotion features are extracted separately from the speech and image modalities, and multiple classifiers are designed for single-feature emotion classification experiments, yielding several expression recognition models each based on a single feature. For the multimodal experiments on speech and images, a late fusion strategy is proposed; given the weak dependence among the different models, weighted voting is used for model fusion, which produces a fused expression recognition model built on the single-feature models. Experiments are conducted on the AFEW dataset, and a comparison of the recognition results of the fused model against those of the single-feature models verifies that multimodal emotion recognition based on the audiovisual perception system outperforms single-modal recognition.
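The late-fusion step described in the abstract, combining independently trained speech and image models by weighted voting, can be illustrated with a short sketch. The Python snippet below is a minimal illustration, not the authors' implementation; the model names, vote weights, and the seven-class AFEW label set used here are assumptions introduced for the example.

    # Minimal sketch of decision-level (late) fusion by weighted voting.
    # Not the paper's code: model names, weights, and labels are assumed.
    from collections import defaultdict

    AFEW_LABELS = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]

    def weighted_vote(predictions, weights):
        """Fuse per-model class predictions by weighted voting.

        predictions: model name -> predicted class label
        weights:     model name -> vote weight (e.g. validation accuracy)
        """
        assert all(label in AFEW_LABELS for label in predictions.values())
        scores = defaultdict(float)
        for model, label in predictions.items():
            scores[label] += weights[model]
        # The class accumulating the largest total weight wins.
        return max(scores, key=scores.get)

    # Hypothetical outputs of three single-feature models on one AFEW clip.
    preds = {"speech_mfcc": "sad", "face_cnn": "neutral", "face_landmarks": "sad"}
    w = {"speech_mfcc": 0.35, "face_cnn": 0.45, "face_landmarks": 0.30}
    print(weighted_vote(preds, w))  # -> "sad" (0.65 vs. 0.45 for "neutral")

Weighting each model's vote by, for instance, its single-feature validation accuracy is one natural choice; since the abstract notes only weak dependence among the models, simple weighted voting fuses them without training an additional fusion stage.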

Cite this article:

龙英潮, 丁美荣, 林桂锦, 刘鸿业, 曾碧卿. 基于视听觉感知系统的多模态情感识别. 计算机系统应用, 2021, 30(12): 218-225.

History:
  • Received: 2021-03-05
  • Revised: 2021-04-07
  • Published online: 2021-12-10