
Computer Systems Applications, 2021, 30(8): 194-200


Image Caption Generation Model Based on Convolutional Block Attention Module

YU Hai-Bo^1,2, CHEN Jin-Guang^1,2

(1. School of Computer Science, Xi’an Polytechnic University, Xi’an 710600, China; 2. Henan Key Laboratory for Big Data Processing & Analytics of Electronic Commerce, Luoyang 471934, China)

Received: November 24, 2020; Revised: December 22, 2020

Abstract: An image caption generation model describes, in natural language, the content of an image and the relationships among its attributes. Existing models suffer from low caption quality, insufficient feature extraction from the important regions of an image, and excessive complexity. To address these problems, this study proposes an image caption generation model based on the Convolutional Block Attention Module (CBAM). The model has an encoder-decoder structure: CBAM is added to the feature extraction network Inception-v4, which serves as the encoder and extracts the important feature information of the image; this information is then fed into a Long Short-Term Memory (LSTM) decoder, which generates the caption for the image. The training and validation sets of the MSCOCO2014 dataset are used for training and testing, and multiple evaluation criteria are used to assess the accuracy of the model. The experimental results show that the improved model scores higher on the evaluation criteria than the other models, and that in the Model2 experiment it extracts image features better and generates more accurate captions.

Keywords: image caption generation; Convolutional Block Attention Module (CBAM); Convolutional Neural Network (CNN); Long Short-Term Memory (LSTM)
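
The abstract does not say how CBAM is wired into Inception-v4, so the following is only a minimal sketch of the attention step itself, based on the generally published CBAM design: channel attention followed by spatial attention applied to a feature map. It uses plain Python lists so it runs without any framework. The function names, the reduction ratio `r`, the small 3x3 kernel, and the random untrained weights are all illustrative assumptions, not the authors' implementation.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mlp(vec, w1, w2):
    """Shared two-layer MLP: C -> C/r (ReLU) -> C."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(fmap, w1, w2):
    """One weight in (0, 1) per channel, from avg- and max-pooled descriptors."""
    flat = [[v for row in ch for v in row] for ch in fmap]
    avg = [sum(ch) / len(ch) for ch in flat]
    mx = [max(ch) for ch in flat]
    return [sigmoid(a + m) for a, m in zip(mlp(avg, w1, w2), mlp(mx, w1, w2))]


def spatial_attention(fmap, kernel):
    """One weight in (0, 1) per location, from channel-wise avg and max maps."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    avg = [[sum(fmap[c][i][j] for c in range(C)) / C for j in range(W)] for i in range(H)]
    mx = [[max(fmap[c][i][j] for c in range(C)) for j in range(W)] for i in range(H)]
    k = len(kernel[0])  # kernel shape: [2][k][k], one plane per pooled map
    pad = k // 2
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            s = 0.0
            for m, plane in enumerate((avg, mx)):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < H and 0 <= x < W:  # zero padding at borders
                            s += kernel[m][di][dj] * plane[y][x]
            out[i][j] = sigmoid(s)
    return out


def cbam(fmap, w1, w2, kernel):
    """Apply channel attention, then spatial attention, to a [C][H][W] map."""
    mc = channel_attention(fmap, w1, w2)
    refined = [[[v * mc[c] for v in row] for row in ch] for c, ch in enumerate(fmap)]
    ms = spatial_attention(refined, kernel)
    return [[[v * ms[i][j] for j, v in enumerate(row)] for i, row in enumerate(ch)]
            for ch in refined]


# Toy usage with random, untrained weights (illustrative values only).
rng = random.Random(0)
C, H, W, r, k = 4, 5, 5, 2, 3
fmap = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(C // r)]
w2 = [[rng.uniform(-1, 1) for _ in range(C // r)] for _ in range(C)]
kernel = [[[rng.uniform(-1, 1) for _ in range(k)] for _ in range(k)] for _ in range(2)]
out = cbam(fmap, w1, w2, kernel)
```

The output has the same shape as the input, with each value scaled by a channel weight and a location weight, which is how the module can emphasize the more informative parts of an image before the features reach the decoder. In a real encoder the MLP and convolution weights would be learned jointly with the rest of the network.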


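Funding: Open Project of the Henan Key Laboratory for Big Data Processing & Analytics of Electronic Commerce (2020-KF-7); Scientific Research Program of the Shaanxi Provincial Education Department (21JP049)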

Author Name	Affiliation	E-mail
YU Hai-Bo	School of Computer Science, Xi’an Polytechnic University, Xi’an 710600, China; Henan Key Laboratory for Big Data Processing & Analytics of Electronic Commerce, Luoyang 471934, China
CHEN Jin-Guang	School of Computer Science, Xi’an Polytechnic University, Xi’an 710600, China; Henan Key Laboratory for Big Data Processing & Analytics of Electronic Commerce, Luoyang 471934, China	xacjg@163.com


Cite this article:
YU Hai-Bo, CHEN Jin-Guang. Image Caption Generation Model Based on Convolutional Block Attention Module. COMPUTER SYSTEMS APPLICATIONS, 2021, 30(8): 194-200