Aerial Scene Classification by Fusion of Dual-branch Attention and FasterNet
Author:
Affiliation:

Author biography:

Corresponding author:

CLC number:

Fund projects:

National Natural Science Foundation of China (62173171); Young Scientists Fund of the National Natural Science Foundation of China (41801368)




    Abstract:

    High-resolution aerial images contain many scene categories with high inter-class similarity. Classic deep-learning-based classification methods run inefficiently because of the redundant floating-point operations generated during feature extraction. FasterNet improves operational efficiency through partial convolution but weakens the feature extraction ability, and hence the classification accuracy, of the model. To address these problems, this study proposes a hybrid classification method integrating FasterNet and the attention mechanism. First, a “cross-shaped convolution module” is used to partially extract scene features and thereby improve the operational efficiency of the model. Then, a dual-branch attention mechanism that fuses coordinate attention and channel attention is used to strengthen the model’s feature extraction ability. Finally, a residual connection is made between the “cross-shaped convolution module” and the dual-branch attention module so that the network can learn more task-related features, which improves classification accuracy while reducing operational cost and improving efficiency. Experimental results show that, compared with existing deep-learning-based classification models, the proposed method achieves a short inference time and high accuracy: it has 19M parameters and an average inference time of 7.1 ms per image, and it reaches classification accuracies of 96.12%, 98.64%, 95.42%, and 97.87% on the public datasets NWPU-RESISC45, EuroSAT, VArcGIS (10%), and VArcGIS (20%), respectively, which are 2.06%, 0.77%, 1.34%, and 0.65% higher than those of FasterNet.
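The building blocks named in the abstract (FasterNet-style partial convolution, a fused channel/coordinate attention pair, and a residual connection between them) can be sketched in plain NumPy. This is a minimal illustration of the general techniques, not the authors' implementation: the 0.25 partial ratio, the 3×3 kernel, the averaging fusion of the two attention branches, and all function names are assumptions for demonstration only.

```python
import numpy as np

def partial_conv(x, weight, ratio=0.25):
    """FasterNet-style partial convolution (PConv) idea: convolve only the
    first ratio*C channels and pass the remaining channels through untouched,
    saving redundant floating-point operations.
    x: (C, H, W); weight: (Cp, Cp, 3, 3) with Cp = int(C * ratio)."""
    c, h, w = x.shape
    cp = int(c * ratio)
    out = x.copy()
    # naive 3x3 same-padding convolution over the first cp channels
    pad = np.pad(x[:cp], ((0, 0), (1, 1), (1, 1)))
    for o in range(cp):
        acc = np.zeros((h, w))
        for i in range(cp):
            for dy in range(3):
                for dx in range(3):
                    acc += weight[o, i, dy, dx] * pad[i, dy:dy + h, dx:dx + w]
        out[o] = acc
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x):
    """SE-style channel branch: global average pool -> per-channel gate."""
    gate = sigmoid(x.mean(axis=(1, 2)))          # (C,)
    return x * gate[:, None, None]

def coordinate_attention(x):
    """Coordinate-attention idea: pool along H and W separately to obtain
    direction-aware gates (simplified: no shared 1x1 transform layers)."""
    gh = sigmoid(x.mean(axis=2))                 # (C, H): pooled over width
    gw = sigmoid(x.mean(axis=1))                 # (C, W): pooled over height
    return x * gh[:, :, None] * gw[:, None, :]

def dual_branch_block(x, weight):
    """One hybrid block: PConv features refined by the dual-branch attention,
    joined to the input by a residual connection as the abstract describes."""
    y = partial_conv(x, weight)
    # fusing the two branches by averaging is an illustrative assumption
    y = 0.5 * (channel_attention(y) + coordinate_attention(y))
    return x + y                                 # residual connection
```

In a real network such blocks would be stacked with learned weights; here even an identity kernel suffices to check the shapes and the pass-through behaviour of the untouched channels.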

Cite this article:

Yang Benchen, Qu Yetian, Jin Haibo. Aerial Scene Classification by Fusion of Dual-branch Attention and FasterNet. Computer Systems & Applications, 2024, 33(5): 15–27.

History
  • Received: 2023-11-30
  • Revised: 2023-12-29
  • Accepted:
  • Published online: 2024-04-07
  • Published in print: