Fusion of Adaptive Dual Attention and Axial Attention Transformer for Multi-spectral Pedestrian Detection

doi:10.15888/j.cnki.csa.009974


2025, Vol. 34, No. 10, pp. 86-100


CSTR: 32024.14.csa.009974
Funding: Natural Science Basic Research Program of Shaanxi Province (2020JM499, 2020JQ684)

Fusion of Adaptive Dual Attention and Axial Attention Transformer for Multi-spectral Pedestrian Detection



Abstract:

Existing multi-spectral pedestrian detection algorithms suffer from insufficient interaction between modalities, and their fusion methods fail to model long-range dependencies, which leads to poor detection of small-scale pedestrians in low-light scenes. To address this, this study proposes ADATNet (adaptive dual attention and axial attention Transformer network), a multi-spectral small-scale pedestrian detection algorithm that fuses adaptive dual attention with an axial attention Transformer. A dual-branch CSPDarknet53 network extracts deep features from the visible and infrared images separately, preserving the information unique to each modality. Two feature cross-fusion modules are designed: the adaptive dual attention module (ADAM) and the axial attention Transformer feature enhancement (ATFE) module. ADAM strengthens the model's focus on key features while suppressing irrelevant or redundant information. ATFE associates the positional encodings of the multimodal features to fuse the enhanced features, capturing long-range dependencies while keeping computation efficient. The fused features are then fed into the detection head to produce the final detections. Experimental results show that ADATNet lowers MR^-2 to 7.08% on the KAIST dataset and reaches mAP50 of 82.8% and 97.6% on the FLIR and LLVIP datasets respectively, improvements of 4.7% and 1.9% over the baseline method.
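
The abstract credits ATFE with capturing long-distance dependence while staying computationally efficient. The sketch below illustrates the generic axial attention idea behind that trade-off; it is not the authors' ATFE module. It assumes identity query/key/value projections (no learned weights, no positional encoding), and the names `attend_sequence`, `axial_attention`, and `score_counts` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # stabilise the exponentials
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend_sequence(x):
    """Self-attention over sequences stored as (..., L, C).

    Assumption for this sketch: queries, keys and values are x itself.
    """
    scale = 1.0 / np.sqrt(x.shape[-1])
    weights = softmax((x @ np.swapaxes(x, -1, -2)) * scale, axis=-1)  # (..., L, L)
    return weights @ x                                                # (..., L, C)

def axial_attention(feat):
    """feat has shape (H, W, C): attend along rows, then along columns."""
    row_out = attend_sequence(feat)        # H independent sequences of W tokens
    col_in = np.swapaxes(row_out, 0, 1)    # (W, H, C)
    col_out = attend_sequence(col_in)      # W independent sequences of H tokens
    return np.swapaxes(col_out, 0, 1)      # back to (H, W, C)

def score_counts(h, w):
    """Pairwise attention scores: full 2-D attention vs. the two axial passes."""
    full = (h * w) ** 2
    axial = h * w * w + w * h * h
    return full, axial

# Hypothetical visible and infrared feature maps, fused here by channel
# concatenation (an assumption; the paper's fusion scheme may differ).
rng = np.random.default_rng(0)
rgb = rng.normal(size=(20, 20, 8))
ir = rng.normal(size=(20, 20, 8))
fused = axial_attention(np.concatenate([rgb, ir], axis=-1))
```

After the row pass and the column pass, every position has received information from every other position in the map, yet for a 20×20 feature map the two passes compute 16,000 pairwise scores where full 2-D self-attention would compute 160,000.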


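Cite this article

Luo Jianguo, Wang Yanni, Han Shipeng, Lü Hao, Zhang Yaorong, Wu Xue (罗建国, 王燕妮, 韩世鹏, 吕昊, 张耀荣, 吴雪). Fusion of Adaptive Dual Attention and Axial Attention Transformer for Multi-spectral Pedestrian Detection. Computer Systems & Applications (计算机系统应用), 2025, 34(10): 86-100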


History

Received: 2025-02-27
Last revised: 2025-04-08
Published online: 2025-09-01
