COMPUTER SYSTEMS APPLICATIONS, 2023, 32(3): 291-299
3D Dense Captioning Method Based on Multi-level Context Voting
(College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China)
Received: August 03, 2022    Revised: September 07, 2022
Abstract: Traditional three-dimensional (3D) dense captioning methods suffer from insufficient use of context information, loss of point-cloud feature information, and a hidden state that carries too little information. To address these challenges, a multi-level context voting network is proposed. It uses a self-attention mechanism to capture the context information of the point cloud during the voting process and exploits it at multiple levels, improving the accuracy of object detection. In addition, a hidden state-attention temporal fusion module is designed, which fuses the hidden state at the current time step with the attention result of the previous time step, enriching the information carried by the hidden state and thus improving the expressiveness of the model. Furthermore, a "two-stage" training method is adopted to effectively filter out low-quality object proposals and enhance the generated descriptions. Extensive experiments on the official ScanNet and ScanRefer datasets show that the proposed method achieves more competitive results than baseline methods.
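The abstract names two architectural ideas that a short sketch may make more concrete: applying self-attention to point-cloud seed features during the voting stage, and fusing the decoder hidden state at the current step with the attention result of the previous step. The following is a minimal PyTorch-style sketch under assumed shapes; the class names (SelfAttentionVoting, HiddenAttentionFusion), layer sizes, and the gating scheme are illustrative placeholders, not the authors' implementation.

```python
# Minimal sketch (not the paper's code): the module names, dimensions, and the
# gated fusion are assumptions made for illustration only.
import torch
import torch.nn as nn


class SelfAttentionVoting(nn.Module):
    """Self-attention over seed features to inject point-cloud context into voting."""

    def __init__(self, feat_dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        # Vote head regresses a 3D center offset and a feature residual per seed.
        self.vote_head = nn.Linear(feat_dim, 3 + feat_dim)

    def forward(self, seed_xyz: torch.Tensor, seed_feat: torch.Tensor):
        # seed_xyz: (B, N, 3), seed_feat: (B, N, C)
        ctx, _ = self.attn(seed_feat, seed_feat, seed_feat)  # context-aware features
        ctx = ctx + seed_feat                                 # residual connection
        offset_and_res = self.vote_head(ctx)
        vote_xyz = seed_xyz + offset_and_res[..., :3]         # voted object centers
        vote_feat = ctx + offset_and_res[..., 3:]             # voted features
        return vote_xyz, vote_feat


class HiddenAttentionFusion(nn.Module):
    """Fuse the current hidden state with the previous step's attention context."""

    def __init__(self, hidden_dim: int = 512):
        super().__init__()
        self.gate = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, h_t: torch.Tensor, attn_prev: torch.Tensor):
        # h_t, attn_prev: (B, H); a learned gate decides how much of each to keep.
        g = torch.sigmoid(self.gate(torch.cat([h_t, attn_prev], dim=-1)))
        return g * h_t + (1 - g) * attn_prev


if __name__ == "__main__":
    voting = SelfAttentionVoting()
    fusion = HiddenAttentionFusion()
    xyz, feat = torch.rand(2, 1024, 3), torch.rand(2, 1024, 256)
    vote_xyz, vote_feat = voting(xyz, feat)              # (2, 1024, 3), (2, 1024, 256)
    h = fusion(torch.rand(2, 512), torch.rand(2, 512))   # (2, 512)
```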
Foundation item: Natural Science Foundation of Shandong Province (ZR2020MF136)
Citation:
WU Chun-Lei, HAO Yu-Qin, LI Yang. 3D Dense Captioning Method Based on Multi-level Context Voting. COMPUTER SYSTEMS APPLICATIONS, 2023, 32(3): 291-299