Computer Systems & Applications, 2022, 31(1): 145-151

Multi-agent Distributed Deep Reinforcement Learning Algorithm Based on Value Distribution

CHEN Miao-Yun, WANG Lei, SHENG Jie

(School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, China)

Received: March 11, 2021; Revised: April 7, 2021

Abstract: In recent years, deep reinforcement learning has achieved great success in a range of sequential decision-making problems, making it possible to provide effective decision-making policies for complex, high-dimensional multi-agent systems. However, in complex multi-agent scenarios, existing multi-agent deep reinforcement learning algorithms converge slowly, and their stability cannot be guaranteed. We propose the multi-agent distributed distributional deep deterministic policy gradient (MA-D4PG) algorithm. It brings the idea of value distribution to multi-agent scenarios and retains the complete distribution information of the expected return, so that agents obtain a more stable and effective learning signal. It also introduces multi-step returns to improve the stability of the algorithm, and it uses a distributed data generation framework to decouple experience generation from network updates, which makes full use of computing resources and speeds up convergence. Experiments show that the proposed algorithm achieves better stability and faster convergence in multiple multi-agent scenarios with continuous or discrete control, and that the decision-making ability of the agents is also clearly improved.

Keywords: multi-agent; deep reinforcement learning; value distribution; multi-step return; distributed data generation
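
The multi-step return mentioned in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the function names `n_step_return` and `n_step_target_atoms`, the discount factor `gamma`, and the list-of-atoms representation of a value distribution are all assumptions made for illustration. The second function only shifts and scales the support of a discrete return distribution; a categorical critic would normally also project the result back onto a fixed support, and that step is omitted here.

```python
from typing import List, Sequence


def n_step_return(rewards: Sequence[float], bootstrap_value: float, gamma: float) -> float:
    """Discounted sum of n rewards plus the discounted bootstrap value.

    Computes r_0 + gamma*r_1 + ... + gamma^(n-1)*r_(n-1) + gamma^n * bootstrap_value.
    """
    total = bootstrap_value
    # Fold from the last reward backwards so each step applies one discount.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def n_step_target_atoms(rewards: Sequence[float], atoms: Sequence[float], gamma: float) -> List[float]:
    """Apply the n-step backup to every atom of a discrete return distribution.

    Each atom z becomes (discounted reward sum) + gamma^n * z; the probabilities
    attached to the atoms are unchanged and therefore not handled here.
    """
    n = len(rewards)
    reward_sum = sum((gamma ** k) * r for k, r in enumerate(rewards))
    return [reward_sum + (gamma ** n) * z for z in atoms]


if __name__ == "__main__":
    # Three rewards of 1.0, bootstrap value 10.0, gamma 0.5.
    print(n_step_return([1.0, 1.0, 1.0], 10.0, 0.5))
    # Two rewards of 1.0 applied to a two-atom support.
    print(n_step_target_atoms([1.0, 1.0], [0.0, 2.0], 0.5))
```

With a scalar critic, only the first function is needed; the second shows how the same backup carries over when the critic outputs a distribution instead of a single expected value.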

Funding: Pre-research Fund of University of Science and Technology of China (YZ2101900004)

Author Name	Affiliation	E-mail
CHEN Miao-Yun	School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, China
WANG Lei	School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, China	wangl@ustc.edu.cn
SHENG Jie	School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, China

Cite this article:
CHEN Miao-Yun, WANG Lei, SHENG Jie. Multi-agent Distributed Deep Reinforcement Learning Algorithm Based on Value Distribution. Computer Systems & Applications, 2022, 31(1): 145-151