Computer Systems & Applications, 2023, 32(4): 177-186
Proximal Policy Optimization with Double Clipping Boundaries
(1.School of Electrical Engineering and Intelligentization, Dongguan University of Technology, Dongguan 523808, China;2.School of Computer Science and Technology, Dongguan University of Technology, Dongguan 523808, China)
Received: August 23, 2022    Revised: September 27, 2022
Abstract: Proximal policy optimization (PPO) is a stable deep reinforcement learning algorithm. One of its key components is a clipped surrogate objective that limits the size of policy updates. Experiments show that when the empirically optimal clipping coefficient is used, no upper bound can be established for the Kullback-Leibler (KL) divergence, which contradicts trust-region optimization theory. This study proposes an improved algorithm, PPO with double clipping boundaries (PPO-DC). The algorithm adjusts the KL divergence through two probability-based clipping boundaries, keeping the policy parameters within the trust region and ensuring that the sample data are fully utilized. On several continuous control tasks, PPO-DC outperforms the other compared algorithms.
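For context, the clipped surrogate objective referred to above is the standard PPO objective of Schulman et al. (2017). With probability ratio r_t(θ), advantage estimate Â_t, and clipping coefficient ε, it reads

L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\!\left[\min\!\left(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\!\left(r_t(\theta),\,1-\epsilon,\,1+\epsilon\right)\hat{A}_t\right)\right], \quad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}.

The abstract's argument is that a single fixed ε does not bound the KL divergence between successive policies; the precise form of the two probability-based clipping boundaries used by PPO-DC is given in the full paper rather than in this abstract.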
Fund Project: Key Scientific Research Platforms and Projects of Guangdong Regular Institutions of Higher Education (2020ZDZX3075)
Citation:
ZHANG Jun, WANG Hong-Cheng. Proximal Policy Optimization with Double Clipping Boundaries. Computer Systems & Applications, 2023, 32(4): 177-186.