Reinforcement Learning Representation Algorithm Combining Forward State Prediction and Latent Space Regularization

doi:10.15888/j.cnki.csa.008801

Volume 31, Issue 11, 2022, pp. 148-156

Abstract:

Although deep reinforcement learning can solve many complex control problems, it requires a large number of interactions with the environment, which remains a major challenge. One reason is that an agent struggles to extract effective features from high-dimensional, complex inputs when it relies only on the value function loss. As a result, the agent understands states poorly and cannot assign values to them correctly. To help the agent understand the environment and improve sample efficiency, this study proposes regularized predictive representation learning (RPRL), a method that combines forward state prediction with a latent space constraint. The method helps the agent learn and extract state features from high-dimensional visual observations. A forward state transition loss serves as an auxiliary loss, so that the learned features carry dynamics information about environmental transitions. In addition, the latent state representation is regularized on top of forward prediction, which further helps the agent learn a smooth, regular representation of the high-dimensional input. In the DeepMind Control (DMControl) environments, the proposed method outperforms other model-based methods and model-free methods with representation learning.
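The abstract does not give the exact loss functions, so the following is only a minimal sketch of the general idea: an auxiliary loss that sums a forward prediction error in latent space and a regularization term on the latent states. The linear encoder, the linear forward model, the L2 penalty on latent norms, and all names (`encode`, `predict_next_latent`, `auxiliary_loss`, `reg_weight`) are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def encode(obs, W_enc):
    """Hypothetical linear encoder: flattened observations -> latent states."""
    return obs @ W_enc


def predict_next_latent(z, action, W_dyn):
    """Hypothetical linear forward model: (latent, action) -> next latent."""
    return np.concatenate([z, action], axis=-1) @ W_dyn


def auxiliary_loss(obs, action, next_obs, W_enc, W_dyn, reg_weight=0.1):
    """Forward prediction loss plus an assumed L2 penalty on the latents."""
    z = encode(obs, W_enc)
    z_next_target = encode(next_obs, W_enc)
    z_next_pred = predict_next_latent(z, action, W_dyn)
    # Mean squared error between predicted and encoded next latent states.
    forward_loss = np.mean(np.sum((z_next_pred - z_next_target) ** 2, axis=-1))
    # Assumed regularizer: penalize large latent norms (stand-in for the
    # paper's latent space constraint, whose exact form is not stated here).
    latent_reg = np.mean(np.sum(z ** 2, axis=-1))
    total = forward_loss + reg_weight * latent_reg
    return total, forward_loss, latent_reg


# Random data standing in for a batch of transitions (obs, action, next_obs).
rng = np.random.default_rng(0)
batch, obs_dim, act_dim, latent_dim = 8, 32, 4, 16
obs = rng.normal(size=(batch, obs_dim))
action = rng.normal(size=(batch, act_dim))
next_obs = rng.normal(size=(batch, obs_dim))
W_enc = 0.1 * rng.normal(size=(obs_dim, latent_dim))
W_dyn = 0.1 * rng.normal(size=(latent_dim + act_dim, latent_dim))

total, fwd, reg = auxiliary_loss(obs, action, next_obs, W_enc, W_dyn)
print("total:", total, "forward:", fwd, "regularizer:", reg)
```

In a full agent, a term of this kind would typically be added to the value function loss, so that the encoder receives gradients from both the reinforcement learning objective and the representation objective.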

Get Citation

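Xiang Yu, Qin Jin, Yuan Linlin. Reinforcement Learning Representation Algorithm Combining Forward State Prediction and Latent Space Regularization. Computer Systems & Applications, 2022, 31(11): 148-156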

History

Received: February 12, 2022
Revised: March 23, 2022
Online: July 15, 2022

Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-3