Exploration via Flow-Based Intrinsic Rewards

Playing Super Mario Bros. with a reinforcement learning agent driven by flow-based curiosity exploration.

Arxiv | Medium (English) | Medium (JP) | Github

To overcome these obstacles and provide an agent with sufficient intrinsic rewards more effectively, we introduce a new methodology called the Flow-Based Intrinsic Curiosity Module (FICM).

Abstract

Exploration bonuses derived from the novelty of observations in an environment have become a popular approach to motivate exploration for reinforcement learning (RL) agents in the past few years. Recent methods such as curiosity-driven exploration usually estimate the novelty of new observations by the prediction errors of their system dynamics models. In this paper, we introduce the concept of optical flow estimation from the field of computer vision to the RL domain and utilize the errors from optical flow estimation to evaluate the novelty of new observations. We propose a flow-based intrinsic curiosity module (FICM) capable of learning motion features and understanding observations in a more comprehensive and efficient fashion. We evaluate our method and compare it with a number of baselines on several benchmark environments, including Atari games, Super Mario Bros., and ViZDoom. Our results show that the proposed method is superior to the baselines in certain environments, especially those featuring sophisticated moving patterns or high-dimensional observation spaces. We further analyze the hyper-parameters used in the training phase and discuss our insights into them.
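
The sketch below illustrates the core idea described in the abstract: a small network predicts optical flow between two consecutive observations, and the error of reconstructing the next frame from that flow serves as the intrinsic reward. This is only a minimal illustration of the concept; the names `FlowPredictor`, `warp`, and `intrinsic_reward`, the network architecture, and the reward scale are assumptions for this example and do not reflect the actual FICM implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FlowPredictor(nn.Module):
    """Tiny encoder-decoder that predicts a dense 2-D flow field
    between two stacked grayscale frames of shape (B, 1, H, W) each.
    (Illustrative stand-in for a full flow-estimation network.)"""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 2, 4, stride=2, padding=1),  # (B, 2, H, W) flow
        )

    def forward(self, prev_frame, next_frame):
        return self.net(torch.cat([prev_frame, next_frame], dim=1))


def warp(frame, flow):
    """Warp `frame` (B, 1, H, W) with a predicted flow field (B, 2, H, W)
    using bilinear grid sampling."""
    b, _, h, w = frame.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=frame.device, dtype=frame.dtype),
        torch.arange(w, device=frame.device, dtype=frame.dtype),
        indexing="ij",
    )
    grid = torch.stack([xs, ys], dim=0).unsqueeze(0) + flow  # pixel coords + flow
    # Normalize coordinates to [-1, 1] as required by grid_sample.
    grid_x = 2.0 * grid[:, 0] / (w - 1) - 1.0
    grid_y = 2.0 * grid[:, 1] / (h - 1) - 1.0
    grid = torch.stack([grid_x, grid_y], dim=-1)  # (B, H, W, 2)
    return F.grid_sample(frame, grid, align_corners=True)


def intrinsic_reward(flow_net, prev_frame, next_frame, scale=0.5):
    """Flow reconstruction error as a curiosity bonus: observations whose
    motion the predictor cannot yet explain yield a larger reward.
    (The scale factor is a hypothetical hyper-parameter.)"""
    flow = flow_net(prev_frame, next_frame)
    reconstructed = warp(prev_frame, flow)
    per_sample_error = ((reconstructed - next_frame) ** 2).mean(dim=(1, 2, 3))
    return scale * per_sample_error  # also usable as the predictor's training loss
```

In this reading, the same reconstruction error both trains the flow predictor and rewards the agent, so frequently seen motion patterns become predictable and stop generating bonuses, while novel observations keep drawing the agent's attention.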

Overview

Demo video


© 2019. All rights reserved.