## Virtual-to-Real: Learning to Control in Visual Semantic Segmentation

 Figure 1. (a) Semantic image segmentation $s_t$ serves as the meta-state relating the perception and control policy modules. The perception module generates $s_t$ from an RGB input image $x_t$, which comes from different sources in the training ($x^{sys}_t$) and execution ($x^{real}_t$) phases. The control policy module takes $s_t$ as its input and reacts with $a_t$ according to the policy $\pi$. (b) The visual guidance module enables high-level planning by modifying $s_t$.

Abstract

Collecting training data from the physical world is usually time-consuming and even dangerous for fragile robots, and thus recent advances in robot learning advocate the use of simulators as the training platform. Unfortunately, the reality gap between synthetic and real visual data prohibits direct migration of models trained in virtual worlds to the real world. This project proposes a modular architecture for tackling the virtual-to-real problem. The proposed architecture separates the learning model into a perception module and a control policy module, and uses semantic image segmentation as the meta representation relating these two modules. The perception module translates the perceived RGB image into a semantic image segmentation. The control policy module is implemented as a deep reinforcement learning agent, which performs actions based on the translated image segmentation. Our architecture is evaluated on an obstacle avoidance task and a target following task. Experimental results show that our architecture significantly outperforms all of the baseline methods in both virtual and real environments, and exhibits a faster learning curve than they do. We also present a detailed analysis of a variety of configuration variants, and validate the transferability of our modular architecture.
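To make the modular decomposition concrete, the following is a minimal, hypothetical sketch of the perception-to-policy pipeline in Python. The colour palette, class IDs, action set, and the free-space heuristic are illustrative stand-ins, not the paper's trained networks; the point is only that the policy consumes the segmentation meta-state $s_t$, so the same control code runs whether the input image is synthetic or real:

```python
# Hypothetical sketch of the modular pipeline: perception -> s_t -> policy -> a_t.
# Both modules below are toy stand-ins for the paper's neural networks.

ACTIONS = ["forward", "turn_left", "turn_right"]

# Toy palette (assumed, Cityscapes-like colours): RGB -> class ID.
# 0 = traversable free space, 1 = obstacle.
PALETTE = {(128, 64, 128): 0, (70, 70, 70): 1}

def perceive(rgb_image):
    """Perception module stub: RGB image (rows of pixel tuples) -> segmentation s_t."""
    return [[PALETTE.get(px, 1) for px in row] for row in rgb_image]

def policy(segmentation):
    """Control policy stub: steer toward the side with more free space."""
    width = len(segmentation[0])
    left = sum(row[c] == 0 for row in segmentation for c in range(width // 2))
    right = sum(row[c] == 0 for row in segmentation for c in range(width // 2, width))
    if all(row[width // 2] == 0 for row in segmentation):  # centre column clear
        return "forward"
    return "turn_left" if left > right else "turn_right"

def step(rgb_image):
    """One perception -> policy step; identical for x_sys and x_real inputs."""
    return policy(perceive(rgb_image))
```

Because only `perceive` touches raw pixels, the reality gap is confined to the perception module, which is the transferability argument the architecture rests on.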

Demo Video

Simulated Environment Setup

Tasks.     The proposed model is evaluated against the baselines (domain randomization and depth map) on two tasks in both virtual and real environments: obstacle avoidance and target following.

(1) Obstacle Avoidance:   The agent's goal is to navigate in a diverse set of scenes, and avoid colliding with obstacles.

(2) Target Following:   The agent's objective is to follow a moving target (e.g., a human being) while avoiding collisions.

Scenarios.     We evaluate the models in the following three scenarios. Fig. 2 illustrates a few sample scenes from each.

 Figure 2. Samples of evaluation scenes. From left to right: simple corridor, cluttered hallway, and outdoor. From top to bottom: RGB image, segmentation, and depth.

(1) Simple corridor:   This scenario features indoor straight passages, sharp turns, static obstacles (e.g., chairs, boxes, tables, and walls), and moving obstacles (e.g., humans).

(2) Cluttered hallway:   This scenario features a hallway crammed with static and moving obstacles, for evaluating an agent's capability of avoiding collisions in a narrow space.

(3) Outdoor:   This scenario features an outdoor roadway with sidewalks, buildings, and terrain, as well as moving cars and pedestrians. It is used to evaluate how well the control policy can be transferred from an indoor environment to an outdoor environment. Note that the agent is not allowed to move on the sidewalks in this scenario.

Robotic Platform Setup

 (a) Kobuki (b) Segway RMP 220

Figure 3. Robotic platforms

Fig. 3 shows the robotic platforms used to validate the proposed architecture in the real world. We evaluate the pre-trained perception and control policy modules on two robots, each designed and developed to navigate in different scenarios. Fig. 3(b) shows the Segway RMP 220 robotic platform, used for the Simple Corridor and Outdoor scenarios in the obstacle avoidance task, as well as for the target following task described above. The Segway features higher motor power and a self-balancing system, enabling it to handle a wider variety of road surfaces. Fig. 3(a) shows the Kobuki robotic platform. We mainly use the Kobuki for experiments in the Cluttered Hallway scenario because of its smaller size, which makes it especially suitable for navigation in indoor environments containing multiple obstacles. We use the Robot Operating System (ROS) as the interface between the proposed architecture and the robotic platforms. All models are executed on an NVIDIA Jetson TX2 development board. Motion commands are sent to the robots via Ethernet or USB, and RGB images are captured by the Jetson TX2's onboard camera. No other sensor is used in our system.
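The ROS command interface can be sketched as a mapping from the policy's discrete actions to velocity commands for the robot base. The action names and velocity values below are assumed for illustration (they are not the paper's actual settings), and the `rospy`/`/cmd_vel` usage shown in the docstring is the conventional ROS pattern rather than our exact configuration:

```python
# Hypothetical mapping from discrete policy actions to base velocity commands.
# (linear m/s, angular rad/s) values are illustrative placeholders.
ACTION_TO_TWIST = {
    "forward": (0.3, 0.0),
    "turn_left": (0.0, 0.6),
    "turn_right": (0.0, -0.6),
    "stop": (0.0, 0.0),
}

def make_twist(action):
    """Return the (linear, angular) velocity pair for a policy action.

    On the robot this pair would populate a geometry_msgs/Twist message,
    e.g. with rospy:
        pub = rospy.Publisher("/cmd_vel", Twist, queue_size=1)
        msg = Twist()
        msg.linear.x, msg.angular.z = make_twist(action)
        pub.publish(msg)
    """
    if action not in ACTION_TO_TWIST:
        raise ValueError(f"unknown action: {action}")
    return ACTION_TO_TWIST[action]
```

Keeping this mapping in one place makes the same policy portable across the Kobuki and the Segway: only the velocity constants need retuning per platform.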

Experimental Results and Analysis

Learning curve comparison.     Fig. 4 plots the learning curves of the models. Figs. 4(a) and 4(b) show the curves for the obstacle avoidance tasks and the target following tasks, respectively.

(1) Obstacle Avoidance:   While most of the models achieve nearly optimal performance by the end of the training phase, our methods learn significantly faster than the baseline models. This is because scene semantics simplify the raw image representations into structured forms.

(2) Target Following:   It can be observed that our models clearly outperform the other baseline models. We also notice that our agents learn to chase the target early in the training phase, while the baseline models never learn to follow the target.
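A back-of-the-envelope calculation (our own numbers, not figures from the experiments) illustrates why the segmentation meta-state is a far more compact input than raw RGB: each RGB pixel can take $256^3$ values, whereas a segmentation pixel takes only one of a handful of class IDs.

```python
import math

def bits_per_pixel(num_values):
    """Bits needed to encode one pixel that can take `num_values` states."""
    return math.log2(num_values)

rgb_bits = bits_per_pixel(256 ** 3)  # 24 bits per raw RGB pixel
seg_bits = bits_per_pixel(16)        # 4 bits per pixel for an assumed 16-class segmentation
compression = rgb_bits / seg_bits    # 6x fewer bits per pixel in this toy setting
```

This per-pixel reduction, compounded over every pixel of every frame, is one intuition for the faster learning curves: the policy searches a much smaller, structured input space.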

 Figure 4. Learning curve comparison. (a) Average rewards for obstacle avoidance tasks. (b) Average rewards for target following tasks.

Evaluation results.     We comprehensively analyze the results and perform an ablation study of our methodology.

 Figure 5. Evaluation results. (a) Mean rewards of the agents in the obstacle avoidance tasks. (b) Collision rate in the obstacle avoidance tasks. (c) Comparison of the target following tasks in simulated environments. (d) Comparison of target following tasks in the real world.