Class Project

Introduction


This poster session will showcase the term projects of the deep reinforcement learning course. Each participant will briefly introduce their project for 3 minutes. After all the presentations, a poster session will be held.
  • Date: 2018/06/07 (Thursday)
  • Presentation: 11:00AM - 12:00PM
  • Poster Session: 12:00PM - 1:00PM
  • Location: Building 301 Room 106 

Project Information

Name | Project Title | Presentation Time
권재운 | Sample-Efficient Reinforcement Learning by Exploitation Bonus | 11:00
김동욱 | Continuous Control of a Jumping Robot via Behavioral Cloning | 11:03
문경식 | 3D Hand Pose Estimation Using Deep Reinforcement Learning | 11:06
백성용 | Visual Tracking by Memory Management with Reinforcement Learning | 11:09
김누리 | Reinforcement Learning for End-to-End Multi-Class Object Detection | 11:12
안찬호 | 2D Human Pose Estimation via Deep Reinforcement Learning | 11:15
이규철 | Multi-Task Reinforcement Learning Using Multi-Critics | 11:18
장재희 | Decentralized Training of TORCS Using Deep Reinforcement Learning | 11:21
홍대영 | Dynamically Ordering for Text Comprehension | 11:24
하디모데 | User Adaptive Game AI with Meta Reinforcement Learning | 11:27
김정훈 | Lane Change Decision Making of Autonomous Vehicle using MIRL Approach | 11:30
유휘연 | Visual Analysis of Sparse Deep Q-Network | 11:33
정다흰 | Learning to Reason: Neural Module Networks and Policy Optimization via Genetic Operators for Visual Question Answering | 11:36
최윤호 | Sparse Categorical Distributional Reinforcement Learning | 11:39
최재구 | Sparse Actor-Critic with Tsallis Entropy Regularizer | 11:42
김영진 | Consistent Representation Learning for Model-Based Control | 11:45
배상환 | Sequence Level Training for Context Representation with Reinforcement Learning | 11:48

 

Sample-Efficient Reinforcement Learning by Exploitation Bonus (Jaewoon Kwen)

Deep learning has brought great success to reinforcement learning for high-dimensional complex tasks. To tackle MDP problems with sparse rewards, exploration methods that facilitate explicit exploration have made impressive progress. However, the increased sample complexity, which demands an excessive amount of samples and resources, makes these methods impractical in some settings. We propose a novel method that transitions from exploration to exploitation based on a reward-shaping technique. Narrowing exploration to the neighborhood of high-return trajectories makes exploration more efficient and significantly increases learning speed. The proposed method is experimentally evaluated on the 'SparseHalfCheetah' environment of the MuJoCo simulator.
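
A toy sketch of the kind of reward shaping the abstract describes, purely as an illustration: an "exploitation bonus" is added for states close to previously seen high-return trajectories, so exploration narrows toward those regions. The class name, the Gaussian-decay bonus form, and all hyperparameters are assumptions, not the authors' implementation.

    import numpy as np

    class ExploitationBonus:
        def __init__(self, k=0.1, sigma=1.0, top_n=5):
            self.k, self.sigma, self.top_n = k, sigma, top_n
            self.elite = []  # list of (return, states) for the best trajectories seen

        def update(self, states, episode_return):
            # Keep only the top_n highest-return trajectories (states: list of state vectors).
            self.elite.append((episode_return, np.asarray(states, dtype=float)))
            self.elite.sort(key=lambda x: x[0], reverse=True)
            self.elite = self.elite[: self.top_n]

        def bonus(self, state):
            # Bonus decays with distance to the nearest elite-trajectory state,
            # nudging the agent to exploit near known high-return regions.
            if not self.elite:
                return 0.0
            state = np.asarray(state, dtype=float)
            d = min(np.linalg.norm(traj - state, axis=1).min() for _, traj in self.elite)
            return self.k * np.exp(-d ** 2 / (2 * self.sigma ** 2))

    def shaped_reward(env_reward, state, shaper):
        # Environment reward plus the exploitation bonus.
        return env_reward + shaper.bonus(state)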

Continuous Control of a Jumping Robot via Behavioral Cloning (Kim DongWook)

Reinforcement learning should be applied not only to simulation-based testing but also to real robot applications. This work implements a behavioral cloning method to control a robot with a foot-roll jumping locomotion. We first demonstrated policies for avoiding obstacles and finding a goal. Next, we tested the policy and reinforced it by aggregating the datasets collected during testing. We found that the robot can find a path to the goal within a finite number of episodes in three test cases. This work will be further extended to dynamic obstacles and more diverse situations.
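
The "test, then reinforce by aggregating datasets" loop reads like DAgger-style dataset aggregation; below is a minimal, self-contained sketch of that loop on a hypothetical 1-D reach-the-goal task. The toy task, the nearest-neighbour policy, and the expert function are illustrative assumptions only.

    import numpy as np

    GOAL = 10.0

    def expert_action(state):
        # Expert demonstration: always step toward the goal.
        return 1.0 if state < GOAL else 0.0

    class KNNPolicy:
        # Trivial stand-in for a cloned policy.
        def __init__(self):
            self.X, self.y = np.empty((0, 1)), np.empty((0,))
        def fit(self, X, y):
            self.X = np.asarray(X, dtype=float).reshape(-1, 1)
            self.y = np.asarray(y, dtype=float)
        def act(self, state):
            if len(self.X) == 0:
                return 0.0
            return self.y[np.argmin(np.abs(self.X[:, 0] - state))]

    def rollout(policy, steps=20):
        state, visited = 0.0, []
        for _ in range(steps):
            visited.append(state)
            state += policy.act(state)
        return visited

    # Initial demonstrations, then dataset-aggregation rounds.
    X = list(np.linspace(0, GOAL, 5)); y = [expert_action(s) for s in X]
    policy = KNNPolicy(); policy.fit(X, y)
    for _ in range(3):
        for s in rollout(policy):                    # test the current policy
            X.append(s); y.append(expert_action(s))  # expert relabels visited states
        policy.fit(X, y)                             # retrain on the aggregated dataset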

3D Hand Pose Estimation Using Deep Reinforcement Learning (Gyeongsik Moon)

Accurate 3D hand and human pose estimation is an important requirement for activity recognition, with diverse applications such as human-computer interaction and augmented reality [1]. It has been studied for decades in the computer vision community and has attracted considerable research interest again due to the introduction of low-cost depth cameras. Recently, powerful discriminative approaches based on convolutional neural networks (CNNs) have been outperforming existing methods in various computer vision tasks, including 3D hand and human pose estimation from a single depth map [2]–[6]. Although these approaches have achieved significant advances in 3D hand and human pose estimation, they still suffer from inaccurate estimation because of severe self-occlusions, the highly articulated shapes of target objects, and the low quality of depth images. Previous deep learning-based methods for 3D hand pose estimation from a single depth image are all formulated as simple supervised learning, which provides ground-truth joint locations as supervision to the network. In this project, I propose a deep Q-learning [7] based 3D hand pose estimation system to investigate the undiscovered benefits of reinforcement learning for accurate keypoint localization. The network design is based on V2V-PoseNet [6], which is the state-of-the-art on three publicly available 3D hand pose estimation datasets [8]–[10].

Visual Tracking by Memory Management with Reinforcement Learning (Sungyong Baik)

In visual object tracking, a target's appearance can sometimes vary considerably over the course of a video. Siamese-based trackers do not take this into account and simply use the target's appearance features from the first frame to track the target. The following question then arises: how should the target appearance be updated so that it adapts to appearance changes yet remains robust against occlusion and other noise factors? To model appearance changes, we augment a Siamese network with a convolutional LSTM that predicts the target's features in the next frame from how the appearance features evolve over time. The LSTM, in turn, is augmented with memory that maintains relevant and important past target information, preventing frequent updates from causing drift due to noise such as occlusion.

Reinforcement Learning for end-to-end Multi-Class Object Detection (Nuri Kim)

Active object detection is the problem of localizing objects in an image with sequential decisions. While active detection can detect objects with high recall given only a small number of candidate bounding boxes, it requires a separate network for each category, which demands a large amount of memory when detecting multiple categories in an image, e.g., 20 agents for 20 categories. In this work, we address this problem by proposing an end-to-end multi-class active object search algorithm that needs only a single network for categorization. Since detectors usually use pre-trained classification networks, we efficiently reuse this network for classification after localizing objects. Experimental results show that fine-tuning a classification network while learning to localize multi-class objects can substitute for the current active search method.

2D Human Pose Estimation via Deep Reinforcement Learning (Chanho Ahn)

The human pose estimation problem is applicable to many applications in the field of computer vision, such as action recognition, human re-identification, and 3D reconstruction. In this paper, we propose an algorithm that solves the human pose estimation problem using reinforcement learning. Recent algorithms use a deep network to output heat maps of joint coordinates; the use of heat maps dramatically improves on traditional methods. When a heat map is used, the estimated joint coordinates are defined as the position of the maximum value of the heat map. We formulate finding the coordinates of the maximum value in these grid maps as a Markov decision process (MDP). Unlike other algorithms, the proposed algorithm can be used as a post-processing step for existing algorithms. Experimental results show that our algorithm improves the performance of existing algorithms on a benchmark dataset.
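
A toy illustration of casting "find the heat-map maximum" as an MDP: the state is the current grid coordinate, actions move it by one cell, and the reward is the gain in heat-map value, so a policy climbs toward the peak rather than taking a global argmax. The greedy stand-in policy and the synthetic heat map are assumptions; the actual project would learn the policy with reinforcement learning.

    import numpy as np

    ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # up, down, left, right

    def step(heatmap, pos, action):
        h, w = heatmap.shape
        r = min(max(pos[0] + ACTIONS[action][0], 0), h - 1)
        c = min(max(pos[1] + ACTIONS[action][1], 0), w - 1)
        reward = heatmap[r, c] - heatmap[pos]      # reward = gain in heat-map value
        return (r, c), reward

    def greedy_localize(heatmap, pos, max_steps=100):
        # Stand-in for a learned policy: pick the action with the highest one-step reward.
        for _ in range(max_steps):
            rewards = [step(heatmap, pos, a)[1] for a in ACTIONS]
            if max(rewards) <= 0:                  # local peak reached -> stop
                return pos
            pos = step(heatmap, pos, int(np.argmax(rewards)))[0]
        return pos

    heat = np.exp(-((np.arange(64)[:, None] - 40) ** 2 +
                    (np.arange(64)[None, :] - 20) ** 2) / 50.0)
    print(greedy_localize(heat, (5, 5)))  # climbs toward the peak near (40, 20)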

Multi-Task Reinforcement Learning using Multi-Critics (Kyoochul Lee)

Multi-task reinforcement learning has been considered one of the solutions to the data-efficiency issue that dominates reinforcement learning, especially when it is applied to large-scale or real-world problems. The core idea is to make the agent less dependent on task consistency, which makes it easier to collect training data from various similar tasks to train the agent. In practice, however, the policy becomes biased toward one specific task, and overall performance can even drop in the end, because gradients from each task interfere negatively with the optimization of the other tasks. I propose a new method to attack this problem, the single-actor-multi-critic method, a multi-task reinforcement learning version of the actor-critic method, one of the most popular reinforcement learning techniques. A single actor acts over all the given tasks, each environment has its own critic, and these multiple critics are used simultaneously in the actor's policy update. I show that it outperforms existing methods in several gridworld environments in both performance and learning stability.
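
A minimal sketch of the single-actor-multi-critic update described above, under assumed shapes and a two-task setup: one shared policy network, one critic per task, and an actor loss that sums per-task advantage-weighted log-probabilities so every critic feeds the same policy update. This is illustrative only, not the author's code.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    obs_dim, n_actions, n_tasks = 4, 3, 2
    actor = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(), nn.Linear(32, n_actions))
    critics = [nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(), nn.Linear(32, 1))
               for _ in range(n_tasks)]        # one critic per task/environment

    def actor_loss(batches):
        # batches[t] = (obs, actions, returns) gathered in task t with the shared actor.
        loss = 0.0
        for t, (obs, actions, returns) in enumerate(batches):
            logp = F.log_softmax(actor(obs), dim=-1)
            logp_a = logp.gather(1, actions.unsqueeze(1)).squeeze(1)
            advantage = returns - critics[t](obs).squeeze(1).detach()  # task-specific baseline
            loss = loss - (advantage * logp_a).mean()
        return loss / len(batches)

    # Dummy batches from each task, just to show the call.
    batches = [(torch.randn(8, obs_dim), torch.randint(0, n_actions, (8,)), torch.randn(8))
               for _ in range(n_tasks)]
    print(actor_loss(batches))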

Decentralized Training of TORCS Using Deep Reinforcement Learning (Jaehee Jang)

Deep reinforcement learning is an emerging subfield of reinforcement learning (RL). Since recent studies on deep RL rely on empirical evaluation, long training times are a hindrance to many researchers. Hence, different schemes for training deep RL in a distributed manner have been considered. In this project, I propose the implementation of efficient decentralized deep RL frameworks, taking lessons from distributed machine learning methods. The reason I chose decentralized topologies over centralized ones is that conventional machine learning algorithms are more "error tolerant" than deep RL algorithms: even if a limited number of updates are incorrectly computed or transmitted, ML algorithms have been shown, or even mathematically guaranteed, to converge, which is not common when training deep RL.
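
A toy sketch of the decentralized idea, under illustrative assumptions: each worker keeps its own copy of the parameters, takes local (possibly noisy) gradient steps, and periodically gossip-averages with its ring neighbours instead of synchronizing through a central server.

    import numpy as np

    n_workers, dim = 4, 8
    params = [np.random.randn(dim) for _ in range(n_workers)]

    def local_update(theta, lr=0.1):
        # Placeholder for one local (possibly noisy or stale) gradient step
        # on a pretend loss of ||theta||^2 / 2.
        grad = theta + np.random.randn(dim) * 0.01
        return theta - lr * grad

    def gossip_round(params):
        # Each worker averages with its left and right neighbour on a ring.
        new = []
        for i in range(len(params)):
            left, right = params[i - 1], params[(i + 1) % len(params)]
            new.append((params[i] + left + right) / 3.0)
        return new

    for step in range(100):
        params = [local_update(p) for p in params]
        if step % 5 == 0:
            params = gossip_round(params)
    print([np.linalg.norm(p) for p in params])  # all workers shrink toward 0 together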

Dynamically Ordering for Text Comprehension (Daeyoung Hong)

The project addresses text classification: finding the class or category of a text, given a set of classes.

User Adaptive Game AI with Meta Reinforcement Learning (Timothy Ha)

In this project, an autonomous driving agent is trained to reach a goal position in 2D space. Our method uses the RRT path planning algorithm and a heuristic controller. This controller generates guiding trajectories that help train the policy with a policy gradient method, making policy updates more efficient. In the experiments, we use a simple car simulator. We show that the proposed driving agent can be updated with our data augmentation and that our method converges to the optimal policy more rapidly.

Lane Change Decision Making of Autonomous Vehicle using MIRL Approach (Junghoon Kim)

Recent autonomous driving research has shown remarkable and promising results. However, safe and sociable driving in urban environments still faces many challenges. To realize safe, sociable driving in complex urban scenarios, an automated vehicle must understand the intentions of surrounding vehicles, which are mostly driven by humans. Based on this information, it should be cooperative, or competitive in some sense, in a way that also lets humans understand its behavior. But understanding and predicting others' intentions is usually very hard because traffic participants affect each other simultaneously. Fortunately, there have been significant advances in machine learning and robotics for solving this interaction problem. This project tries to solve it by adopting a game-theoretic approach, one of the mathematical models of conflict and cooperation between intelligent, rational decision-makers. Under the assumption that the driving environment is a zero-sum or general-sum stochastic game, we can interpret expert participants as following an optimal policy in the sense of the Nash equilibrium, and with this formulation we can solve a multi-agent (two-player) IRL (GAIL) problem.

Visual Analysis of Sparse Deep Q-Network (Hwiyeon Yoo)

In this paper, we investigate the interpretable causes of the actions of a deep Q-network (DQN) agent that plays Atari games. We visualize the region of interest for each selected action with Grad-CAM to analyze implicit factors of the environment that cannot be represented by explicit values. We also compare the effect of the number of frames skipped when receiving observations from the Atari environment by visualizing the agent's interest maps.
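
A compact sketch of how Grad-CAM can be applied to a DQN, under assumed network and input shapes: hook the last convolutional block, backpropagate the selected action's Q-value, pool the gradients into per-channel weights, and keep the positive part of the weighted activation sum. The tiny network below is a placeholder, not the paper's model.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyDQN(nn.Module):
        def __init__(self, n_actions=4):
            super().__init__()
            self.conv = nn.Sequential(nn.Conv2d(4, 16, 8, stride=4), nn.ReLU(),
                                      nn.Conv2d(16, 32, 4, stride=2), nn.ReLU())
            self.head = nn.Linear(32 * 9 * 9, n_actions)
        def forward(self, x):
            feats = self.conv(x)                   # activations of the last conv block
            return feats, self.head(feats.flatten(1))

    def grad_cam(model, obs):
        feats, q = model(obs)
        feats.retain_grad()                        # keep gradients of the conv activations
        action = int(q.argmax(dim=1))
        q[0, action].backward()                    # gradient of the chosen action's Q-value
        weights = feats.grad.mean(dim=(2, 3), keepdim=True)  # global-average-pooled gradients
        cam = F.relu((weights * feats).sum(dim=1))           # weighted activation sum
        return action, cam[0].detach()

    model = TinyDQN()
    obs = torch.randn(1, 4, 84, 84)                # stand-in for 4 stacked Atari frames
    action, cam = grad_cam(model, obs)
    print(action, cam.shape)                       # coarse 9x9 interest map for that action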

Learning to Reason: Neural Module Networks and Policy Optimization via Genetic Operators for Visual Question Answering (Dahuin Jung)

The development of deep learning has mainly focused on images and text. Multimodal tasks have recently been attracting attention as one of the interesting research fields in deep learning, and among them the fastest growing is Visual Question Answering (VQA). This work builds on two papers, by Hu (N2NMNs) [1] and Gangwani (GPO) [2]. Hu argues that most current VQA models do not structurally analyze questions through meticulous reasoning, but merely capture statistical biases of the distribution of the given data (question-image pairs) in a gradient-descent manner [1]. Hu's approach is to parse each question into multiple subtasks, sequentially predict the neural network layouts corresponding to each subtask, and then assemble them. In other words, N2NMNs predict a question-specific network layout for each question and simultaneously train the predicted network's parameters with the given question-image pairs.

Sparse Categorical Distributional Reinforcement Learning (Yunho Choi)

In recent studies on reinforcement learning, approaches based on the distributional perspective, which model the randomness of the value distribution, have been shown to outperform traditional approaches in which an agent estimates only the expectation of the value. In this paper, we build on recent work, the categorical deep Q-network (DQN), in which the distribution over a state-action value is modeled explicitly as a categorical distribution. To further improve the sample complexity of the categorical DQN by injecting the notion of sparsity into the algorithm, we propose to use the sparsemax distribution for the categorical value distribution instead of the conventional softmax distribution. To deal with the disjoint-support problem that arises when using the sparsemax distribution, we also propose a new metric for the training loss, which is shown to make the distributional Bellman operator a contraction mapping with respect to this metric. We have conducted a varied set of experiments to validate the approaches for enforcing sparsity in the categorical DQN setting. As a result, the proposed method outperforms the state-of-the-art algorithm, C51.
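
For reference, a small NumPy implementation of the sparsemax projection (Martins & Astudillo, 2016) that the abstract proposes in place of softmax; unlike softmax, it can assign exactly zero probability to some atoms of the value distribution. This snippet is standalone and not tied to any particular DQN implementation.

    import numpy as np

    def sparsemax(z):
        z = np.asarray(z, dtype=float)
        z_sorted = np.sort(z)[::-1]
        k = np.arange(1, len(z) + 1)
        cumsum = np.cumsum(z_sorted)
        support = 1 + k * z_sorted > cumsum        # which atoms stay in the support
        k_z = k[support][-1]
        tau = (cumsum[support][-1] - 1) / k_z      # threshold
        return np.maximum(z - tau, 0.0)

    print(sparsemax([1.5, 1.4, 0.1]))   # [0.55, 0.45, 0.0] -- sparse, sums to 1
    print(sparsemax([3.0, 0.0, 0.0]))   # [1.0, 0.0, 0.0]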

Sparse Actor-Critic with Tsallis Entropy Regularizer (Jaegu Choy)

In this paper, a sparse Markov decision process (MDP) with a Tsallis entropy regularizer is proposed. The proposed method applies a sparse MDP to an actor-critic method, so that a multi-modal but sparse policy can be learned separately from the value function. Based on the optimality condition of a sparse MDP, we derive loss functions for updating each function through the relation between the optimal value function and the optimal policy of a sparse MDP. The proposed loss functions have the advantage that they do not require the cumbersome computation of the supporting set used in existing sparse MDP methods. Thanks to this advantage, we were able to extend the proposed method to continuous action spaces, where finding a supporting set is generally intractable. We use a mixture of Gaussian distributions to represent the policy in continuous action spaces and make the mixture weights sparse to obtain a sparser policy than the original mixture of Gaussians. In experiments, we apply sparse actor-critic to reinforcement learning problems and show that the proposed method successfully optimizes the given problems.
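
A tiny illustration of the Tsallis entropy regularizer (here with entropic index q = 2) compared with the Shannon entropy used in soft (softmax) policies: Tsallis entropy remains well defined and finite for policies that assign exactly zero probability to some actions, which is what permits sparse policies. The example probabilities are arbitrary, not from the paper.

    import numpy as np

    def tsallis_entropy(p, q=2.0):
        # S_q(p) = (1 - sum_i p_i^q) / (q - 1)
        p = np.asarray(p, dtype=float)
        return (1.0 - np.sum(p ** q)) / (q - 1.0)

    def shannon_entropy(p):
        p = np.asarray(p, dtype=float)
        nz = p[p > 0]
        return -np.sum(nz * np.log(nz))

    uniform = [0.25, 0.25, 0.25, 0.25]
    sparse = [0.7, 0.3, 0.0, 0.0]            # zero-probability actions are allowed
    print(tsallis_entropy(uniform), tsallis_entropy(sparse))   # 0.75 vs 0.42
    print(shannon_entropy(uniform), shannon_entropy(sparse))   # 1.386 vs 0.611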

Learning Consistent Representation For Model-based Control (Youngjin Kim)

Model-based deep reinforcement learning (RL) algorithms promise dramatic improvements in sample efficiency, but they learn consistent dynamics only in severely limited environments. In this work, we identify two complementary obstacles, in terms of model architecture and optimization, that hinder neural-network-based explicit transition models from scaling to high-dimensional sequential decision making problems. We propose four principles and corresponding novel methods to learn consistent state representations based on each diagnosis. First, we propose an energy-based architecture designed for improved sample efficiency and generalizability. The proposed architecture requires fewer model parameters and is more sample-efficient than previous deep transition models; furthermore, it makes no assumptions about the state representation, state dimensionality, or parametric form of distributions. Second, we propose an entropy-regularizing replay buffer that re-orders sequentially correlated samples so that mini-batch samples are fed as if they were approximately independent and identically distributed samples from all previously observed trajectories. We design experiments to show that, with our approaches, a neural-network-based transition model learns latent dynamics that are consistent along trajectories of different scales and degrees of correlation.

Sequence Level Training for Context Representation with Reinforcement Learning (Sanghwan Bae)

Distributed representations – dense real-valued vectors that encode the semantics of linguistic units – are ubiquitous in today's NLP research. Similar to word embeddings, context vectors (CoVe) represent words in the context of their sentences. However, the loss function used to train CoVe operates at the word level. With reinforcement learning, it becomes possible to use sequence-level metrics such as the BLEU score as objectives. This work shows that sequence-level training has advantages for context representations. In experiments on several sentence classification tasks, CoVe with sequence-level training uniformly outperforms CoVe without it, and the proposed model also achieves scores comparable to the pretrained original CoVe, which was trained on a dataset 35 times larger.
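
A minimal sketch of a sequence-level (REINFORCE-style) training objective of the kind the abstract describes: sample an output sequence, score it with a sequence-level metric such as BLEU, and weight the sum of the sampled tokens' log-probabilities by that reward. Shapes, the baseline, and the reward value below are placeholders, not the paper's implementation.

    import torch
    import torch.nn.functional as F

    def sequence_level_loss(logits, sampled_ids, reward, baseline=0.0):
        # logits: (T, vocab) per-step scores for the sampled sequence
        # sampled_ids: (T,) tokens that were actually sampled
        # reward: scalar sequence-level score (e.g. BLEU against the reference)
        log_probs = F.log_softmax(logits, dim=-1)
        picked = log_probs[torch.arange(len(sampled_ids)), sampled_ids]
        return -(reward - baseline) * picked.sum()   # REINFORCE: maximize expected reward

    # Toy usage with random scores and a pretend BLEU of 0.4.
    T, vocab = 5, 100
    logits = torch.randn(T, vocab, requires_grad=True)
    sampled = torch.randint(0, vocab, (T,))
    loss = sequence_level_loss(logits, sampled, reward=0.4, baseline=0.1)
    loss.backward()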