Introduction
This poster session showcases the term projects of the deep reinforcement learning course. Each participant will briefly introduce their project for 3 minutes. After all presentations, the poster session will open.
 Date: 2018/06/07 (Thursday)
 Presentation: 11:00 AM - 12:00 PM
 Poster Session: 12:00 PM - 1:00 PM
 Location: Building 301 Room 106
Project Information
Name | Project Title | Presentation Schedule
권재운 | Sample-efficient reinforcement learning by exploitation bonus | 11:00
김동욱 | Continuous Control of a Jumping Robot via Behavioral Cloning | 11:03
문경식 | 3D hand pose estimation using deep reinforcement learning | 11:06
백성용 | Visual Tracking by Memory Management with Reinforcement Learning | 11:09
김누리 | Reinforcement Learning for end-to-end Multi-Class Object Detection | 11:12
안찬호 | 2D Human Pose Estimation via Deep Reinforcement Learning | 11:15
이규철 | Multi-task reinforcement learning using multi-critics | 11:18
장재희 | Decentralized training of TORCS using deep reinforcement learning | 11:21
홍대영 | Dynamically Ordering for Text Comprehension | 11:24
하디모데 | User adaptive game AI with meta reinforcement learning | 11:27
김정훈 | Lane change decision making of autonomous vehicle using MIRL approach | 11:30
유휘연 | Visual Analysis of Sparse Deep Q-Network | 11:33
정다흰 | Learning to Reason: Neural Module Networks and Policy Optimization via Genetic Operators for Visual Question Answering | 11:36
최윤호 | Sparse Categorical Distributional Reinforcement Learning | 11:39
최재구 | Sparse actor-critic with Tsallis entropy regularizer | 11:42
김영진 | Consistent Representation Learning for Model-based Control | 11:45
배상환 | Sequence Level Training for Context Representation with Reinforcement Learning | 11:48

Sample-Efficient Reinforcement Learning by Exploitation Bonus (Jaewoon Kwen)

Deep learning has brought great success to reinforcement learning on high-dimensional, complex tasks. To tackle MDP problems with sparse rewards, exploration methods that encourage explicit exploration have achieved impressive progress. However, the increased sample complexity, demanding an excessive amount of samples and resources, makes them impractical in some settings. We propose a novel method that realizes a transition from exploration to exploitation based on a reward shaping technique. Narrow exploration near trajectories of high return makes exploration more efficient, and the learning speed can be significantly enhanced. The proposed method is experimentally evaluated on the ‘SparseHalfCheetah’ environment of the MuJoCo simulator.
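One way to realize such an exploitation bonus is to shape the reward with a term that decays with distance to the best trajectory found so far. The sketch below is a minimal hypothetical form: the function names and the exponential decay are assumptions for illustration, not the poster's exact formulation.

```python
import math

def exploitation_bonus(state, best_trajectory, beta=0.1, scale=1.0):
    """Bonus that decays with Euclidean distance to the nearest state on
    the highest-return trajectory seen so far (hypothetical shaping form)."""
    d = min(math.dist(state, s) for s in best_trajectory)
    return beta * math.exp(-d / scale)

def shaped_reward(env_reward, state, best_trajectory):
    # Environment reward plus the exploitation bonus.
    return env_reward + exploitation_bonus(state, best_trajectory)
```

States near the best trajectory receive close to the full bonus `beta`, so exploration narrows around high-return behavior as training progresses.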
Continuous Control of a Jumping Robot via Behavioral Cloning (Kim DongWook)

Reinforcement learning should be applied not only to simulation-based testing but also to real robot applications. This work implemented a behavioral cloning method to control a robot with a foot-roll jumping locomotion. We first demonstrated the policies to avoid obstacles and find a goal. Next, we tested the policy and reinforced it by aggregating the datasets collected during testing. We found that the robot can find the path to the goal within a finite number of episodes for three test cases. This work will be further extended to dynamic obstacles and more diverse situations.
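The test-and-aggregate loop described above resembles DAgger-style dataset aggregation. A minimal sketch, where the training, rollout, and expert-labeling callables are all hypothetical placeholders:

```python
def behavioral_cloning_with_aggregation(expert_demos, rollout_fn, label_fn,
                                        train_fn, n_iters=3):
    """DAgger-style loop: train on demonstrations, roll out the learned
    policy, have the expert relabel visited states, and aggregate.
    All callables are hypothetical placeholders."""
    dataset = list(expert_demos)          # (state, expert_action) pairs
    policy = train_fn(dataset)            # initial behavioral cloning
    for _ in range(n_iters):
        states = rollout_fn(policy)       # states visited by current policy
        dataset += [(s, label_fn(s)) for s in states]  # expert relabels
        policy = train_fn(dataset)        # retrain on aggregated data
    return policy
```

Aggregating states visited by the learned policy, rather than only expert states, counteracts the compounding-error problem of plain behavioral cloning.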
3D hand pose estimation using deep reinforcement learning (Gyeongsik Moon)

Accurate 3D hand and human pose estimation is an important requirement for activity recognition, with diverse applications such as human-computer interaction and augmented reality [1]. It has been studied for decades in the computer vision community and has attracted considerable research interest again due to the introduction of low-cost depth cameras. Recently, powerful discriminative approaches based on convolutional neural networks (CNNs) have been outperforming existing methods in various computer vision tasks, including 3D hand and human pose estimation from a single depth map [2]–[6]. Although these approaches achieved significant advancement in 3D hand and human pose estimation, they still suffer from inaccurate estimation because of severe self-occlusions, the highly articulated shapes of target objects, and the low quality of depth images. All previous deep learning-based methods for 3D hand pose estimation from a single depth image are formulated as simple supervised learning, which provides ground-truth joint locations as supervision to the network. In this project, I propose a deep Q-learning [7] based 3D hand pose estimation system to investigate the undiscovered benefit of reinforcement learning for accurate keypoint localization. The network design is based on V2V-PoseNet [6], which is the state-of-the-art on three publicly available 3D hand pose estimation datasets [8]–[10].
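For reference, the update that deep Q-learning approximates with a network can be written in tabular form. The poster uses a deep network based on V2V-PoseNet; this tabular sketch only illustrates the underlying update rule.

```python
def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    td_target = r + gamma * best_next
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (td_target - Q.get((s, a), 0.0))
    return Q
```

A DQN replaces the table `Q` with a neural network and minimizes the squared difference between `Q(s, a)` and the same TD target.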
Visual Tracking by Memory Management with Reinforcement Learning (Sungyong Baik)

In visual object tracking, a target can sometimes vary its appearance considerably over the course of the video frames. Siamese-based trackers do not take this into account and simply use the target's appearance features in the first frame to track the target. The following question then arises: how do we update the target appearance so that it adapts to appearance changes yet remains robust against occlusion and other noise factors? To model appearance changes, we augment a Siamese network with a convolutional LSTM that predicts the target's features in the next frame by getting hints from how the appearance features evolve over time. The LSTM, in turn, is augmented with memory that maintains relevant and important past target information, preventing frequent updates from causing drift due to noise such as occlusion.
Reinforcement Learning for End-to-End Multi-Class Object Detection (Nuri Kim)

Active object detection localizes objects in an image through sequential decisions. While active detection can detect objects with high recall given only a small number of candidate bounding boxes, it requires one network per category, which demands a lot of memory when detecting images with multiple categories, e.g., 20 agents for 20 categories. In this work, we address this problem by proposing an end-to-end multi-class active object search algorithm that needs only a single network for categorization. Since detectors usually use pretrained classification networks, we efficiently reuse the network for classification after localizing objects. Experimental results show that fine-tuning a classification network while learning to localize multi-class objects can substitute for a current active search method.
2D Human Pose Estimation via Deep Reinforcement Learning (Chanho Ahn)

The human pose estimation problem is applicable to many tasks in the field of computer vision, such as action recognition, human re-identification, and 3D reconstruction. In this paper, we propose an algorithm that solves the human pose estimation problem using reinforcement learning. Recent algorithms use a deep network to output heat maps of joint coordinates, and the use of heat maps dramatically improves on the performance of traditional methods. When a heat map is used, the estimated joint coordinates are defined as the position of the maximum value of the heat map. We formulate finding the coordinates of the maximum value in these grid maps as a Markov decision process (MDP). Unlike other algorithms, the proposed algorithm can be used as a post-processing step for existing algorithms. Experimental results show that our algorithm improves the performance of existing algorithms on a benchmark dataset.
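Viewed as an MDP, an agent starts at some coordinate and takes moves across the heat map until it stops at the maximum. The greedy hill-climbing routine below is only a hypothetical stand-in for the learned policy (the actual policy is trained by RL and need not be greedy):

```python
def hill_climb_argmax(heatmap, start, max_steps=100):
    """Locate a heat-map maximum by sequential moves (up/down/left/right
    or stay), greedily stepping to the highest-valued neighbor."""
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]
    r, c = start
    for _ in range(max_steps):
        nr, nc = max(
            ((r + dr, c + dc) for dr, dc in moves
             if 0 <= r + dr < len(heatmap) and 0 <= c + dc < len(heatmap[0])),
            key=lambda rc: heatmap[rc[0]][rc[1]])
        if (nr, nc) == (r, c):
            break                      # no neighbor improves: stop
        r, c = nr, nc
    return r, c
```

On a unimodal heat map this reaches the global maximum; a learned policy can additionally escape spurious local peaks, which is the motivation for the RL formulation.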
Multi-Task Reinforcement Learning Using Multi-Critics (Kyoochul Lee)

Multi-task reinforcement learning has been considered one of the solutions to the data-efficiency issue that dominates the reinforcement learning area, especially when applied to large-scale or real-world problems. The core idea is to make the agent less dependent on task consistency, and to make it easier to collect training data from various similar tasks. However, in practice the policy becomes biased toward one specific task, and even the overall performance drops in the end, because gradients from each task interfere negatively with the optimization of the other tasks. I propose a new method to attack this problem, the single-actor multi-critic method, a multi-task reinforcement learning version of the actor-critic method, one of the most popular reinforcement learning techniques. A single actor acts over all the given tasks, each environment has its own critic, and these multiple critics are fed back and used simultaneously in the actor's policy update. I show that it outperforms existing methods in several grid-world environments in both performance and learning stability.
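A minimal sketch of the single-actor multi-critic update, with per-task critics scoring transitions from their own environments while a shared actor accumulates the gradients. All callables here are hypothetical placeholders for learned networks:

```python
def multi_critic_actor_update(actor_grad_fn, critics, transitions):
    """Accumulate policy-gradient contributions for one shared actor.
    `critics` maps task id -> advantage estimator; each transition is
    scored by the critic of the task it came from (hypothetical sketch)."""
    total_grad = 0.0
    for task_id, state, action in transitions:
        advantage = critics[task_id](state, action)   # per-task critic
        total_grad += advantage * actor_grad_fn(state, action)
    return total_grad
```

Keeping a separate value baseline per task is one plausible way to reduce the cross-task gradient interference the abstract describes.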
Decentralized training of TORCS using deep reinforcement learning (Jaehee Jang)

Deep reinforcement learning is an emerging subfield of reinforcement learning (RL). Since recent studies on deep RL rely on empirical evaluation, the long training time is a hindrance to many researchers. Hence, different schemes to train deep RL in a distributed manner have been considered. In this project, I propose the implementation of efficient decentralized deep RL frameworks, taking lessons from distributed machine learning methods. The reason I chose decentralized topologies over centralized ones is that classical machine learning algorithms are more "error tolerant" than deep RL algorithms: even if a limited number of updates are computed or transmitted incorrectly, ML algorithms have been shown, or even mathematically guaranteed, to converge, which is not common when training deep RL.
Dynamically Ordering for Text Comprehension (Daeyoung Hong)

The problem addressed in this project is text classification: finding the class or category of a text, given the set of classes.
User Adaptive Game AI with Meta Reinforcement Learning (Timothy Ha)

In this project, an autonomous driving agent is trained to reach a goal position in 2D space. Our method uses the RRT path planning algorithm and a heuristic controller. The controller generates guiding trajectories that help train the policy with a policy gradient method, making the policy update more efficient. In the experiments, we use a simple car simulator. We show that the proposed driving agent can be updated with our data augmentation and that our method converges to the optimal policy more rapidly.
Lane Change Decision Making of Autonomous Vehicle using MIRL Approach (Junghoon Kim)

Recent autonomous driving research has shown remarkable and promising results. However, safe and sociable driving in urban environments still faces many challenges. To realize safe, sociable driving in complex urban scenarios, it is required to understand the intentions of surrounding vehicles, which are mostly driven by humans. Based on this information, an automated vehicle should be cooperative, or competitive in some sense, in a way that also enables humans to understand its behavior. But understanding and predicting others' intentions is usually very hard because the traffic participants all affect each other simultaneously. Fortunately, there have been significant advances in machine learning and robotics for solving this interaction problem. This project tries to solve it by adopting a game-theoretic approach, one of the mathematical models representing conflict and cooperation between intelligent rational decision-makers. With the assumption that the driving environment is a zero-sum or general-sum stochastic game, we can interpret expert participants as following an optimal policy in the sense of the Nash equilibrium, and with this formulation we can solve the multi-agent (two-player) IRL (GAIL) problem.
Visual Analysis of Sparse Deep Q-Network (Hwiyeon Yoo)

In this paper, we investigate the interpretable causes of the actions of a deep Q-network (DQN) agent that plays Atari games. We visualize the region of interest of each selected action with Grad-CAM to analyze the implicit factors of the environment that cannot be represented by explicit values. We also compare the effect of the number of frames skipped when receiving observations from the Atari environment by visualizing the agent's interest map.
Learning to Reason: Neural Module Networks and Policy Optimization via Genetic Operators for Visual Question Answering (Dahuin Jung)

The development of deep learning has mainly focused on images and texts. Multimodal tasks have recently been attracting attention as one of the interesting research fields in deep learning; among them, the fastest growing is Visual Question Answering (VQA). This work is based on two papers, by Hu (N2NMNs) [1] and Gangwani (GPO) [2]. Hu argues that most current VQA models do not structurally analyze questions through meticulous reasoning, but merely express statistical biases of the distribution of the given data (question-image pairs) in a gradient-descent manner [1]. Hu's approach was to parse each question into multiple subtasks, sequentially predict the neural network layouts corresponding to each subtask, and then assemble them. In other words, N2NMNs predict a question-specific network layout for each question and simultaneously train the predicted network's parameters on the given question-image pairs.
Sparse Categorical Distributional Reinforcement Learning (Yunho Choi)

In recent studies on reinforcement learning, approaches based on the distributional perspective, which model the randomness of the value distribution, have been shown to outperform traditional approaches in which an agent estimates an expectation of the value. In this paper, we build on recent work, the categorical deep Q-network (DQN), in which a distribution over the state-action value is modeled explicitly with a categorical distribution. To further improve the sample complexity of the categorical DQN by injecting the notion of sparsity into the algorithm, we propose to use the sparsemax distribution for the categorical value distribution instead of the conventional softmax distribution. To deal with the disjoint support problem that arises when using the sparsemax distribution, we also propose a new metric for the training loss, which is shown to make the distributional Bellman operator a contraction mapping with respect to it. We conduct a varied set of experiments to validate the approaches for enforcing sparsity in the categorical DQN setting. As a result, the proposed method outperforms the state-of-the-art algorithm, C51.
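The sparsemax transformation mentioned above, the Euclidean projection of a logit vector onto the probability simplex, has a closed-form solution. A plain-Python sketch:

```python
def sparsemax(z):
    """Project logits z onto the probability simplex (sparsemax).
    Unlike softmax, the result can assign exactly zero probability."""
    zs = sorted(z, reverse=True)
    cumsum, k, kcum = 0.0, 0, 0.0
    for j, zj in enumerate(zs, start=1):
        cumsum += zj
        if 1.0 + j * zj > cumsum:      # support condition
            k, kcum = j, cumsum
    tau = (kcum - 1.0) / k             # threshold shared by the support
    return [max(zj - tau, 0.0) for zj in z]
```

The exact zeros in the output are what create the disjoint-support problem the abstract addresses: two sparsemax distributions may have no overlapping support, so KL-based losses are undefined and a different training metric is needed.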
Sparse Actor-Critic with Tsallis Entropy Regularizer (Jaegu Choy)

In this paper, a sparse Markov decision process (MDP) with a Tsallis entropy regularizer is proposed. The proposed method applies a sparse MDP to an actor-critic method, so that a multi-modal but sparse policy can be learned separately from the value function. Based on the optimality condition of a sparse MDP, we derive loss functions for updating each function through the relation between the optimal value function and the policy function of a sparse MDP. The proposed loss functions have the advantage that they avoid the cumbersome calculation of the supporting set required by existing sparse MDP methods. Thanks to this advantage, we were able to extend the proposed method to continuous action spaces, where finding a supporting set is generally intractable. We use a mixture of Gaussian distributions to represent the policy in continuous action space, and we make the mixture weights sparse to obtain a policy that is sparser than the original mixture of Gaussians. In experiments, we apply the sparse actor-critic to reinforcement learning problems and show that the proposed method successfully optimizes the given problems.
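The Tsallis entropy regularizer has the closed form S_q(p) = (1 - Σ_i p_i^q) / (q - 1); at q = 2 it reduces to the quadratic (Gini-style) entropy associated with sparse MDPs. A small sketch:

```python
def tsallis_entropy(p, q=2.0):
    """Tsallis entropy S_q(p) = (1 - sum_i p_i**q) / (q - 1).
    In the limit q -> 1 it recovers Shannon entropy; q = 2 gives the
    regularizer used for sparse MDPs."""
    return (1.0 - sum(pi ** q for pi in p)) / (q - 1.0)
```

Maximizing this term during policy optimization encourages stochasticity like a Shannon-entropy bonus, but the q = 2 form tolerates policies that put exactly zero mass on some actions.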
Learning Consistent Representation for Model-based Control (Youngjin Kim)

Model-based deep reinforcement learning (RL) algorithms promise dramatic improvements in sample efficiency, but they learn consistent dynamics only in severely limited environments. In this work, we introduce two complementary obstacles, in terms of model architecture and optimization, that hinder neural-network-based explicit transition models from scaling to high-dimensional sequential decision-making problems. We propose four principles and corresponding novel methods to learn consistent state representations based on each diagnosis. First, we propose an energy-based architecture designed for improved sample efficiency and generalizability; it requires fewer model parameters and is more sample-efficient than previous deep transition models. Furthermore, it makes no assumptions about the state representation, the state dimensionality, or the parametric form of distributions. Second, we propose an entropy-regularizing replay buffer that reorders sequentially correlated samples so that minibatch samples are fed as if they were approximately independent and identically distributed samples of all previously observed trajectories. We design experiments to show that, using our approaches, a neural-network-based transition model learns latent dynamics that are consistent along trajectories at different scales and degrees of correlation.
Sequence Level Training for Context Representation with Reinforcement Learning (Sanghwan Bae)

Distributed representations – dense real-valued vectors that encode the semantics of linguistic units – are ubiquitous in today's NLP research. Similar to word embeddings, context vectors (CoVe) represent words in the context of sentences. However, the loss function used to train CoVe operates at the word level. With reinforcement learning, it becomes possible to use sequence-level metrics such as the BLEU score as objectives. This work shows that sequence-level training has advantages for context representations. In experiments on several sentence classification tasks, CoVe with sequence-level training uniformly outperformed CoVe without it, and the proposed model also achieved scores comparable to the pretrained original CoVe, which was trained on a 35-times-bigger dataset.