Deep RL Papers
Spring 2021
Learning from Demonstration (Behavior Cloning)
- S. Ross and D. Bagnell, “Efficient reductions for imitation learning,” in Proc. of the International Conference on Artificial Intelligence and Statistics, May, 2010.
- S. Ross, G. Gordon, and D. Bagnell, "A reduction of imitation learning and structured prediction to no-regret online learning," in Proc. of the International Conference on Artificial Intelligence and Statistics, Apr, 2011.
- Sungjoon Choi, Eunwoo Kim, Kyungjae Lee, Songhwai Oh, "Real-Time Nonparametric Reactive Navigation of Mobile Robots in Dynamic Environments," Robotics and Autonomous Systems, vol. 91, pp. 11–24, May 2017.
- Sungjoon Choi, Eunwoo Kim, and Songhwai Oh, "Real-Time Navigation in Crowded Dynamic Environments Using Gaussian Process Motion Control," in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), Jun. 2014. [Video]
- Sungjoon Choi, Kyungjae Lee, and Songhwai Oh, "Robust Learning from Demonstrations with Mixed Qualities Using Leveraged Gaussian Processes," IEEE Transactions on Robotics, vol. 35, no. 3, pp. 564-576, Jun. 2019.
- Sungjoon Choi, Eunwoo Kim, Kyungjae Lee, and Songhwai Oh, "Leveraged Non-Stationary Gaussian Process Regression for Autonomous Robot Navigation," in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), May 2015.
- Sungjoon Choi, Kyungjae Lee, and Songhwai Oh, "Robust Learning from Demonstration Using Leveraged Gaussian Processes and Sparse-Constrained Optimization," in Proc. of the IEEE International Conference on Robotics and Automation (ICRA), May 2016.
- Alessandro Giusti, Jérôme Guzzi, Dan C. Cireşan, Fang-Lin He, Juan P. Rodríguez, Flavio Fontana, Matthias Faessler, et al., "A machine learning approach to visual perception of forest trails for mobile robots," IEEE Robotics and Automation Letters, vol. 1, no. 2, pp. 661-667, 2016.
- Sungjoon Choi, Kyungjae Lee, and Songhwai Oh, "Scalable Robust Learning from Demonstration with Leveraged Deep Neural Networks," in Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Sep. 2017.
- Antonio Loquercio, Ana Isabel Maqueda, Carlos R. Del Blanco, and Davide Scaramuzza, "DroNet: Learning to Fly by Driving," IEEE Robotics and Automation Letters, 2018. (http://rpg.ifi.uzh.ch/dronet.html)
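The two Ross papers that open this list reduce imitation to iterated supervised learning (Forward Training / DAgger). As a toy illustration only, here is a numpy sketch of a DAgger-style loop with a linear policy; the expert, the dynamics, and every constant are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def expert(s):
    # Hypothetical expert controller: drive the state toward zero.
    return -0.8 * s

def rollout(w, horizon=30):
    # Roll out the learner's linear policy a = w * s, recording visited states.
    s, states = float(rng.normal()), []
    for _ in range(horizon):
        states.append(s)
        s = s + w * s + 0.05 * float(rng.normal())  # toy linear dynamics
    return np.array(states)

# DAgger: the expert relabels the states the *learner* actually visits,
# and the policy is refit on the aggregated dataset each iteration.
X, Y = np.empty(0), np.empty(0)
w = 0.0                                         # initial, untrained policy
for it in range(5):
    states = rollout(w)
    X = np.concatenate([X, states])
    Y = np.concatenate([Y, expert(states)])     # expert action labels
    w = float(X @ Y) / float(X @ X)             # least-squares fit of a = w * s
    print(f"iter {it}: w = {w:+.3f}")           # approaches the expert gain -0.8
```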
Deep RL
- Sergey Levine, Vladlen Koltun, "Guided Policy Search," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2013.
- V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, et al., "Human-level control through deep reinforcement learning," Nature, 2015.
- T. Schaul, J. Quan, I. Antonoglou, and D. Silver, "Prioritized Experience Replay," arXiv preprint arXiv:1511.05952, 2015.
- J. Schulman, S. Levine, P. Abbeel, M. I. Jordan, and P. Moritz, "Trust Region Policy Optimization," in Proc. of the International Conference on Machine Learning (ICML), Jul, 2015. [arXiv]
- H. van Hasselt, A. Guez, and D. Silver, "Deep reinforcement learning with double q-learning," in Proc. of the AAAI Conference on Artificial Intelligence (AAAI), Feb, 2016.
- Z. Wang, N. de Freitas, and M. Lanctot, "Dueling Network Architectures for Deep Reinforcement Learning," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2016.
- Junhyuk Oh, Valliappa Chockalingam, Satinder P. Singh, Honglak Lee, "Control of Memory, Active Perception, and Action in Minecraft," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2016.
- Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu, "Asynchronous Methods for Deep Reinforcement Learning," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2016.
- Shixiang Gu, Timothy P. Lillicrap, Ilya Sutskever, Sergey Levine, "Continuous Deep Q-Learning with Model-based Acceleration," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2016.
- Ian Osband, Charles Blundell, Alexander Pritzel, Benjamin Van Roy, "Deep Exploration via Bootstrapped DQN," Advances in Neural Information Processing Systems (NIPS), Dec, 2016.
- Aviv Tamar, Sergey Levine, Pieter Abbeel, Yi Wu, Garrett Thomas, "Value Iteration Networks," Advances in Neural Information Processing Systems (NIPS), Dec, 2016.
- Marc G. Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, Rémi Munos, "Unifying Count-Based Exploration and Intrinsic Motivation," Advances in Neural Information Processing Systems (NIPS), Dec, 2016.
- Rein Houthooft, Xi Chen, Yan Duan, John Schulman, Filip De Turck, and Pieter Abbeel, "VIME: Variational Information Maximizing Exploration," Advances in Neural Information Processing Systems (NIPS), Dec, 2016.
- Tejas D. Kulkarni*, Karthik R. Narasimhan*, Ardavan Saeedi, and Joshua B. Tenenbaum, "Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation," Advances in Neural Information Processing Systems (NIPS), Dec, 2016.
- John Schulman, Philipp Moritz, Sergey Levine, Michael I. Jordan, and Pieter Abbeel, "High-Dimensional Continuous Control Using Generalized Advantage Estimation," in Proc. of the International Conference on Learning Representations (ICLR), May, 2016.
- Sergey Levine, Chelsea Finn, Trevor Darrell, Pieter Abbeel, "End-to-End Training of Deep Visuomotor Policies," Journal of Machine Learning Research (JMLR), 2016.
- David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Vedavyas Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy P. Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis, "Mastering the game of Go with deep neural networks and tree search," Nature, 2016.
- David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, and Demis Hassabis, "Mastering the game of Go without human knowledge," Nature, 2017.
- T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, "Continuous control with deep reinforcement learning," in Proc. of the International Conference on Learning Representations (ICLR), May, 2016. [arXiv]
- J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," arXiv preprint arXiv:1707.06347, 2017.
- J. Schulman, P. Abbeel, and X. Chen, "Equivalence between policy gradients and soft Q-Learning," arXiv preprint arXiv:1704.06440, 2017.
- P. H. Richemond and B. Maginnis, "A short variational proof of equivalence between policy gradients and soft Q learning," arXiv preprint arXiv:1712.08650, 2017.
- J. Achiam, D. Held, A. Tamar, P. Abbeel, "Constrained Policy Optimization," in Proc. of the International Conference on Machine Learning (ICML), Aug, 2017. [arXiv]
- Y. Chebotar, K. Hausman, M. Zhang, G. Sukhatme, S. Schaal, and S. Levine, "Combining model-based and model-free updates for trajectory-centric reinforcement learning," in Proc. of the International Conference on Machine Learning (ICML), Aug, 2017.
- Tuomas Haarnoja, Haoran Tang, Pieter Abbeel, and Sergey Levine, "Reinforcement Learning with Deep Energy-Based Policies," in Proc. of the International Conference on Machine Learning (ICML), Aug, 2017.
- Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, and Koray Kavukcuoglu, "FeUdal Networks for Hierarchical Reinforcement Learning", in Proc. of the International Conference on Machine Learning (ICML), Aug, 2017.
- Z. Wang, V. Bapst, N. Heess, V. Mnih, R. Munos, K. Kavukcuoglu, and N. de Freitas, "Sample efficient actor-critic with experience replay," in Proc. of the International Conference on Learning Representations (ICLR), Apr, 2017.
- B. O’Donoghue, R. Munos, K. Kavukcuoglu, and V. Mnih, "PGQ: Combining policy gradient and Q-learning," in Proc. of the International Conference on Learning Representations (ICLR), Apr, 2017.
- S. Gu, T. Lillicrap, Z. Ghahramani, R. E. Turner, and S. Levine, "Q-Prop: Sample efficient policy gradient with an off-policy critic," in Proc. of the International Conference on Learning Representations (ICLR), Apr, 2017.
- Y. Wu, E. Mansimov, R. B. Grosse, S. Liao, and J. Ba, "Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation," Advances in Neural Information Processing Systems (NIPS), Dec, 2017.
- O. Nachum, M. Norouzi, K. Xu, and D. Schuurmans, "Bridging the gap between value and policy based reinforcement learning," Advances in Neural Information Processing Systems (NIPS), Dec, 2017.
- Junhyuk Oh, Satinder Singh, and Honglak Lee, "Value Prediction Network," Advances in Neural Information Processing Systems (NIPS), Dec, 2017.
- Paul F. Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, and Dario Amodei, "Deep Reinforcement Learning from Human Preferences," Advances in Neural Information Processing Systems (NIPS), Dec, 2017.
- Marcin Andrychowicz, Dwight Crow, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba, "Hindsight experience replay," Advances in Neural Information Processing Systems (NIPS), Dec, 2017.
- Justin Fu, John Co-Reyes, and Sergey Levine, "Ex2: Exploration with exemplar models for deep reinforcement learning," Advances in Neural Information Processing Systems (NIPS), Dec, 2017.
- Yevgen Chebotar, Mrinal Kalakrishnan, Ali Yahya, Adrian Li, Stefan Schaal, Sergey Levine, "Path integral guided policy search," in Proc. of the International Conference on Robotics and Automation (ICRA), May, 2017.
- O. Nachum, M. Norouzi, K. Xu, and D. Schuurmans, "Trust-PCL: An Off-Policy Trust Region Method for Continuous Control," in Proc. of the International Conference on Learning Representations (ICLR), Apr, 2018.
- T. Hester, M. Vecerik, O. Pietquin, M. Lanctot, T. Schaul, B. Piot, A. Sendonaris, G. Dulac-Arnold, I. Osband, J. Agapiou, and J. Z. Leibo, "Deep Q-learning from Demonstrations," in Proc. of the AAAI Conference on Artificial Intelligence (AAAI), Feb, 2018.
- Kyungjae Lee, Sungjoon Choi, and Songhwai Oh, "Sparse Markov Decision Processes with Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning," IEEE Robotics and Automation Letters, vol. 3, no. 3, pp. 1466-1473, Jul. 2018. [Supplementary Material | Video | arXiv preprint]
- Scott Fujimoto, Herke van Hoof, and David Meger, "Addressing Function Approximation Error in Actor-Critic Methods," in Proc. of the International Conference on Machine Learning (ICML), Aug, 2018.
- Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine, "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor", in Proc. of the International Conference on Machine Learning (ICML), Aug, 2018.
- Ofir Nachum, Shixiang Gu, Honglak Lee, and Sergey Levine, "Data-Efficient Hierarchical Reinforcement Learning," Advances in Neural Information Processing Systems (NIPS), Dec. 2018.
- Kyungjae Lee, Sungyub Kim, Sungbin Lim, Sungjoon Choi, Mineui Hong, Jaein Kim, Yong-Lae Park, and Songhwai Oh, "Generalized Tsallis Entropy Reinforcement Learning and Its Application to Soft Mobile Robots," in Proc. of Robotics: Science and Systems (RSS), Jul. 2020. [Supplementary Material | arXiv preprint]
- Yuri Burda, Harrison Edwards, Amos Storkey, and Oleg Klimov, "Exploration by Random Network Distillation", in Proc. of the International Conference on Learning Representations (ICLR), May, 2019.
- Marvin Zhang, Sharad Vikram, Laura Smith, Pieter Abbeel, Matthew J. Johnson, and Sergey Levine, "SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2019.
- David Rolnick, Arun Ahuja, Jonathan Schwarz, Timothy Lillicrap, and Gregory Wayne, "Experience replay for continual learning," Advances in Neural Information Processing Systems (NeurIPS), Dec, 2019.
- Scott Fujimoto, David Meger, and Doina Precup, "Off-Policy Deep Reinforcement Learning without Exploration," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2019.
- Yilun Du and Karthik Narasimhan, "Task-agnostic dynamics priors for deep reinforcement learning," arXiv preprint arXiv:1905.04819, 2019.
- Jan Matas, Stephen James, and Andrew J. Davison, "Sim-to-real reinforcement learning for deformable object manipulation," arXiv preprint arXiv:1806.07851, 2018.
- Alexey Dosovitskiy and Vladlen Koltun, "Learning to act by predicting the future," arXiv preprint arXiv:1611.01779, 2016.
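Many of the value-based entries above (DQN, prioritized replay, double and dueling Q-learning) revolve around one mechanic: TD targets computed over a replay buffer against a slowly-updated target network. Below, a tabular numpy sketch of the double-Q target from van Hasselt et al.; the chain environment and all constants are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 2, 0.9

Q = np.zeros((n_states, n_actions))     # online estimate
Q_target = Q.copy()                     # periodically-synced target copy
replay = []                             # (s, a, r, s', done) transitions

def env_step(s, a):
    # Hypothetical chain MDP: action 1 moves right; reward at the last state.
    s2 = min(s + a, n_states - 1)
    return s2, float(s2 == n_states - 1), s2 == n_states - 1

s = 0
for t in range(2000):
    a = int(rng.integers(n_actions)) if rng.random() < 0.1 else int(Q[s].argmax())
    s2, r, done = env_step(s, a)
    replay.append((s, a, r, s2, done))
    s = 0 if done else s2

    # Minibatch update with the double-Q target:
    # the online table selects the next action, the target table evaluates it.
    for i in rng.integers(len(replay), size=32):
        bs, ba, br, bs2, bdone = replay[i]
        a_star = int(Q[bs2].argmax())
        target = br + (0.0 if bdone else gamma * Q_target[bs2, a_star])
        Q[bs, ba] += 0.1 * (target - Q[bs, ba])
    if t % 100 == 0:
        Q_target = Q.copy()             # periodic hard sync of the target

print(np.round(Q, 2))
```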
Distributional RL
- Marc G. Bellemare, Will Dabney, and Rémi Munos, "A Distributional Perspective on Reinforcement Learning," in Proc. of the International Conference on Machine Learning (ICML), Aug, 2017.
- Will Dabney, Mark Rowland, Marc G. Bellemare, and Rémi Munos, "Distributional Reinforcement Learning with Quantile Regression", in Proc. of the AAAI Conference on Artificial Intelligence (AAAI), Feb, 2018.
- Gabriel Barth-Maron, Matthew W. Hoffman, David Budden, Will Dabney, Dan Horgan, Dhruva TB, Alistair Muldal, Nicolas Heess, and Timothy Lillicrap, "Distributed Distributional Deterministic Policy Gradients," in Proc. of the International Conference on Learning Representations (ICLR), Feb, 2018.
- Will Dabney*, Georg Ostrovski*, David Silver, and Rémi Munos, "Implicit Quantile Networks for Distributional Reinforcement Learning", in Proc. of the International Conference on Machine Learning (ICML), Aug, 2018.
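Distributional methods learn the return distribution rather than its mean; QR-DQN does so by regressing N quantile atoms with the asymmetric pinball loss. A minimal numpy sketch of that loss on synthetic samples standing in for Bellman targets:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8                                           # number of quantile atoms
taus = (np.arange(N) + 0.5) / N                 # quantile midpoints
theta = np.zeros(N)                             # atom values for one (s, a) pair

# Synthetic samples standing in for distributional Bellman targets.
targets = rng.normal(loc=1.0, scale=0.5, size=5000)

for z in targets:
    u = z - theta                               # per-atom TD errors
    # Pinball-loss descent: push up with weight tau on underestimates,
    # down with weight (1 - tau) on overestimates.
    theta += 0.01 * np.where(u > 0, taus, taus - 1.0)

print(np.round(theta, 2))                       # learned quantiles
print(np.round(np.quantile(targets, taus), 2))  # empirical quantiles, for comparison
```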
Meta RL
- Yan Duan, John Schulman, Xi Chen, Peter L. Bartlett, Ilya Sutskever, and Pieter Abbeel, "RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning," arXiv preprint arXiv:1611.02779, 2016.
- Jane X. Wang, Zeb Kurth-Nelson, Dhruva Tirumala, Hubert Soyer, Joel Z. Leibo, Remi Munos, Charles Blundell, Dharshan Kumaran, and Matt Botvinick, "Learning to Reinforcement Learn," arXiv preprint arXiv:1611.05763, 2016.
- Chelsea Finn, Pieter Abbeel, and Sergey Levine, "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks", in Proc. of the International Conference on Machine Learning (ICML), Aug, 2017.
- Nikhil Mishra, Mostafa Rohaninejad, Xi Chen, and Pieter Abbeel, "A Simple Neural Attentive Meta-Learner", in Proc. of the International Conference on Learning Representations (ICLR), Feb, 2018.
- Abhishek Gupta, Russell Mendonca, YuXuan Liu, Pieter Abbeel, and Sergey Levine, "Meta-Reinforcement Learning of Structured Exploration Strategies," Advances in Neural Information Processing Systems (NIPS), Dec. 2018.
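MAML seeks an initialization from which one inner gradient step adapts well to each new task, so the meta-gradient must differentiate through that inner step. A toy numpy sketch on 1-D linear regression tasks, where the needed second derivative has a closed form; the task distribution and both step sizes are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 0.1, 0.05                 # inner / outer step sizes (arbitrary)

def sample_task():
    # A task is a random slope; its data are noisy points on that line.
    slope = rng.uniform(0.0, 2.0)
    def data(n=20):
        x = rng.uniform(-1, 1, size=n)
        return x, slope * x + 0.01 * rng.normal(size=n)
    return data

def grad(w, x, y):
    # d/dw of the mean squared error of the model y_hat = w * x.
    return 2 * np.mean(x * (w * x - y))

w = 0.0                                 # meta-initialization being learned
for _ in range(2000):
    data = sample_task()
    x_tr, y_tr = data()                 # support set: used by the inner step
    x_te, y_te = data()                 # query set: scores the adapted model
    w_adapt = w - alpha * grad(w, x_tr, y_tr)
    # Chain rule through the inner update: d(w_adapt)/dw = 1 - alpha * L''(w).
    dw_adapt_dw = 1.0 - alpha * 2 * np.mean(x_tr ** 2)
    w -= beta * grad(w_adapt, x_te, y_te) * dw_adapt_dw

print(f"meta-initialization w = {w:+.3f}")
```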
Inverse RL
- A. Y. Ng and S. Russell, “Algorithms for inverse reinforcement learning,” in Proc. of the 17th International Conference on Machine Learning (ICML), Jun, 2000.
- B. Taskar, V. Chatalbashev, D. Koller, and C. Guestrin, “Learning structured prediction models: A large margin approach,” in Proc. of the International Conference on Machine Learning (ICML), Aug, 2005.
- N. D. Ratliff, J. A. Bagnell, and M. Zinkevich, “Maximum margin planning,” in Proc. of the International Conference on Machine Learning (ICML), Jun, 2006.
- N. D. Ratliff, D. M. Bradley, J. A. Bagnell, and J. E. Chestnutt, “Boosting structured prediction for imitation learning,” Advances in Neural Information Processing Systems (NIPS), Dec, 2007.
- D. Ramachandran and E. Amir, “Bayesian inverse reinforcement learning,” in Proc. of the International Joint Conference on Artificial Intelligence (IJCAI), Jan, 2007.
- B. D. Ziebart, A. L. Maas, J. A. Bagnell, and A. K. Dey, “Maximum entropy inverse reinforcement learning,” in Proc. of the 23rd AAAI Conference on Artificial Intelligence (AAAI), July, 2008.
- N. D. Ratliff, D. Silver, and J. A. Bagnell, “Learning to search: Functional gradient techniques for imitation learning,” Autonomous Robots, Jul, 2009.
- K. Dvijotham and E. Todorov, “Inverse optimal control with linearly-solvable MDPs,” in Proc. of the International Conference on Machine Learning (ICML), Jun, 2010.
- B. D. Ziebart, “Modeling purposeful adaptive behavior with the principle of maximum causal entropy,” Ph.D. dissertation, Carnegie Mellon University, 2010.
- S. Levine, Z. Popovic, and V. Koltun, “Nonlinear inverse reinforcement learning with Gaussian processes,” Advances in Neural Information Processing Systems (NIPS), Dec, 2011.
- A. Boularias, J. Kober, and J. Peters, “Relative entropy inverse reinforcement learning,” in Proc. of the International Conference on Artificial Intelligence and Statistics, Apr, 2011.
- Sergey Levine, Vladlen Koltun, "Continuous Inverse Optimal Control with Locally Optimal Examples," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2012.
- J. Choi and K. Kim, “Bayesian nonparametric feature construction for inverse reinforcement learning,” in Proc. of the International Joint Conference on Artificial Intelligence (IJCAI), Aug, 2013.
- J. Zheng, S. Liu, and L. M. Ni, “Robust Bayesian inverse reinforcement learning with sparse behavior noise,” in Proc. of the AAAI Conference on Artificial Intelligence (AAAI), Jul, 2014.
- J. Choi and K.-E. Kim, “Hierarchical Bayesian inverse reinforcement learning,” IEEE Transactions on Cybernetics, 2015.
- Chelsea Finn, Sergey Levine, and Pieter Abbeel, "Guided cost learning: Deep inverse optimal control via policy optimization," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2016.
- K. Shiarlis, J. Messias, M. van Someren, and S. Whiteson, “Inverse reinforcement learning from failure,” in Proc. of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May, 2016.
- Kyungjae Lee, Sungjoon Choi, and Songhwai Oh, "Inverse Reinforcement Learning with Leveraged Gaussian Processes," in Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct. 2016.
- J. Ho and S. Ermon, "Generative adversarial imitation learning," Advances in Neural Information Processing Systems (NIPS), Dec, 2016. [arXiv]
- Nir Baram, Oron Anschel, Itai Caspi, and Shie Mannor, "End-to-End Differentiable Adversarial Imitation Learning," in Proc. of the International Conference on Machine Learning (ICML), Aug, 2017.
- Wen Sun, Arun Venkatraman, Geoffrey J. Gordon, Byron Boots, and J. Andrew Bagnell, "Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction," in Proc. of the International Conference on Machine Learning (ICML), Aug, 2017.
- Michael Laskey, Jonathan Lee, Roy Fox, Anca Dragan, and Ken Goldberg, "DART: Noise Injection for Robust Imitation Learning," in Proc. of the Conference on Robot Learning (CoRL), Nov. 2017.
- K. Hausman, Y. Chebotar, S. Schaal, G. S. Sukhatme, J. J. Lim, "Multi-Modal Imitation Learning from Unstructured Demonstrations using Generative Adversarial Nets," Advances in Neural Information Processing Systems (NIPS), Dec, 2017.
- Ziyu Wang, Josh S. Merel, Scott E. Reed, Nando de Freitas, Gregory Wayne, Nicolas Heess, "Robust Imitation of Diverse Behaviors," Advances in Neural Information Processing Systems (NIPS), Dec, 2017.
- Justin Fu, Katie Luo, and Sergey Levine, "Learning Robust Rewards with Adversarial Inverse Reinforcement Learning," in Proc. of the International Conference on Learning Representations (ICLR), Apr, 2018.
- Wonseok Jeon, Seokin Seo, and Kee-Eung Kim, "A Bayesian Approach to Generative Adversarial Imitation Learning," Advances in Neural Information Processing Systems (NIPS), Dec. 2018.
- Kyungjae Lee, Sungjoon Choi, and Songhwai Oh, "Maximum Causal Tsallis Entropy Imitation Learning," Advances in Neural Information Processing Systems (NIPS), Dec. 2018. [Supplementary Material | arXiv preprint]
- Ilya Kostrikov, Kumar Krishna Agrawal, Debidatta Dwibedi, Sergey Levine, and Jonathan Tompson, "Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning," in Proc. of the International Conference on Learning Representations (ICLR), May, 2019.
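Several of the later entries (GAIL and its descendants) replace the hand-designed IRL inner loop with a discriminator trained to tell expert pairs from policy pairs, whose output becomes a reward signal. Here is a numpy sketch of one logistic discriminator update with a common surrogate reward; the (state, action) features below are entirely synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic (state, action) features; expert and current policy differ in mean.
expert = rng.normal(loc=1.0, size=(200, 2))
policy = rng.normal(loc=0.0, size=(200, 2))

w, b = np.zeros(2), 0.0          # logistic discriminator D(x) = sigmoid(x.w + b)

def D(x):
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

# Ascend J = E_expert[log D] + E_policy[log(1 - D)].
for _ in range(500):
    ge = expert.T @ (1.0 - D(expert)) / len(expert)   # gradient from expert term
    gp = -policy.T @ D(policy) / len(policy)          # gradient from policy term
    w += 0.1 * (ge + gp)
    b += 0.1 * (np.mean(1.0 - D(expert)) - np.mean(D(policy)))

# One common GAIL-style surrogate reward handed to the policy optimizer:
reward = -np.log(1.0 - D(policy) + 1e-8)
print(np.round(reward[:5], 3))
```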
MDP
- David Blackwell, "Discounted dynamic programming," The Annals of Mathematical Statistics, 1965.
- Emanuel Todorov, "Linearly-solvable Markov decision problems," Advances in Neural Information Processing Systems (NIPS), Dec, 2007.
- Kyungjae Lee, Sungjoon Choi, and Songhwai Oh, "Sparse Markov Decision Processes with Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning," IEEE Robotics and Automation Letters, vol. 3, no. 3, pp. 1466-1473, Jul. 2018. [Supplementary Material | Video | arXiv preprint]
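These entries concern the structure of the decision problem itself. For reference, a minimal numpy value-iteration sketch for a discounted tabular MDP (Blackwell's setting); the random transition model below is just a placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)
n_s, n_a, gamma = 4, 2, 0.9

# A random tabular MDP: P[a, s, s'] transition probabilities, R[s, a] rewards.
P = rng.dirichlet(np.ones(n_s), size=(n_a, n_s))
R = rng.uniform(size=(n_s, n_a))

V = np.zeros(n_s)
for _ in range(500):                         # Bellman optimality backups
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:    # gamma-contraction guarantees convergence
        break
    V = V_new

print("V* =", np.round(V, 3), " greedy policy =", Q.argmax(axis=1))
```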
RL
- Christopher J. C. H. Watkins and Peter Dayan, "Q-learning," Machine Learning, 1992.
- R. J. Williams, "Simple statistical gradient-following algorithms for connectionist reinforcement learning," Machine Learning, 1992.
- R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," Advances in Neural Information Processing Systems (NIPS), Nov. 2000.
- Sham M. Kakade, "A natural policy gradient," Advances in Neural Information Processing Systems (NIPS), Dec. 2002.
- Sham M. Kakade and John Langford, "Approximately optimal approximate reinforcement learning," in Proc. of the International Conference on Machine Learning (ICML), 2002.
- Carl Edward Rasmussen and Malte Kuss, "Gaussian Processes in Reinforcement Learning," Advances in Neural Information Processing Systems (NIPS), Dec. 2003.
- Jens Kober and Jan R. Peters, "Policy search for motor primitives in robotics," Advances in Neural Information Processing Systems (NIPS), Dec, 2008.
- Hado van Hasselt, "Double Q-learning," Advances in Neural Information Processing Systems (NIPS), Dec, 2010.
- Jan Peters, Katharina Mülling, and Yasemin Altun, "Relative Entropy Policy Search," in Proc. of the AAAI Conference on Artificial Intelligence (AAAI), Jul, 2010.
- N. Heess, D. Silver, and Y. W. Teh, "Actor-critic reinforcement learning with energy-based policies," in Proc. of the European Workshop on Reinforcement Learning, Jun, 2012.
- David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin A. Riedmiller, "Deterministic Policy Gradient Algorithms," in Proc. of the International Conference on Machine Learning (ICML), Jun. 2014.
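Most of the policy-gradient entries above build on Williams's REINFORCE estimator, grad J(theta) = E[grad log pi(a|s) * G]. A toy numpy sketch on a two-armed bandit with a softmax policy; the arm means and step size are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)                    # softmax preferences over two arms
means = np.array([0.2, 0.8])           # hypothetical arm reward means

for _ in range(3000):
    pi = np.exp(theta) / np.exp(theta).sum()
    a = int(rng.choice(2, p=pi))
    G = means[a] + 0.1 * rng.normal()  # sampled return (one-step episode)
    grad_log_pi = -pi                  # d log pi(a) / d theta for a softmax
    grad_log_pi[a] += 1.0
    theta += 0.05 * G * grad_log_pi    # REINFORCE update

print(np.round(np.exp(theta) / np.exp(theta).sum(), 3))  # mass shifts to the better arm
```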
Other Topics
- S. Ross, “Interactive learning for sequential decisions and predictions,” Ph.D. dissertation, Carnegie Mellon University, 2013.
- Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, Yoshua Bengio, "Generative Adversarial Nets," Advances in Neural Information Processing Systems (NIPS), Dec, 2014.
- Y. Gal, "Uncertainty in deep learning," University of Cambridge, 2016.
- Y. Gal, Z. Ghahramani, "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning," in Proc. of the International Conference on Machine Learning (ICML), Jun, 2016.
- P. McClure and N. Kriegeskorte, "Representing inferential uncertainty in deep neural networks through sampling," in Proc. of the International Conference on Learning Representations (ICLR), Apr, 2017.
- A. Kendall and Y. Gal, "What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?," Advances in Neural Information Processing Systems (NIPS), Dec, 2017.
- B. Lakshminarayanan, A. Pritzel, and C. Blundell, "Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles," Advances in Neural Information Processing Systems (NIPS), Dec, 2017.
- M. Teye, H. Azizpour, and K. Smith, "Bayesian Uncertainty Estimation for Batch Normalized Deep Networks," in Proc. of the International Conference on Learning Representations (ICLR), Apr, 2018.
- Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, and Sergey Levine, "Diversity is all you need: Learning skills without a reward function", in Proc. of the International Conference on Learning Representations (ICLR), May, 2019.
- Joshua Achiam, Harrison Edwards, Dario Amodei, and Pieter Abbeel, "Variational option discovery algorithms," arXiv preprint, 2018.
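The Gal & Ghahramani entries above interpret dropout left on at test time as approximate Bayesian inference: the spread of repeated stochastic forward passes estimates model uncertainty. A numpy sketch with a fixed, untrained toy network; all weights and rates are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(1, 32)), rng.normal(size=(32, 1))  # fixed toy MLP weights
p_drop = 0.5

def forward(x):
    h = np.maximum(x @ W1, 0.0)             # ReLU hidden layer
    h = h * (rng.random(h.shape) > p_drop)  # dropout stays ON at prediction time
    return (h / (1.0 - p_drop)) @ W2        # inverted-dropout rescaling

x = np.array([[0.3]])
samples = np.array([forward(x)[0, 0] for _ in range(200)])
print(f"predictive mean {samples.mean():+.3f}, predictive std {samples.std():.3f}")
```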