A Roundup of 180 ICML 2020 Reinforcement Learning Papers

Others have already compiled lists of RL-related papers at ICML 2020, but yesterday I happened to notice that a paper on the critic overestimation problem was missing from them, and later I came across a TRPO-related paper that also seemed to have been left out. So I spent some effort compiling the papers myself and ended up with roughly 180, which is quite a lot.

I grouped the papers into categories based on my own understanding. (Some assignments may be debatable, and for nearly half of the papers I am still not sure which category fits best. Some RL papers may also be missing, while others may have been mistakenly classified as RL; I hope someone more knowledgeable will refine and complete this list in the future.)

Many of these papers are already on arXiv; I will not post the links here for now, so readers can search for them on their own.

The main categories are as follows:

I. Model (mainly model-based RL)

II. Bandits (bandit problems, mostly involving the exploration-exploitation tradeoff; see the minimal sketch at the end of the Bandits section below)

III. Exploration (excluding the bandit papers above)

IV. Batch RL

V. Imitation Learning

VI. Multi-Agent RL

VII. Multi-Objective RL

VIII. Policy Gradient (mainly papers whose titles explicitly mention policy gradients)

IX. Off-Policy Evaluation (there are far too many off-policy papers, and many cannot be identified from the title alone; with limited time I singled out one representative branch, OPE, and left the rest under Other for now; see the importance-sampling sketch at the end of the OPE section below)

X. Application

XI. Other (covering various other topics such as safe RL, HRL, multi-task RL, and some foundational theory; I am not very familiar with some of these areas, some papers did not fit clearly into any category, and some topics had only a few papers, so I have not subdivided this category for now)

The detailed list follows:

I. Model

  1. Active World Model Learning in Agent-rich Environments with Progress Curiosity
    Kuno Kim (Stanford University) · Megumi Sano (Stanford University) · Julian De Freitas (Harvard University) · Nick Haber (Stanford University) · Daniel Yamins (Stanford University)
  2. Goal-Aware Prediction: Learning to Model What Matters
    Suraj Nair (Stanford University) · Silvio Savarese (Stanford University) · Chelsea Finn (Stanford)
  3. A Game Theoretic Perspective on Model-Based Reinforcement Learning
    Aravind Rajeswaran (University of Washington) · Igor Mordatch (OpenAI) · Vikash Kumar (Google)
  4. Context-aware Dynamics Model for Generalization in Model-Based Reinforcement Learning
    Kimin Lee (UC Berkeley) · Younggyo Seo (KAIST) · Seunghyun Lee (KAIST) · Honglak Lee (Google / U. Michigan) · Jinwoo Shin (KAIST)
  5. A Markov Decision Process Model for Socio-Economic Systems Impacted by Climate Change
    Salman Sadiq Shuvo (University of South Florida) · Yasin Yilmaz (University of South Florida) · Alan Bush (University of South Florida) · Mark Hafen (University of South Florida)
  6. Inverse Active Sensing: Modeling and Understanding Timely Decision-Making
    Daniel Jarrett (University of Cambridge) · Mihaela van der Schaar (University of Cambridge)
  7. Learning and Simulation in Generative Structured World Models
    Zhixuan Lin (Zhejiang University) · Yi-Fu Wu (Rutgers University) · Skand Peri (Rutgers University, New Jersey) · Bofeng Fu (Tianjin University) · Jindong Jiang (Rutgers University) · Sungjin Ahn (Rutgers University)
  8. Provably Efficient Model-based Policy Adaptation
    Yuda Song (University of California, San Diego) · Aditi Mavalankar (University of California San Diego) · Wen Sun (Microsoft Research) · Sicun Gao (University of California, San Diego)
  9. Selective Dyna-style Planning Under Limited Model Capacity
    Muhammad Zaheer (University of Alberta) · Samuel Sokota (University of Alberta) · Erin Talvitie () · Martha White (University of Alberta)
  10. Model-Based Reinforcement Learning with Value-Targeted Regression
    Zeyu Jia (Peking University) · Lin Yang (UCLA) · Csaba Szepesvari (DeepMind/University of Alberta) · Mengdi Wang (Princeton University) · Alex Ayoub (University of Alberta)
  11. Bidirectional Model-based Policy Optimization
    Hang Lai (Shanghai Jiao Tong University) · Jian Shen (Shanghai Jiao Tong University) · Weinan Zhang (Shanghai Jiao Tong University) · Yong Yu (Shanghai Jiao Tong University)
  12. Hallucinative Topological Memory for Zero-Shot Visual Planning
    Thanard Kurutach (UC Berkeley) · Kara Liu (UC Berkeley) · Aviv Tamar (Technion) · Pieter Abbeel (UC Berkeley) · Christine Tung (UC Berkeley)

II. Bandits

  1. My Fair Bandit: Distributed Learning of Max-Min Fairness with Multi-player Bandits
    Ilai Bistritz (Stanford University) · Tavor Z Baharav (Stanford University) · Amir Leshem (Bar-Ilan University) · Nicholas Bambos ()
  2. Improved Sleeping Bandits with Stochastic Action Sets and Adversarial Rewards
    Aadirupa Saha (Indian Institute of Science (IISc), Bangalore) · Pierre Gaillard () · Michal Valko (DeepMind)
  3. Multinomial Logit Bandit with Low Switching Cost
    Kefan Dong (Tsinghua University) · Yingkai Li (Northwestern University) · Qin Zhang (Indiana University Bloomington) · Yuan Zhou (UIUC)
  4. Optimistic Policy Optimization with Bandit Feedback
    Lior Shani (Technion) · Yonathan Efroni (Technion) · Aviv Rosenberg (Tel Aviv University) · Shie Mannor (Technion)
  5. Learning Adversarial Markov Decision Processes with Bandit Feedback and Unknown Transition
    Chi Jin (Princeton University) · Tiancheng Jin (University of Southern California) · Haipeng Luo (University of Southern California) · Suvrit Sra (MIT) · Tiancheng Yu (MIT)
  6. Thompson Sampling Algorithms for Mean-Variance Bandits
    Qiuyu Zhu (National University of Singapore) · Vincent Tan (National University of Singapore)
  7. Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles
    Dylan Foster (MIT) · Alexander Rakhlin (MIT)
  8. Exploration Through Bias: Revisiting Biased Maximum Likelihood Estimation in Stochastic Multi-Armed Bandits
    Xi Liu (Texas A&M University) · Ping-Chun Hsieh (National Chiao Tung University) · Yu Heng Hung (National Chiao Tung University) · Anirban Bhattacharya (Texas A&M University) · P. Kumar (Texas A&M University)
  9. Non-Stationary Bandits with Intermediate Observations
    Claire Vernade (DeepMind) · Andras Gyorgy (DeepMind) · Timothy Mann (DeepMind)
  10. Linear bandits with Stochastic Delayed Feedback
    Claire Vernade (DeepMind) · Alexandra Carpentier (Otto-von-Guericke University) · Tor Lattimore (DeepMind) · Giovanni Zappella (Amazon) · Beyza Ermis (Amazon Research) · Michael Brueckner (Amazon Research Berlin)
  11. Improved Optimistic Algorithms for Logistic Bandits
    Louis Faury (Criteo) · Marc Abeille (Criteo) · Clement Calauzenes (Criteo) · Olivier Fercoq (Telecom Paris)
  12. Neural Contextual Bandits with UCB-based Exploration
    Dongruo Zhou (UCLA) · Lihong Li (Google Research) · Quanquan Gu (University of California, Los Angeles)
  13. Distributionally Robust Policy Evaluation and Learning in Offline Contextual Bandits
    Nian Si (Stanford University) · Fan Zhang (Stanford University) · Zhengyuan Zhou (Stanford University) · Jose Blanchet (Stanford University)
  14. Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound
    Lin Yang (UCLA) · Mengdi Wang (Princeton University)
  15. Combinatorial Pure Exploration for Dueling Bandit
    Wei Chen (Microsoft) · Yihan Du (IIIS, Tsinghua University) · Longbo Huang (Tsinghua University) · Haoyu Zhao (Tsinghua University)
  16. The Intrinsic Robustness of Stochastic Bandits to Strategic Manipulation
    Zhe Feng (Harvard University) · David Parkes (Harvard University) · Haifeng Xu (University of Virginia)
  17. Structured Linear Contextual Bandits: A Sharp and Geometric Smoothed Analysis
    Vidyashankar Sivakumar (Walmart Labs) · Steven Wu (University of Minnesota) · Arindam Banerjee (University of Minnesota)
  18. Gamification of Pure Exploration for Linear Bandits
    Rémy Degenne (Inria) · Pierre Menard (Inria) · Xuedong Shang (Inria) · Michal Valko (DeepMind)
  19. Structure Adaptive Algorithms for Stochastic Bandits
    Rémy Degenne (Inria) · Han Shao (Toyota Technological Institute at Chicago) · Wouter Koolen (Centrum Wiskunde & Informatica, Amsterdam)
  20. Meta-learning with Stochastic Linear Bandits
    Leonardo Cella (University of Milan) · Alessandro Lazaric (Facebook AI Research) · Massimiliano Pontil (Istituto Italiano di Tecnologia and University College London)
  21. Learning with Good Feature Representations in Bandits and in RL with a Generative Model
    Gellért Weisz (DeepMind) · Tor Lattimore (DeepMind) · Csaba Szepesvari (DeepMind/University of Alberta)
  22. On conditional versus marginal bias in multi-armed bandits
    Jaehyeok Shin (Carnegie Mellon University) · Aaditya Ramdas (Carnegie Mellon University) · Alessandro Rinaldo (Carnegie Mellon University)
  23. Bandits for BMO Functions
    Tianyu Wang (Duke University) · Cynthia Rudin (Duke)
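
To make the exploration-exploitation tradeoff mentioned in the category note concrete, here is a minimal UCB1 sketch for a Bernoulli bandit. This is my own toy illustration, not code from any paper above; the function name and arm means are made up.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on a Bernoulli bandit and return the total reward.

    arm_means: true success probability of each arm (hidden from the learner).
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms    # number of pulls per arm
    values = [0.0] * n_arms  # empirical mean reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1      # play each arm once to initialize
        else:
            # Optimism in the face of uncertainty: empirical mean plus a
            # confidence bonus that shrinks as an arm is pulled more often.
            arm = max(range(n_arms),
                      key=lambda a: values[a] + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update
        total += reward
    return total

# With arm means 0.2, 0.5, and 0.7, total reward should approach 0.7 * horizon.
print(ucb1([0.2, 0.5, 0.7], horizon=10_000))
```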

III. Exploration

  1. Naive Exploration is Optimal for Online LQR
    Max Simchowitz (UC Berkeley) · Dylan Foster (MIT)
  2. Implicit Generative Modeling for Efficient Exploration
    Neale Ratzlaff (Oregon State University) · Qinxun Bai (Horizon Robotics) · Fuxin Li (Oregon State University) · Wei Xu (Horizon Robotics)
  3. No-Regret Exploration in Goal-Oriented Reinforcement Learning
    Jean Tarbouriech (Facebook AI Research Paris & Inria Lille) · Evrard Garcelon (Facebook AI Research ) · Michal Valko (DeepMind) · Matteo Pirotta (Facebook AI Research) · Alessandro Lazaric (Facebook AI Research)
  4. Provably Efficient Exploration in Policy Optimization
    Qi Cai (Northwestern University) · Zhuoran Yang (Princeton University) · Chi Jin (Princeton University) · Zhaoran Wang (Northwestern U)
  5. Reward-Free Exploration for Reinforcement Learning
    Chi Jin (Princeton University) · Akshay Krishnamurthy (Microsoft Research) · Max Simchowitz (UC Berkeley) · Tiancheng Yu (MIT)
  6. Efficient Optimistic Exploration in Linear-Quadratic Regulators via Lagrangian Relaxation
    Marc Abeille (Criteo) · Alessandro Lazaric (Facebook AI Research)
  7. Tightening Exploration in Upper Confidence Reinforcement Learning
    Hippolyte Bourel (ENS Rennes) · Odalric-Ambrym Maillard (Inria Lille - Nord Europe) · Mohammad Sadegh Talebi (University of Copenhagen)
  8. Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning
    Silviu Pitis (University of Toronto) · Harris Chan (University of Toronto, Vector Institute) · Stephen Zhao (University of Toronto) · Bradly Stadie (Vector Institute) · Jimmy Ba (University of Toronto)
  9. Flexible and Efficient Long-Range Planning Through Curious Exploration
    Aidan Curtis (Rice University) · Minjian Xin (Shanghai Jiao Tong University) · Dilip Arumugam (Stanford University) · Kevin Feigelis (Stanford University) · Daniel Yamins (Stanford University)
  10. Planning to Explore via Latent Disagreement
    Ramanan Sekar (University of Pennsylvania) · Oleh Rybkin (University of Pennsylvania / UC Berkeley (Visiting)) · Kostas Daniilidis (University of Pennsylvania) · Pieter Abbeel (UC Berkeley & Covariant) · Danijar Hafner (Google Brain & University of Toronto) · Deepak Pathak (UC Berkeley)
  11. On Thompson Sampling with Langevin Algorithms
    Eric Mazumdar (University of California Berkeley) · Aldo Pacchiano (UC Berkeley) · Yian Ma (Google) · Michael Jordan (UC Berkeley) · Peter Bartlett (UC Berkeley)
  12. Thompson Sampling via Local Uncertainty
    Zhendong Wang (University of Texas, Austin) · Mingyuan Zhou (University of Texas at Austin)
  13. What Can Learned Intrinsic Rewards Capture?
    Zeyu Zheng (University of Michigan) · Junhyuk Oh (DeepMind) · Matteo Hessel (DeepMind) · Zhongwen Xu (DeepMind) · Manuel Kroiss (DeepMind) · Hado van Hasselt (DeepMind) · David Silver (Google DeepMind) · Satinder Singh (DeepMind)
  14. Learning Near Optimal Policies with Low Inherent Bellman Error
    Andrea Zanette (Stanford University) · Alessandro Lazaric (Facebook AI Research) · Mykel Kochenderfer (Stanford University) · Emma Brunskill (Stanford University)

IV. Batch RL

  1. GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values
    Shangtong Zhang (University of Oxford) · Bo Liu (Auburn University) · Shimon Whiteson (University of Oxford)
  2. An Optimistic Perspective on Offline Deep Reinforcement Learning
    Rishabh Agarwal (Google Research, Brain Team) · Dale Schuurmans (Google / University of Alberta) · Mohammad Norouzi (Google Brain)
  3. Control Frequency Adaptation via Action Persistence in Batch Reinforcement Learning
    Alberto Maria Metelli (Politecnico di Milano) · Flavio Mazzolini (Politecnico di Milano) · Lorenzo Bisi (Politecnico di Milano) · Luca Sabbioni (Politecnico di Milano) · Marcello Restelli (Politecnico di Milano)
  4. Reducing Sampling Error in Batch Temporal Difference Learning
    Brahma Pavse (University of Texas at Austin) · Ishan Durugkar (University of Texas at Austin) · Josiah Hanna (University of Edinburgh) · Peter Stone (University of Texas at Austin)
  5. Batch Reinforcement Learning with Hyperparameter Gradients
    Byung-Jun Lee (KAIST) · Jongmin Lee (KAIST) · Peter Vrancx (PROWLER.io) · Dongho Kim (Prowler.io) · Kee-Eung Kim (KAIST)

V. Imitation Learning

  1. Variational Imitation Learning with Diverse-quality Demonstrations
    Voot Tangkaratt (RIKEN AIP) · Bo Han (HKBU / RIKEN) · Mohammad Emtiyaz Khan (RIKEN) · Masashi Sugiyama (RIKEN / The University of Tokyo)
  2. Domain Adaptive Imitation Learning
    Kuno Kim (Stanford University) · Yihong Gu (Tsinghua University) · Jiaming Song (Stanford) · Shengjia Zhao (Stanford University) · Stefano Ermon (Stanford University)
  3. An Imitation Learning Approach for Cache Replacement
    Evan Liu (Google) · Milad Hashemi (Google) · Kevin Swersky (Google Brain) · Parthasarathy Ranganathan (Google, USA) · Junwhan Ahn (Google)
  4. Generative Adversarial Imitation Learning with Neural Network Parameterization: Global Optimality and Convergence Rate
    Yufeng Zhang (Northwestern University) · Qi Cai (Northwestern University) · Zhuoran Yang (Princeton University) · Zhaoran Wang (Northwestern U)
  5. Intrinsic Reward Driven Imitation Learning via Generative Model
    Xingrui Yu (University of Technology Sydney) · Yueming LYU (University of Technology Sydney) · Ivor Tsang (University of Technology Sydney)
  6. Provable Representation Learning for Imitation Learning via Bi-level Optimization
    Sanjeev Arora (Princeton University and Institute for Advanced Study) · Simon Du (Institute for Advanced Study) · Sham Kakade (University of Washington) · Yuping Luo (Princeton University) · Nikunj Umesh Saunshi (Princeton University)
  7. Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences
    Daniel Brown (University of Texas at Austin) · Scott Niekum (University of Texas at Austin) · Russell Coleman (University of Texas at Austin) · Ravi Srinivasan (University of Texas at Austin)

VI. Multi-Agent RL

  1. Kernel Methods for Cooperative Multi-Agent Learning with Delays
    Abhimanyu Dubey (Massachusetts Institute of Technology) · Alex 'Sandy' Pentland (MIT)
  2. Robust Multi-Agent Decision-Making with Heavy-Tailed Payoffs
    Abhimanyu Dubey (Massachusetts Institute of Technology) · Alex 'Sandy' Pentland (MIT)
  3. Multi-Agent Determinantal Q-Learning
    Yaodong Yang (Huawei Technology R&D UK) · Ying Wen (UCL) · Jun Wang (UCL) · Liheng Chen (Shanghai Jiao Tong University) · Kun Shao (Huawei Noah's Ark Lab) · David Mguni (Noah's Ark Laboratory, Huawei) · Weinan Zhang (Shanghai Jiao Tong University)
  4. Learning Efficient Multi-agent Communication: An Information Bottleneck Approach
    Rundong Wang (Nanyang Technological University) · Xu He (Nanyang Technological University) · Runsheng Yu (Nanyang Technological University) · Wei Qiu (Nanyang Technological University) · Bo An (Nanyang Technological University) · Zinovi Rabinovich (Nanyang Technological University)
  5. Optimizing Multiagent Cooperation via Policy Evolution and Shared Experiences
    Somdeb Majumdar (Intel AI Lab) · Shauharda Khadka (Intel AI) · Santiago Miret (Intel AI Products Group) · Stephen Mcaleer (UC Irvine) · Kagan Tumer (Oregon State University US)
  6. ROMA: Multi-Agent Reinforcement Learning with Emergent Roles
    Tonghan Wang (Tsinghua University) · Heng Dong (Tsinghua) · Victor Lesser (UMASS) · Chongjie Zhang (Tsinghua University)
  7. OPtions as REsponses: Grounding behavioural hierarchies in multi-agent reinforcement learning
    Alexander Vezhnevets (DeepMind) · Yuhuai Wu (University of Toronto) · Maria Eckstein (UC Berkeley) · Rémi Leblond (DeepMind) · Joel Z Leibo (DeepMind)
  8. Multi-Agent Routing Value Iteration Network
    Quinlan Sykora (Uber ATG) · Mengye Ren (Uber ATG / University of Toronto) · Raquel Urtasun (Uber ATG)
  9. Q-value Path Decomposition for Deep Multiagent Reinforcement Learning
    Yaodong Yang (Tianjin University) · Jianye Hao (Tianjin University) · Guangyong Chen (Tencent) · Hongyao Tang (Tianjin University) · Yingfeng Chen (NetEase Fuxi AI Lab) · Yujing Hu (NetEase Fuxi AI Lab) · Changjie Fan (Netease) · Zhongyu Wei (Fudan University)
  10. Finite-Time Last-Iterate Convergence for Multi-Agent Learning in Games
    Tianyi Lin (UC Berkeley) · Zhengyuan Zhou (Stanford University) · Panayotis Mertikopoulos (CNRS) · Michael Jordan (UC Berkeley)
  11. “Other-Play” for Zero-Shot Coordination
    Hengyuan Hu (FAIR) · Alexander Peysakhovich (Facebook) · Adam Lerer (Facebook AI Research) · Jakob Foerster (Facebook AI Research)
  12. Asynchronous Coagent Networks
    James Kostas (University of Massachusetts Amherst) · Chris Nota (University of Massachusetts Amherst) · Philip Thomas (University of Massachusetts Amherst)
  13. Extra-gradient with player sampling for faster convergence in n-player games
    Samy Jelassi (Princeton University) · Carles Domingo-Enrich (NYU) · Damien Scieur (Samsung - SAIT AI Lab, Montreal) · Arthur Mensch (ENS) · Joan Bruna (New York University)
  14. Optimally Solving Two-Agent Decentralized POMDPs Under One-Sided Information Sharing
    Yuxuan Xie (INSA de Lyon) · Jilles Dibangoye (INSA Lyon, INRIA) · Olivier Buffet (INRIA - LORIA)

VII. Multi-Objective RL

  1. Learning Fair Policies in Multi-Objective (Deep) Reinforcement Learning with Average and Discounted Rewards
    Umer Siddique (Shanghai Jiao Tong University) · Paul Weng (Shanghai Jiao Tong University) · Matthieu Zimmer (UM-SJTU JI)
  2. Prediction-Guided Multi-Objective Reinforcement Learning for Continuous Robot Control
    Jie Xu (Massachusetts Institute of Technology) · Yunsheng Tian (Massachusetts Institute of Technology) · Pingchuan Ma (MIT) · Daniela Rus (MIT CSAIL) · Shinjiro Sueda (Texas A&M University) · Wojciech Matusik (MIT)
  3. A distributional view on multi-objective policy optimization
    Abbas Abdolmaleki (Google DeepMind) · Sandy Huang (DeepMind) · Leonard Hasenclever (DeepMind) · Michael Neunert (Google DeepMind) · Martina Zambelli (DeepMind) · Murilo Martins (DeepMind) · Francis Song (DeepMind) · Nicolas Heess (DeepMind) · Raia Hadsell (DeepMind) · Martin Riedmiller (DeepMind)

VIII. Policy Gradient

  1. From Importance Sampling to Doubly Robust Policy Gradient
    Jiawei Huang (University of Illinois at Urbana-Champaign) · Nan Jiang (University of Illinois at Urbana-Champaign)
  2. Statistically Efficient Off-Policy Policy Gradients
    Nathan Kallus (Cornell University) · Masatoshi Uehara (Harvard University)
  3. Momentum-Based Policy Gradient Methods
    Feihu Huang (University of Pittsburgh) · Shangqian Gao (University of Pittsburgh) · Jian Pei (Simon Fraser University) · Heng Huang (University of Pittsburgh & JD Finance America Corporation)
  4. On the Global Convergence Rates of Softmax Policy Gradient Methods
    Jincheng Mei (Google / University of Alberta) · Chenjun Xiao (Google / University of Alberta) · Csaba Szepesvari (DeepMind/University of Alberta) · Dale Schuurmans (University of Alberta)

IX. Off-Policy Evaluation

  1. Batch Stationary Distribution Estimation
    Junfeng Wen (University of Alberta) · Bo Dai (Google Brain) · Lihong Li (Google Research) · Dale Schuurmans (University of Alberta)
  2. Minimax Weight and Q-Function Learning for Off-Policy Evaluation
    Masatoshi Uehara (Harvard University) · Jiawei Huang (University of Illinois at Urbana-Champaign) · Nan Jiang (University of Illinois at Urbana-Champaign)
  3. Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling
    Yao Liu (Stanford University) · Pierre-Luc Bacon (Stanford University) · Emma Brunskill (Stanford University)
  4. Double Reinforcement Learning for Efficient and Robust Off-Policy Evaluation
    Nathan Kallus (Cornell University) · Masatoshi Uehara (Harvard University)
  5. Adaptive Estimator Selection for Off-Policy Evaluation
    Yi Su (Cornell University) · Pavithra Srinath (Microsoft Research) · Akshay Krishnamurthy (Microsoft Research)
  6. Doubly robust off-policy evaluation with shrinkage
    Yi Su (Cornell University) · Maria Dimakopoulou (Stanford University) · Akshay Krishnamurthy (Microsoft Research) · Miroslav Dudik (Microsoft Research)
  7. Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation
    Yaqi Duan (Princeton University) · Zeyu Jia (Peking University) · Mengdi Wang (Princeton University)
  8. Accountable Off-Policy Evaluation via a Kernelized Bellman Statistics
    Yihao Feng (The University of Texas at Austin) · Tongzheng Ren (UT Austin) · Ziyang Tang (University of Texas at Austin) · Qiang Liu (UT Austin)
  9. Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions
    Omer Gottesman (Harvard University) · Joseph Futoma (Harvard University) · Yao Liu (Stanford University) · Sonali Parbhoo (Harvard University) · Leo Celi (MIT) · Emma Brunskill (Stanford University) · Finale Doshi-Velez (Harvard University)
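
To make the term concrete: off-policy evaluation estimates the value of a target policy from trajectories collected by a different behavior policy. Below is a minimal ordinary importance-sampling estimator, my own illustration rather than the method of any paper above; the function and argument names are hypothetical.

```python
def is_estimate(trajectories, pi_target, pi_behavior, gamma=0.99):
    """Ordinary importance-sampling estimate of the target policy's value.

    trajectories: list of trajectories, each a list of (state, action, reward)
        tuples collected under the behavior policy.
    pi_target, pi_behavior: callables giving the probability of taking
        `action` in `state` under each policy.
    """
    estimates = []
    for traj in trajectories:
        rho, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            rho *= pi_target(s, a) / pi_behavior(s, a)  # cumulative likelihood ratio
            ret += (gamma ** t) * r                     # discounted return
        estimates.append(rho * ret)                     # reweighted return
    return sum(estimates) / len(estimates)
```

Because the likelihood ratios multiply along a trajectory, the variance of this estimator grows with the horizon, which is exactly the "curse of horizon" that several of the papers above target.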

X. Application

  1. Description Based Text Classification with Reinforcement Learning
    Wei Wu (Shannon.AI) · Duo Chai (Shannon.AI) · Qinghong Han (Shannon.AI) · Fei Wu (Zhejiang University, China) · Jiwei Li (Shannon.AI)
  2. Reinforcement Learning for Molecular Design Guided by Quantum Mechanics
    Gregor Simm (Cambridge University) · Robert Pinsler (University of Cambridge) · Jose Hernandez-Lobato (University of Cambridge)
  3. Entropy Minimization In Emergent Languages
    Evgeny Kharitonov (FAIR) · Rahma Chaabouni (Facebook/ENS/INRIA) · Diane Bouchacourt (Facebook AI) · Marco Baroni (Facebook Artificial Intelligence Research)
  4. Adaptive Droplet Routing in Digital Microfluidic Biochips Using Deep Reinforcement Learning
    Tung-Che Liang (Duke University) · Zhanwei Zhong (Duke University) · Yaas Bigdeli (Duke University) · Tsung-Yi Ho (National Tsing Hua University) · Richard Fair (Duke University) · Krishnendu Chakrabarty (Duke University)
  5. Student-Teacher Curriculum Learning via Reinforcement Learning: Predicting Hospital Inpatient Admission Location
    Rasheed El-Bouri (University of Oxford) · David Eyre (University of Oxford) · Peter Watkinson (Oxford University Hospitals NHS Foundation Trust) · Tingting Zhu (University of Oxford) · David Clifton (University of Oxford)
  6. Learning to Navigate in Synthetically Accessible Chemical Space Using Reinforcement Learning
    Sai Krishna Gottipati (99andBeyond) · Boris Sattarov (99andBeyond) · Sufeng Niu (LinkedIn) · Haoran Wei (University of Delaware) · Yashaswi Pathak (International Institute of Information Technology, Hyderabad) · Shengchao Liu (Mila, Université de Montréal) · Simon Blackburn (Mila) · Karam Thomas (99andBeyond) · Connor Coley (MIT) · Jian Tang (HEC Montreal & MILA) · Sarath Chandar (Mila / École Polytechnique de Montréal) · Yoshua Bengio (Mila / U. Montreal)

XI. Other

  1. Generalization to New Actions in Reinforcement Learning
    Ayush Jain (University of Southern California) · Andrew Szot (University of Southern California) · Joseph Lim (Univ. of Southern California)
  2. Generalized Neural Policies for Relational MDPs
    Sankalp Garg (Indian Institute of Technology Delhi) · Aniket Bajpai (Indian Institute of Technology, Delhi) · Mausam (IIT Delhi)
  3. Clinician-in-the-Loop Decision Making: Reinforcement Learning with Near-Optimal Set-Valued Policies
    Shengpu Tang (University of Michigan) · Aditya Modi (University of Michigan) · Michael Sjoding (University of Michigan) · Jenna Wiens (University of Michigan)
  4. Learning the Valuations of a k-demand Agent
    Hanrui Zhang (Duke University) · Vincent Conitzer (Duke)
  5. Provably Convergent Two-Timescale Off-Policy Actor-Critic with Function Approximation
    Shangtong Zhang (University of Oxford) · Bo Liu (Auburn University) · Hengshuai Yao (Huawei Technologies) · Shimon Whiteson (University of Oxford)
  6. Learning Human Objectives by Evaluating Hypothetical Behavior
    Siddharth Reddy (University of California, Berkeley) · Anca Dragan (EECS Department, University of California, Berkeley) · Sergey Levine (UC Berkeley) · Shane Legg (DeepMind) · Jan Leike (DeepMind)
  7. Optimizing Data Usage via Differentiable Rewards
    Xinyi Wang (Carnegie Mellon University) · Hieu Pham (Carnegie Mellon University) · Paul Michel (Carnegie Mellon University) · Antonios Anastasopoulos (Carnegie Mellon University) · Jaime Carbonell (Carnegie Mellon University) · Graham Neubig (Carnegie Mellon University)
  8. Taylor Expansion Policy Optimization
    Yunhao Tang (Columbia University) · Michal Valko (DeepMind) · Remi Munos (DeepMind)
  9. Reinforcement Learning for Integer Programming: Learning to Cut
    Yunhao Tang (Columbia University) · Shipra Agrawal (Columbia University) · Yuri Faenza (Columbia University)
  10. Safe Reinforcement Learning in Constrained Markov Decision Processes
    Akifumi Wachi (IBM Research AI) · Yanan Sui (Tsinghua University)
  11. Off-Policy Actor-Critic with Shared Experience Replay
    Simon Schmitt (DeepMind) · Matteo Hessel (DeepMind) · Karen Simonyan (DeepMind)
  12. Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning
    Amin Rakhsha (MPI-SWS) · Goran Radanovic (Max Planck Institute for Software Systems) · Rati Devidze (Max Planck Institute for Software Systems) · Jerry Zhu (University of Wisconsin-Madison) · Adish Singla (Max Planck Institute (MPI-SWS))
  13. Does the Markov Decision Process Fit the Data: Testing for the Markov Property in Sequential Decision Making
    Chengchun Shi (London School of Economics and Political Science) · Runzhe Wan (North Carolina State University) · Rui Song () · Wenbin Lu () · Ling Leng (Amazon)
  14. ConQUR: Mitigating Delusional Bias in Deep Q-Learning
    DiJia Su (Princeton University) · Jayden Ooi (Google) · Tyler Lu (Google) · Dale Schuurmans (Google / University of Alberta) · Craig Boutilier (Google)
  15. Self-Attentive Associative Memory
    Hung Le (Deakin University) · Truyen Tran (Deakin University) · Svetha Venkatesh (Deakin University)
  16. Striving for simplicity and performance in off-policy DRL: Output Normalization and Non-Uniform Sampling
    Che Wang (New York University) · Yanqiu Wu (New York University) · Quan Vuong (University of California San Diego) · Keith Ross (New York University Shanghai)
  17. Low-Variance and Zero-Variance Baselines for Extensive-Form Games
    Trevor Davis (University of Alberta) · Martin Schmid (DeepMind) · Michael Bowling (DeepMind)
  18. Explore, Discover and Learn: Unsupervised Discovery of State-Covering Skills
    Victor Campos (Barcelona Supercomputing Center) · Alexander Trott (Salesforce Research) · Caiming Xiong (Salesforce) · Richard Socher (Salesforce) · Xavier Giro-i-Nieto (Universitat Politecnica de Catalunya) · Jordi Torres (Barcelona Supercomputing Center)
  19. Discount Factor as a Regularizer in Reinforcement Learning
    Ron Amit (Technion – Israel Institute of Technology) · Kamil Ciosek (Microsoft) · Ron Meir (Technion Israeli Institute of Technology)
  20. Breaking the Curse of Many Agents: Provable Mean Embedding Q-Iteration for Mean-Field Reinforcement Learning
    Lingxiao Wang (Northwestern University) · Zhuoran Yang (Princeton University) · Zhaoran Wang (Northwestern U)
  21. Gradient Temporal-Difference Learning with Regularized Corrections
    Sina Ghiassian (University of Alberta) · Andrew Patterson (University of Alberta) · Shivam Garg (University of Alberta) · Dhawal Gupta (University of Alberta) · Adam White (University of Alberta) · Martha White (University of Alberta)
  22. A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation
    Pan Xu (University of California, Los Angeles) · Quanquan Gu (University of California, Los Angeles)
  23. Information Particle Filter Tree: An Online Algorithm for POMDPs with Belief-Based Rewards on Continuous Domains
    Johannes Fischer (Karlsruhe Institute of Technology (KIT)) · Ömer Sahin Tas (Karlsruhe Institute of Technology (KIT))
  24. Learning Portable Representations for High-Level Planning
    Steven James (University of the Witwatersrand) · Benjamin Rosman (University of the Witwatersrand / CSIR, South Africa) · George Konidaris (Brown)
  25. The Natural Lottery Ticket Winner: Reinforcement Learning with Ordinary Neural Circuits
    Ramin Hasani (TU Wien) · Mathias Lechner (IST Austria) · Alexander Amini (MIT) · Daniela Rus (MIT CSAIL) · Radu Grosu (TU Wien)
  26. Reinforcement Learning with Differential Privacy
    Giuseppe Vietri (University of Minnesota) · Borja de Balle Pigem (Amazon Research) · Steven Wu (University of Minnesota) · Akshay Krishnamurthy (Microsoft Research)
  27. Growing Action Spaces
    Gregory Farquhar (University of Oxford) · Laura Gustafson (Facebook AI Research) · Zeming Lin (Facebook AI Reseach) · Shimon Whiteson (Oxford University) · Nicolas Usunier (Facebook AI Research) · Gabriel Synnaeve (Facebook AI Research)
  28. Responsive Safety in Reinforcement Learning
    Adam Stooke (UC Berkeley) · Joshua Achiam (OpenAI) · Pieter Abbeel (UC Berkeley & Covariant)
  29. Stabilizing Transformers for Reinforcement Learning
    Emilio Parisotto (Carnegie Mellon University) · Francis Song (DeepMind) · Jack Rae (DeepMind) · Razvan Pascanu (DeepMind) · Caglar Gulcehre (DeepMind) · Siddhant Jayakumar (DeepMind) · Max Jaderberg (DeepMind) · Raphael Lopez Kaufman (Deepmind) · Aidan Clark (DeepMind) · Seb Noury (DeepMind) · Matthew Botvinick (DeepMind) · Nicolas Heess (DeepMind) · Raia Hadsell (DeepMind)
  30. Learning to Score Behaviors for Guided Policy Optimization
    Aldo Pacchiano (UC Berkeley) · Jack Parker-Holder (University of Oxford) · Yunhao Tang (Columbia University) · Krzysztof Choromanski (Google) · Anna Choromanska (NYU Tandon School of Engineering) · Michael Jordan (UC Berkeley)
  31. Efficient Policy Learning from Surrogate-Loss Classification Reductions
    Andrew Bennett (Cornell University) · Nathan Kallus (Cornell University)
  32. Constrained Markov Decision Processes via Backward Value Functions
    Harsh Satija (McGill University) · Philip Amortila (McGill University) · Joelle Pineau (McGill University / Facebook)
  33. Learning Calibratable Policies using Programmatic Style-Consistency
    Eric Zhan (California Institute of Technology) · Albert Tseng (Caltech) · Yisong Yue (Caltech) · Adith Swaminathan (Microsoft Research) · Matthew Hausknecht (Microsoft Research)
  34. Learning Robot Skills with Temporal Variational Inference
    Tanmay Shankar (Facebook AI Research) · Abhinav Gupta (Carnegie Mellon University)
  35. Leveraging Procedural Generation to Benchmark Reinforcement Learning
    Karl Cobbe (OpenAI) · Chris Hesse (OpenAI) · Jacob Hilton (OpenAI) · John Schulman (OpenAI)
  36. What can I do here? A Theory of Affordances in Reinforcement Learning
    Khimya Khetarpal (McGill University, Mila Montreal) · Zafarali Ahmed (DeepMind) · Gheorghe Comanici (DeepMind) · David Abel (Brown University) · Doina Precup (DeepMind)
  37. Data Valuation using Reinforcement Learning
    Jinsung Yoon (Google) · Sercan O. Arik (Google) · Tomas Pfister (Google)
  38. Designing Optimal Dynamic Treatment Regimes: A Causal Reinforcement Learning Approach
    Junzhe Zhang (Columbia University)
  39. Lookahead-Bounded Q-learning
    Ibrahim El Shar (University of Pittsburgh) · Daniel Jiang (University of Pittsburgh)
  40. Evaluating the Performance of Reinforcement Learning Algorithms
    Scott Jordan (University of Massachusetts Amherst) · Yash Chandak (University of Massachusetts Amherst) · Daniel Cohen (University of Massachusetts Amherst) · Mengxue Zhang (umass Amherst ) · Philip Thomas (University of Massachusetts Amherst)
  41. Provable Self-Play Algorithms for Competitive Reinforcement Learning
    Yu Bai (Salesforce Research) · Chi Jin (Princeton University)
  42. Optimizing for the Future in Non-Stationary MDPs
    Yash Chandak (University of Massachusetts Amherst) · Georgios Theocharous (Adobe Research) · Shiv Shankar (University of Massachusetts) · Martha White (University of Alberta) · Sridhar Mahadevan (Adobe Research) · Philip Thomas (University of Massachusetts Amherst)
  43. Sample Factory: Egocentric 3D Control from Pixels at 100000 FPS with Asynchronous Reinforcement Learning
    Aleksei Petrenko (University of Southern California) · Zhehui Huang (University of Southern California) · Tushar Kumar (University of Southern California) · Gaurav Sukhatme (University of Southern California) · Vladlen Koltun (Intel Labs)
  44. When Demands Evolve Larger and Noisier: Learning and Earning in a Growing Environment
    Feng Zhu (Peking University) · Zeyu Zheng (University of California, Berkeley)
  45. Structured Policy Iteration for Linear Quadratic Regulator
    Youngsuk Park (Stanford University) · Ryan Rossi (Adobe Research) · Zheng Wen (DeepMind) · Gang Wu (Adobe Research) · Handong Zhao (Adobe Research)
  46. Monte-Carlo Tree Search as Regularized Policy Optimization
    Jean-Bastien Grill (DeepMind) · Florent Altché (DeepMind) · Yunhao Tang (Columbia University) · Thomas Hubert (DeepMind) · Michal Valko (DeepMind) · Ioannis Antonoglou (Deepmind) · Remi Munos (DeepMind)
  47. On the Expressivity of Neural Networks for Deep Reinforcement Learning
    Kefan Dong (Tsinghua University) · Yuping Luo (Princeton University) · Tianhe Yu (Stanford University) · Chelsea Finn (Stanford) · Tengyu Ma (Stanford)
  48. Can Increasing Input Dimensionality Improve Deep Reinforcement Learning?
    Kei Ota (Mitsubishi Electric Corporation) · Tomoaki Oiki (Mitsubishi Electric) · Devesh Jha (Mitsubishi Electric Research Labs) · Toshisada Mariyama (Mitsubishi Electric) · Daniel Nikovski (Mitsubishi Electric Research Labs)
  49. Sub-Goal Trees -- a Framework for Goal-Based Reinforcement Learning
    Tom Jurgenson (Technion) · Or Avner (Technion) · Edward Groshev (Osaro, Inc.) · Aviv Tamar (Technion)
  50. Agent57: Outperforming the Atari Human Benchmark
    Adrià Puigdomenech Badia (Deepmind) · Bilal Piot (DeepMind) · Steven Kapturowski (Deepmind) · Pablo Sprechmann (Google DeepMind) · Oleksandr Vitvitskyi (DeepMind) · Zhaohan Guo (DeepMind) · Charles Blundell (DeepMind)
  51. Stochastically Dominant Distributional Reinforcement Learning
    John Martin (Stevens Institute of Technology) · Michal Lyskawinski (Stevens Institute of Technology) · Xiaohu Li (Stevens Institute of Technology) · Brendan Englot (Stevens Institute of Technology)
  52. Option Discovery in the Absence of Rewards with Manifold Analysis
    Amitay Bar (Technion - Israel Institute of Technology) · Ronen Talmon (Technion - Israel Institute Of Technology) · Ron Meir (Technion Israeli Institute of Technology)
  53. Gradient-free Online Learning in Continuous Games with Delayed Rewards
    Amélie Héliou (Criteo) · Panayotis Mertikopoulos (CNRS) · Zhengyuan Zhou (Stanford University)
  54. Fast Adaptation to New Environments via Policy-Dynamics Value Functions
    Roberta Raileanu (NYU) · Max Goldstein (NYU) · Arthur Szlam (Facebook) · Rob Fergus (Facebook AI Research, NYU)
  55. Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning
    Zhaohan Guo (DeepMind) · Bernardo Avila Pires (DeepMind) · Mohammad Gheshlaghi Azar (Deepmind) · Bilal Piot (DeepMind) · Florent Altché (DeepMind) · Jean-Bastien Grill (DeepMind) · Remi Munos (DeepMind)
  56. Deep Reinforcement Learning with Smooth Policy
    Qianli Shen (Peking University) · Yan Li (Georgia Tech) · Haoming Jiang (Georgia Tech) · Zhaoran Wang (Northwestern) · Tuo Zhao (Gatech)
  57. Inductive Bias-driven Reinforcement Learning For Efficient Schedules in Heterogeneous Clusters
    Subho Banerjee (University of Illinois at Urbana-Champaign) · Saurabh Jha (UIUC) · Zbigniew Kalbarczyk (University of Illinois at Urbana-Champaign) · Ravishankar Iyer (University of Illinois at Urbana-Champaign)
  58. Skew-Fit: State-Covering Self-Supervised Reinforcement Learning
    Vitchyr Pong (UC Berkeley) · Murtaza Dalal (UC Berkeley) · Steven Lin (UC Berkeley) · Ashvin Nair (UC Berkeley) · Shikhar Bahl (UC Berkeley/Carnegie Mellon University) · Sergey Levine (UC Berkeley)
  59. Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes
    Chen-Yu Wei (University of Southern California) · Mehdi Jafarnia (University of Southern California) · Haipeng Luo (University of Southern California) · Hiteshi Sharma (University of Southern California) · Rahul Jain (USC)
  60. Kinematic State Abstraction and Provably Efficient Rich-Observation Reinforcement Learning
    Dipendra Misra (Microsoft) · Mikael Henaff (Microsoft) · Akshay Krishnamurthy (Microsoft Research) · John Langford (Microsoft Research)
  61. Enhanced POET: Open-ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions
    Rui Wang (Uber AI) · Joel Lehman () · Aditya Rawal (Uber AI Labs) · Jiale Zhi (Uber AI) · Yulun Li (Uber AI) · Jeffrey Clune (OpenAI) · Kenneth Stanley (Uber AI and University of Central Florida)
  62. Adaptive Reward-Poisoning Attacks against Reinforcement Learning
    Xuezhou Zhang (UW-Madison) · Yuzhe Ma (Univ. of Wisconsin-Madison) · Adish Singla (Max Planck Institute (MPI-SWS)) · Jerry Zhu (University of Wisconsin-Madison)
  63. Estimation of Bounds on Potential Outcomes For Decision Making
    Maggie Makar (MIT) · Fredrik Johansson (Chalmers University of Technology) · John Guttag (MIT) · David Sontag (Massachusetts Institute of Technology)
  64. Sequential Transfer in Reinforcement Learning with a Generative Model
    Andrea Tirinzoni (Politecnico di Milano) · Riccardo Poiani (Politecnico di Milano) · Marcello Restelli (Politecnico di Milano)
  65. Interference and Generalization in Temporal Difference Learning
    Emmanuel Bengio (McGill University) · Joelle Pineau (McGill University / Facebook) · Doina Precup (McGill University / DeepMind)
  66. CoMic: Co-Training and Mimicry for Reusable Skills
    Leonard Hasenclever (DeepMind) · Fabio Pardo (Imperial College London) · Raia Hadsell (DeepMind) · Nicolas Heess (DeepMind) · Josh Merel (DeepMind)
  67. Stochastic Regret Minimization in Extensive-Form Games
    Gabriele Farina (Carnegie Mellon University) · Christian Kroer (Columbia University) · Tuomas Sandholm (Carnegie Mellon University)
  68. Logarithmic Regret for Online Control with Adversarial Noise
    Dylan Foster (MIT) · Max Simchowitz (UC Berkeley)
  69. Cautious Adaptation For Reinforcement Learning in Safety-Critical Settings
    Jesse Zhang (UC Berkeley) · Brian Cheung (UC Berkeley) · Chelsea Finn (Stanford) · Sergey Levine (UC Berkeley) · Dinesh Jayaraman (University of Pennsylvania)
  70. Representations for Stable Off-Policy Reinforcement Learning
    Dibya Ghosh (Google) · Marc Bellemare (Google Brain)
  71. Multi-Step Greedy Reinforcement Learning Algorithms
    Manan Tomar (Indian Institute of Technology, Madras) · Yonathan Efroni (Technion) · Mohammad Ghavamzadeh (Facebook AI Research)
  72. Neural Network Control Policy Verification With Persistent Adversarial Perturbation
    Yuh-Shyang Wang (Argo AI) · Tsui-Wei Weng (MIT) · Luca Daniel (MIT)
  73. Estimating Q(s,s') with Deterministic Dynamics Gradients
    Ashley Edwards (Uber AI) · Himanshu Sahni (Georgia Institute of Technology) · Rosanne Liu (Deep Collective) · Jane Hung (Uber) · Ankit Jain (Uber AI Labs) · Rui Wang (Uber AI) · Adrien Ecoffet (OpenAI) · Thomas Miconi (Uber AI Labs) · Charles Isbell (Georgia Institute of Technology) · Jason Yosinski (Uber Labs)
  74. CURL: Contrastive Unsupervised Representation Learning for Reinforcement Learning
    Michael Laskin (UC Berkeley) · Pieter Abbeel (UC Berkeley & Covariant) · Aravind Srinivas (UC Berkeley)
  75. Inferring DQN structure for high-dimensional continuous control
    Andrey Sakryukin (National University of Singapore) · Chedy Raissi (INRIA) · Mohan Kankanhalli (National University of Singapore)
  76. R2-B2: Recursive Reasoning-Based Bayesian Optimization for No-Regret Learning in Games
    Zhongxiang Dai (National University of Singapore) · Yizhou Chen (National University of Singapore) · Bryan Kian Hsiang Low (National University of Singapore) · Patrick Jaillet (MIT) · Teck-Hua Ho (National University of Singapore)
  77. Revisiting Fundamentals of Experience Replay
    William Fedus (University of Montreal/Google Brain) · Prajit Ramachandran (Google) · Rishabh Agarwal (Google Research, Brain Team) · Yoshua Bengio (Mila / U. Montreal) · Hugo Larochelle (Google Brain) · Mark Rowland (DeepMind) · Will Dabney (DeepMind)
  78. Predictive Coding for Locally-Linear Control
    Rui Shu (Stanford University) · Tung Nguyen (VinAI Research) · Yinlam Chow (Google) · Tuan Pham (VinAI) · Khoat Than (VinAI & HUST) · Mohammad Ghavamzadeh (Facebook) · Stefano Ermon (Stanford University) · Hung Bui (VinAI Research)
  79. Efficiently Solving MDPs with Stochastic Mirror Descent
    Yujia Jin (Stanford University) · Aaron Sidford (Stanford)
  80. Hierarchically Decoupled Morphological Transfer
    Donald Hejna (UC Berkeley) · Lerrel Pinto (NYU/Berkeley) · Pieter Abbeel (UC Berkeley)
  81. Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics
    Arsenii Kuznetsov (Samsung) · Pavel Shvechikov (Samsung Artificial Intelligence Center) · Alexander Grishin (Higher School of Economics) · Dmitry Vetrov (Higher School of Economics, Samsung AI Center Moscow)
  82. Invariant Causal Prediction for Block MDPs
    Clare Lyle (University of Oxford) · Amy Zhang (McGill University) · Angelos Filos (University of Oxford) · Shagun Sodhani (Facebook AI Research) · Marta Kwiatkowska (Oxford University) · Yarin Gal (University of Oxford) · Doina Precup (McGill University / DeepMind) · Joelle Pineau (McGill University / Facebook)
  83. Task-Oriented Active Perception and Planning in Environments with Partially Known Semantics
    Mahsa Ghasemi (The University of Texas at Austin) · Erdem Bulgur (University of Texas at Austin) · Ufuk Topcu (University of Texas at Austin)
