Imitation-Learning-Paper-Lists

A paper collection for imitation learning in RL with brief introductions. This collection draws on Awesome-Imitation-Learning and also contains self-collected papers.

To be precise, "imitation learning" is the general problem of learning from expert demonstrations (LfD). Two names derive from this description for historical reasons: Imitation Learning and Apprenticeship Learning. Usually, apprenticeship learning is mentioned in the context of "apprenticeship learning via inverse reinforcement learning (IRL)", which recovers a reward function and learns a policy from it, while imitation learning began with behavior cloning, which learns the policy directly (ref). However, as the field has developed, "imitation learning" is now commonly used for the general LfD problem setting, which is also the view we take here.

Typically, different settings of imitation learning lead to different specific research areas. In the general setting, one (1) can only obtain pre-collected trajectories ((s, a) pairs) from a non-interactive expert, (2) can interact with the environment (e.g., a simulator), and (3) has no reward signals. Some of the other settings are listed below:

  1. No actions, only states / observations -> Imitation Learning from Observations (ILFO).

  2. With reward signals -> Imitation Learning with Rewards.

  3. An interactive expert available for corrections and data aggregation -> On-policy Imitation Learning (beginning with DAgger, Dataset Aggregation; see the sketch after this list).

  4. Cannot interact with the environment -> Batch RL (see a dedicated list here).
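
As a concrete illustration of setting 3, below is a minimal DAgger-style loop. This is only a sketch: `env` is assumed to follow the classic gym-style `reset()/step()` interface, and `expert_policy` / `fit_policy` are hypothetical helpers that query the expert and run supervised learning on the aggregated dataset.

```python
import numpy as np

def dagger(env, expert_policy, fit_policy, n_iters=10, horizon=200):
    """Minimal DAgger loop: roll out the current learner, query the expert
    on the visited states, aggregate the labeled data, and retrain."""
    states, actions = [], []
    policy = None  # no learner yet: the first rollout uses the expert

    for _ in range(n_iters):
        obs = env.reset()
        for _ in range(horizon):
            act_fn = expert_policy if policy is None else policy
            # Label every visited state with the expert's action (data aggregation)
            states.append(obs)
            actions.append(expert_policy(obs))
            obs, _, done, _ = env.step(act_fn(obs))
            if done:
                break
        # Supervised learning on the aggregated dataset
        policy = fit_policy(np.array(states), np.array(actions))
    return policy
```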

What we want from imitation learning in different settings (for the real world):

  1. Interact less with real-world environments by using expert demonstrations, improving sample efficiency while still learning good policies (yet some works use few demonstrations to learn good policies at the cost of a vast number of environment interactions).

  2. Real-world actions are not available or are hard to sample.

  3. Use expert data to improve sample efficiency and learn fast with good exploration ability.

  4. Online settings that humans can easily join, e.g., a human correcting the steering wheel in a self-driving car.

  5. Learn good policies in the real world, where interacting with the environment is difficult.

In this collection, we concentrate on the general setting and gather the other settings in the "Other Settings" section. Settings such as "self-imitation learning", which imitates a policy from one's own historical data, are not regarded as imitation learning tasks here.

Papers are classified mainly by methodology rather than by their specific task settings (except for the single-agent/multi-agent split), but since many papers are cross-domain, the classification is only for reference. As you can see, many works focus on robotics, especially papers from UC Berkeley.

Overview

Single-Agent

Reviews & Tutorials

Behavior Cloning

Behavior Cloning (BC) directly replicates the expert's behavior with supervised learning, and can be improved via data aggregation. One can say that BC is the simplest case of interactive direct policy learning.
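
As a minimal sketch of plain BC (assuming discrete actions and PyTorch; the tensor shapes and network size are illustrative, not from any specific paper):

```python
import torch
import torch.nn as nn

def behavior_cloning(expert_states, expert_actions, state_dim, n_actions,
                     epochs=50, lr=1e-3):
    """Plain BC: fit a policy to expert (s, a) pairs by maximum likelihood
    (cross-entropy for discrete actions). `expert_states` is a float tensor
    of shape (N, state_dim); `expert_actions` is a long tensor of shape (N,)."""
    policy = nn.Sequential(
        nn.Linear(state_dim, 64), nn.ReLU(),
        nn.Linear(64, n_actions),
    )
    optimizer = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(epochs):
        logits = policy(expert_states)          # (N, n_actions)
        loss = loss_fn(logits, expert_actions)  # negative log-likelihood
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return policy
```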

One-shot / Zero-shot

Model based

Hierarchical RL

Multi-modal Behaviors

Learning with human preference

Inverse RL

Inverse Reinforcement Learning (IRL) learns the hidden objectives underlying the expert's behavior.
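
To make the reward-recovery idea concrete, here is a much-simplified sketch of feature-expectation matching with a reward linear in state features, r(s) = w·φ(s). The `rollout` and `rl_solver` helpers are hypothetical (sample trajectories with a policy; solve the RL problem for a given reward), and this is not the exact algorithm of any single paper.

```python
import numpy as np

def feature_expectations(trajectories, feature_fn, gamma=0.99):
    # Discounted feature expectations mu = E[sum_t gamma^t * phi(s_t)],
    # estimated from sampled trajectories (lists of states).
    mu = None
    for traj in trajectories:
        disc = sum((gamma ** t) * feature_fn(s) for t, s in enumerate(traj))
        mu = disc if mu is None else mu + disc
    return mu / len(trajectories)

def irl_feature_matching(expert_trajs, initial_policy, rollout, rl_solver,
                         feature_fn, n_iters=10, gamma=0.99):
    # Simplified reward-recovery loop: set the reward weights to the gap
    # between expert and learner feature expectations, then let an RL
    # solver find a policy for the reward r(s) = w . phi(s).
    mu_expert = feature_expectations(expert_trajs, feature_fn, gamma)
    policy = initial_policy
    for _ in range(n_iters):
        mu_learner = feature_expectations(rollout(policy), feature_fn, gamma)
        w = mu_expert - mu_learner  # reward weights point toward the expert
        policy = rl_solver(lambda s, w=w: float(np.dot(w, feature_fn(s))))
    return policy, w
```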

Reviews & Tutorials

Papers

Generative Adversarial Methods

Generative Adversarial Imitation Learning (GAIL) applies the generative adversarial training paradigm to learning expert policies, and is derived from inverse RL.
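
A minimal sketch of the adversarial part (assuming PyTorch and continuous actions represented as vectors; names and the surrogate-reward choice are illustrative): a discriminator is trained to separate expert (s, a) pairs from the learner's, and its output serves as a surrogate reward for a policy-gradient (e.g., TRPO/PPO) update of the policy.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """D(s, a): logit that the pair comes from the expert."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, states, actions):
        return self.net(torch.cat([states, actions], dim=-1))  # logits

def discriminator_step(disc, optimizer, expert_sa, learner_sa):
    """One adversarial update: expert pairs labeled 1, learner pairs labeled 0."""
    bce = nn.BCEWithLogitsLoss()
    expert_logits = disc(*expert_sa)
    learner_logits = disc(*learner_sa)
    loss = bce(expert_logits, torch.ones_like(expert_logits)) + \
           bce(learner_logits, torch.zeros_like(learner_logits))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def surrogate_reward(disc, states, actions):
    # One common choice of learner reward: -log(1 - D(s, a)).
    with torch.no_grad():
        return -torch.log(1 - torch.sigmoid(disc(states, actions)) + 1e-8)
```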

Multi-modal Behaviors

Hierarchical RL

Task Transfer

Model-based

POMDP

Support Estimation Methods

Recently, a paper proposed a new idea for imitation learning: it learns a fixed reward signal, which obviates the need to dynamically update the reward function.
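
A hedged sketch of one way such a fixed reward can be computed, in the spirit of Random Expert Distillation (an assumption about the method being referenced; network sizes and the scale `sigma` are illustrative): a predictor is trained once to match a fixed random network on expert data, and the resulting prediction error yields a static reward that is high on the expert's support and low elsewhere.

```python
import torch
import torch.nn as nn

def fixed_support_reward(expert_sa, dim_in, feat_dim=64, sigma=1.0, epochs=100):
    """`expert_sa` is a float tensor of concatenated expert (s, a) pairs,
    shape (N, dim_in). Returns a reward function that never changes again."""
    random_net = nn.Sequential(nn.Linear(dim_in, feat_dim))  # fixed, random target
    predictor = nn.Sequential(nn.Linear(dim_in, feat_dim))   # trained on expert data
    for p in random_net.parameters():
        p.requires_grad_(False)

    opt = torch.optim.Adam(predictor.parameters(), lr=1e-3)
    for _ in range(epochs):
        err = ((predictor(expert_sa) - random_net(expert_sa)) ** 2).mean()
        opt.zero_grad()
        err.backward()
        opt.step()

    def reward(sa):
        # Small prediction error (on the expert's support) -> reward near 1.
        with torch.no_grad():
            e = ((predictor(sa) - random_net(sa)) ** 2).sum(dim=-1)
            return torch.exp(-sigma * e)
    return reward
```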

Goal-based methods

Other Methods

Multi-Agent

MA Inverse RL

MA-GAIL

Other Settings

Imitation Learning from Observations

Review Papers

Regular Papers

Imitation Learning with Rewards

On-policy Imitation Learning

Batch RL

See a dedicated list here.

Applications
