# modeling_long_term_future **Repository Path**: facebookresearch/modeling_long_term_future ## Basic Information - **Project Name**: modeling_long_term_future - **Description**: Code for ICLR 2019 paper Learning Dynamics Model by Incorporating the Long Term Future - **Primary Language**: Unknown - **License**: Not specified - **Default Branch**: master - **Homepage**: https://arxiv.org/abs/1903.01599 - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2023-07-24 - **Last Updated**: 2023-07-24 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README This repo contains code for our paper [Learning Dynamics Model in Reinforcement Learning by Incorporating the Long Term Future](https://arxiv.org/abs/1903.01599) The code base contains multiple branches. - The main branch contains experiments for the BabyAI tasks. - The mujoco branch contains experiments for the Mujoco tasks. - The carracing branch contains experiments for CarRacing task. Based on code base for the BabyAI project at Mila. https://github.com/mila-iqia/babyai Follow similar installations as in https://github.com/mila-iqia/babyai. Requirements: ## Installation Requirements: - Python 3.5+ - OpenAI Gym - NumPy - PyQT5 - PyTorch 0.4.1+ Start by manually installing PyTorch. See the [PyTorch website](http://pytorch.org/) for installation instructions specific to your platform. Then, clone this repository and install the other dependencies with `pip3`: git clone https://github.com/facebookresearch/modeling_long_term_future.git cd modeling_long_term_future pip3 install --editable . Create a new conda env using env.yml in the repo ## Training teacher We use the BabyAI Pickup-Unlock game. First train the teacher (for imitation learning) using PPO with curriculum learning. Start with a room size of 6 and then work our way up to room size of 15. python3 -m scripts.train_curclm. --env BabyAI-UnlockPickup-v0 --algo ppo --arch cnn1 --tb --seed 1 --save-interval 10 --room-size 6 python3 -m scripts.train_curclm. --env BabyAI-UnlockPickup-v0 --algo ppo --arch cnn1 --tb --seed 1 --save-interval 10 --room-size 8 --model MODEL_ROOM6_PRETRAINED python3 -m scripts.train_curclm. --env BabyAI-UnlockPickup-v0 --algo ppo --arch cnn1 --tb --seed 1 --save-interval 10 --room-size 10 --model MODEL_ROOM8_PRETRAINED python3 -m scripts.train_curclm. --env BabyAI-UnlockPickup-v0 --algo ppo --arch cnn1 --tb --seed 1 --save-interval 10 --room-size 12 --model MODEL_ROOM10_PRETRAINED python3 -m scripts.train_curclm. --env BabyAI-UnlockPickup-v0 --algo ppo --arch cnn1 --tb --seed 1 --save-interval 10 --room-size 15 --model MODEL_ROOM12_PRETRAINED ## Generate expert trajectories Generate expert trajectories from the experts trained using curriculum learning mnkdir data python3 -m scripts.gen_samples --episodes 10000 --env BabyAI-UnlockPickup-v0 --model pretrained_model_room_10 --room 10 ## Training the student to imitate the expert To run our model python3 -m scripts.zforcing_main_state_dec --env BabyAI-UnlockPickup-v0 --datafile EXPERT_DATA_TO_LOAD --model MODEL_NAME --eval-episodes 100 --eval-interval 200 --bwd-weight 0.0 --lr 1e-4 --aux-weight-start 0.0001 --bwd-l2-weight 1. --kld-weight-start 0.2 --aux-weight-end 0.0001 --room 10 To run the baseline python3 -m scripts.zforcing_main_state_dec --datafile EXPERT_DATA_TO_LOAD --env BabyAI-UnlockPickup-v0 --model MODEL_NAME --eval-episodes 100 --eval-interval 200 --bwd-weight 0.0 --lr 1e-4 --aux-weight-start 0.000 --aux-weight-end 0.0 --room 10 ## License Find license in [LICENSE](LICENSE) file.