**Status:** Archive (code is provided as-is, no updates expected)

# Learning to Summarize from Human Feedback

This repository contains code to run our models, including the supervised baseline, the trained reward model, and the RL fine-tuned policy.

Supported platform: Python 3.7 64-bit on Ubuntu 18.04

## Install

- Install [pipenv](https://github.com/pypa/pipenv#installation).
- Clone this repo. Then, inside it:

```
pipenv install
```

## Run the models

You'll need to run this on a machine with an Nvidia GPU.

First, let's run some tests to make sure everything is working:

```
pipenv run exps/sample.py test test-sample
pipenv run exps/eval_rm.py test test-eval
```

Now let's run some actual evaluations. We can have the model summarize some posts from the validation set:

```
pipenv run exps/sample.py ppo_xl sample-ppo-xl --num_queries 32
```

This will write its output to `/tmp/jobs/sample-ppo-xl/results/`. Now we can evaluate those samples using the reward model:

```
pipenv run exps/eval_rm.py rm4 eval-rm4 --input_path /tmp/jobs/sample-ppo-xl/results/
```

This will print some aggregate statistics and write per-sample scores to `/tmp/jobs/eval-rm4/results/`.
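If you want to inspect the outputs programmatically rather than reading the files by hand, here is a minimal sketch that walks a results directory and decodes each record. It assumes the runs write JSON-lines files under `results/`; the exact filenames and record fields aren't documented here, so treat both as assumptions and check one file manually first.

```python
import json
from pathlib import Path

# Directory written by the sample-ppo-xl run above; swap in
# /tmp/jobs/eval-rm4/results to inspect the reward-model scores instead.
results_dir = Path("/tmp/jobs/sample-ppo-xl/results")

for path in sorted(results_dir.iterdir()):
    if not path.is_file():
        continue
    with path.open() as f:
        for line_num, line in enumerate(f):
            # Assumption: one JSON object per line (JSON-lines format).
            record = json.loads(line)
            print(f"{path.name}:{line_num} keys={sorted(record)}")
```

Printing only the keys on a first pass keeps the output readable and tells you which fields (e.g., the sampled summary or the reward score) to extract in a second pass.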