# minimal_policy_gradient **Repository Path**: dragon515/minimal_policy_gradient ## Basic Information - **Project Name**: minimal_policy_gradient - **Description**: No description available - **Primary Language**: Unknown - **License**: Not specified - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2025-07-16 - **Last Updated**: 2025-07-16 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README A minimal implementation of Policy Gradient in reinforcement learning. 中文视频教程:【大白话强化学习之 Policy Gradient】 https://space.bilibili.com/90303434/lists/4898073?type=season Our target is to keep the code easy enough to read and understand the theory behind gradient policy mathematical derivation: ![](policy_gradient_derivation.png) # Training call `train()` to start training. Press `S` to save a checkpoint, press `Q` to abort the training. We are using CartPole-v0 as an example to train. The training should converge to max steps (500 steps) around **200 to 1000** steps. ![training](training.png) # Evaluating call `eval()` to start evaluate a checkpoint. Press `Q` to abort evaluating. # Thanks A great thanks to: https://huggingface.co/learn/deep-rl-course/unit4/pg-theorem https://github.com/Finspire13/pytorch-policy-gradient-example