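# Simple Reinforcement Learning

**Repository Path**: simon1239/simple-reinforcement-learning

## Basic Information

- **Project Name**: Simple Reinforcement Learning
- **Description**: No description available
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-09-29
- **Last Updated**: 2024-09-30

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

### Introduction

{**The following is a description of the Gitee platform; you can replace this introduction.** Gitee is a Git-based code hosting platform (with SVN support) launched by OSCHINA. It gives developers a stable, efficient, and secure cloud platform for collaborative software development, and individuals, teams, and enterprises can all use it for code hosting, project management, and collaborative development. For enterprise projects, see [https://gitee.com/enterprises](https://gitee.com/enterprises)}

### Software Architecture

Software architecture description

### Installation

1. Install the Python packages:
   - python==3.9
   - pytorch==1.12.1 (CPU)
   - gym==0.26.2
   - pettingzoo==1.23.1
2. xxxx
3. xxxx

### Algorithms

#### 1. Temporal Difference (TD)

* Definition
* Example:

  ![img.png](img/img.png)

* Mathematical explanation:

  ![img_1.png](img/img_1.png)
  ![img_2.png](img/img_2.png)

* Weight the rewards R_t (discounting).
* How is the Q function trained?

  ![img_3.png](img/img_3.png)

* The target contains one step of actually observed reward, so it is more reliable than the value (the Q estimate alone). The value should therefore be moved toward the target.

  ![img_4.png](img/img_4.png)
  ![img_5.png](img/img_5.png)

* Code:

  ![img.png](img/img.png)

#### 2. SARSA

* Difference from Q-Learning

  ![img.png](img.png)

* Definition
* Mathematical explanation
* Code

  ![img_1.png](img_1.png)

* Improvements:
  * On-policy SARSA without a data pool

    Because SARSA is an on-policy algorithm, in theory it cannot use a data pool (experience replay).

    ![img_2.png](img_2.png)

  * Multi-step TD target

    The targets computed so far consider only one step of reward, but more steps can be taken into account. The target is an approximation of the Q function: computing Q exactly is impractical, but it can be estimated by Monte Carlo sampling. The earlier estimate sampled only one step; the more real rewards the estimate is built on, the less it depends on the (possibly inaccurate) Q estimate. Multi-step sampling should therefore help in theory, although it also increases the variance of the target.

    ![img_3.png](img_3.png)

#### Contributing

1. Fork this repository
2. Create a Feat_xxx branch
3. Commit your code
4. Create a Pull Request

#### Gitee Feature

1. Use Readme\_XXX.md to support different languages, e.g. Readme\_en.md, Readme\_zh.md
2. Official Gitee blog: [blog.gitee.com](https://blog.gitee.com)
3. Visit [https://gitee.com/explore](https://gitee.com/explore) to discover outstanding open-source projects on Gitee
4. [GVP](https://gitee.com/gvp) stands for Gitee Most Valuable Project, a set of outstanding open-source projects selected through a comprehensive evaluation
5. The official Gitee user manual: [https://gitee.com/help](https://gitee.com/help)
6. Gitee Stars is a column that showcases Gitee members: [https://gitee.com/gitee-stars/](https://gitee.com/gitee-stars/)

### Issues

* Issue 1: `AttributeError: module 'numpy' has no attribute 'bool8'` (gym 0.26.2 still references `np.bool8`, which newer NumPy releases no longer provide)
* Fix: install a NumPy 1.23 release: `pip install numpy==1.23.2`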
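The TD idea in the Algorithms section (the target contains one real reward, so it is treated as more reliable than the current value, and the value is nudged toward it) can be sketched for a tabular Q function as below. This is a minimal sketch, not the repository's code (which is only shown as screenshots): it assumes a Q-learning-style target `r + gamma * max Q(s', a')`, a discount factor `gamma`, and a learning rate `lr`, and the helper names `td_target_q_learning` and `td_update` are hypothetical.

```python
# Tabular TD update with a Q-learning target (hypothetical helper names).
# q is a list of per-state lists: q[state][action] -> estimated value.

def td_target_q_learning(q, reward, next_state, done, gamma=0.9):
    """TD target r + gamma * max_a' Q(s', a').

    It contains one step of actually observed reward, so it is treated as
    more reliable than the current estimate Q(s, a). At a terminal
    transition there is no next state to bootstrap from.
    """
    if done:
        return reward
    return reward + gamma * max(q[next_state])


def td_update(q, state, action, target, lr=0.1):
    """Move Q(s, a) a fraction lr of the way toward the TD target.

    Returns the TD error (target minus the old estimate).
    """
    td_error = target - q[state][action]
    q[state][action] += lr * td_error
    return td_error


if __name__ == "__main__":
    # Two states, two actions; values chosen so the arithmetic is exact.
    q = [[0.0, 0.0], [1.0, 2.0]]
    target = td_target_q_learning(q, reward=1.0, next_state=1, done=False, gamma=0.5)
    print(target)   # 2.0  (= 1.0 + 0.5 * 2.0)
    td_update(q, state=0, action=1, target=target, lr=0.5)
    print(q[0][1])  # 1.0  (= 0.0 + 0.5 * (2.0 - 0.0))
```

Using a learning rate instead of overwriting Q(s, a) with the target keeps each update small, so noise in a single sampled reward is averaged out over many steps.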
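The difference between SARSA and Q-Learning listed in the Algorithms section comes down to which next-state value the target bootstraps from. The sketch below contrasts the two on a tabular Q function with an ε-greedy policy; it is an illustration under those assumptions, not the repository's implementation, and all function names (`epsilon_greedy`, `sarsa_target`, `q_learning_target`, `sarsa_update`) are hypothetical.

```python
import random

# SARSA vs. Q-learning targets on a tabular Q function (hypothetical helper names).
# q[state][action] -> estimated value.

def epsilon_greedy(q_row, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def sarsa_target(q, reward, next_state, next_action, done, gamma=0.9):
    """On-policy target: bootstrap from Q(s', a') for the action a' that the
    behaviour policy actually chose in s'."""
    if done:
        return reward
    return reward + gamma * q[next_state][next_action]


def q_learning_target(q, reward, next_state, done, gamma=0.9):
    """Off-policy target: bootstrap from max_a' Q(s', a'), regardless of
    which action is taken next."""
    if done:
        return reward
    return reward + gamma * max(q[next_state])


def sarsa_update(q, s, a, r, s_next, a_next, done, lr=0.1, gamma=0.9):
    """One SARSA step on the transition (s, a, r, s', a')."""
    target = sarsa_target(q, r, s_next, a_next, done, gamma)
    q[s][a] += lr * (target - q[s][a])


if __name__ == "__main__":
    q = [[0.0, 0.0], [1.0, 2.0]]
    # Suppose exploration picked the non-greedy action 0 in the next state.
    print(sarsa_target(q, 1.0, 1, 0, False, gamma=0.5))   # 1.5
    print(q_learning_target(q, 1.0, 1, False, gamma=0.5))  # 2.0
```

Because SARSA evaluates the action its own current policy takes, a stored (s, a, r, s', a') tuple recorded under an older policy no longer describes the current one. This is the reason the README notes that on-policy SARSA in theory should not reuse a data pool, while Q-learning's max-based target does not have this restriction.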
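The multi-step TD target described under the SARSA improvements can be written as a sum of discounted observed rewards plus one discounted bootstrap term. The sketch below assumes a SARSA-style bootstrap value Q(s_{t+n}, a_{t+n}) supplied by the caller; the function name `n_step_target` and its parameters are hypothetical, and this is one common formulation rather than the repository's exact code.

```python
# n-step TD target for SARSA (hypothetical helper name).

def n_step_target(rewards, bootstrap_value, done=False, gamma=0.9):
    """Return r_t + gamma*r_{t+1} + ... + gamma^(n-1)*r_{t+n-1}
    + gamma^n * Q(s_{t+n}, a_{t+n}).

    rewards:         the n rewards actually observed, starting at step t.
    bootstrap_value: the current estimate Q(s_{t+n}, a_{t+n}); ignored when
                     the episode terminated within these n steps (done=True).
    """
    target = 0.0
    for k, r in enumerate(rewards):
        target += (gamma ** k) * r  # discounted observed rewards
    if not done:
        target += (gamma ** len(rewards)) * bootstrap_value
    return target


if __name__ == "__main__":
    # n = 1 reduces to the ordinary one-step target r + gamma * Q(s', a').
    print(n_step_target([1.0], bootstrap_value=2.0, gamma=0.5))            # 2.0
    # n = 3: 1 + 0.5 + 0.25 + 0.125 * 8.0
    print(n_step_target([1.0, 1.0, 1.0], bootstrap_value=8.0, gamma=0.5))  # 2.75
```

The bootstrap term is scaled by gamma^n, so as n grows the target relies more on sampled rewards and less on the current Q estimate, which is the trade-off described in the multi-step paragraph.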