# rl

**Repository Path**: ymwm233/rl

## Basic Information

- **Project Name**: rl
- **Default Branch**: main
- **Created**: 2024-03-05
- **Last Updated**: 2024-03-11

## README

# Task scheduling using Reinforcement Learning

+ Policy-based agent

| Symbol       | Meaning                                              |
| ------------ | ---------------------------------------------------- |
| $N_{rest}$   | Number of remaining tasks                            |
| $N_p$        | Number of processors                                 |
| $P$          | Set of processors                                    |
| $SL_i$       | Current schedule length of processor $i$             |
| $e_i$        | Execution time of the task on processor $i$          |
| $t$          | Current task                                         |
| $P_{fit}(t)$ | Processor nodes to which task $t$ can be scheduled   |

Input: $state = [N_{rest}, SL_{p_1}, e_{p_1}, SL_{p_2}, e_{p_2}, ..., SL_{p_N}, e_{p_N}]$, where $p_i\in P_{fit}(t)$.

$P(i|state)$ is the probability of scheduling the task to processor $i$.

Output: $probability\ distribution=[P(p_1|state),P(p_2|state),..., P(p_N|state)]$, where $p_i\in P_{fit}(t)$.
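The input and output layout described above can be illustrated with a minimal Python sketch. It is not the repository's model: `build_state`, `toy_policy`, `sample_processor`, and the weights `w_sl` and `w_e` are hypothetical names, and the linear scoring rule is only an assumed stand-in for whatever learned network produces the distribution.

```python
import math
import random


def build_state(n_rest, schedule_lengths, exec_times):
    """Flatten [N_rest, SL_p1, e_p1, ..., SL_pN, e_pN] for the processors in P_fit(t)."""
    if len(schedule_lengths) != len(exec_times):
        raise ValueError("need one SL and one e per fit processor")
    state = [float(n_rest)]
    for sl, e in zip(schedule_lengths, exec_times):
        state.extend([float(sl), float(e)])
    return state


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def toy_policy(state, w_sl=-1.0, w_e=-1.0):
    """Assumed stand-in for the learned policy: one linear score per processor."""
    pairs = state[1:]  # skip N_rest, keep the (SL, e) pairs
    scores = [w_sl * pairs[k] + w_e * pairs[k + 1] for k in range(0, len(pairs), 2)]
    return softmax(scores)


def sample_processor(probs, rng=random):
    """Draw the index of a processor according to the policy distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Three fit processors with schedule lengths 4, 2, 7 and execution times 3, 3, 1.
state = build_state(n_rest=5, schedule_lengths=[4, 2, 7], exec_times=[3, 3, 1])
probs = toy_policy(state)
choice = sample_processor(probs)
```

With negative weights, the toy policy favors the processor whose schedule would finish earliest after taking the task; a trained policy would replace `toy_policy` while keeping the same state and output shapes.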

Model diagram

+ Reward table:

$l_i$ is the length of the scheduling queue of processor $i$, and

$makespan = \mathop{max}\limits_{i}\ l_i$

Assume there are $N_p$ processors; the total schedule length is then $l_{tot}= \sum\limits_{i}l_i$. In theory, $makespan_{max}=l_{tot}$ (all load on a single processor) and $makespan_{min}=l_{tot}/N_p$ (load spread evenly).

| Criterion    | Reward                                                       | Symbol |
| ------------ | ------------------------------------------------------------ | ------ |
| $makespan$   | $1-\frac{(makespan-makespan_{min})}{(makespan_{max}-makespan_{min})}=\frac{N_p(l_{tot}-makespan)}{(N_p-1)l_{tot}}$ | $m$ |
| Task timeout | $\begin{cases} -1 \quad timeout \\ 0 \quad not\ timeout \end{cases}$ | $t$ |

Total reward: $r=m+t$, where $t$ here denotes the timeout term rather than the current task.

![learning curve](imgs/learning%20curve.png)
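The reward defined above can be written out as a short Python sketch. The function names `makespan_reward`, `timeout_penalty`, and `total_reward` are hypothetical, and the sketch assumes at least two processors and a positive total schedule length, since the formula is undefined otherwise.

```python
def makespan_reward(queue_lengths):
    """m = N_p * (l_tot - makespan) / ((N_p - 1) * l_tot), a value in [0, 1]."""
    n_p = len(queue_lengths)
    if n_p < 2:
        raise ValueError("the formula needs at least two processors")
    l_tot = sum(queue_lengths)
    if l_tot <= 0:
        raise ValueError("total schedule length must be positive")
    makespan = max(queue_lengths)
    return n_p * (l_tot - makespan) / ((n_p - 1) * l_tot)


def timeout_penalty(timed_out):
    """t = -1 if the task timed out, otherwise 0."""
    return -1.0 if timed_out else 0.0


def total_reward(queue_lengths, timed_out):
    """r = m + t."""
    return makespan_reward(queue_lengths) + timeout_penalty(timed_out)


balanced = makespan_reward([4, 4, 4])               # evenly spread load
one_sided = makespan_reward([12, 0, 0])             # all load on one processor
mixed = total_reward([6, 3, 3], timed_out=False)
mixed_late = total_reward([6, 3, 3], timed_out=True)
```

A perfectly balanced schedule gives $m=1$ and a schedule with all load on one processor gives $m=0$, so the timeout term can push the total reward as low as $-1$.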