# DRL-MPC **Repository Path**: wenb11/DRL-MPC ## Basic Information - **Project Name**: DRL-MPC - **Description**: No description available - **Primary Language**: Unknown - **License**: Not specified - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2025-03-04 - **Last Updated**: 2025-03-04 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # README # Introduction In this project, we developed and simulated a new algorithm that combines model predictive control and deep reinforcement learning, to improve the safety and comfort of autonomous vehicles. At the same time, we tested the improved algorithm under three different autonomous driving scenarios, and compared it with the situation of using deep reinforcement learning alone and using model predictive control algorithm alone, proving the superiority of the fusion algorithm. # Project structure The project contains three files: ## Straightobs The ego vehicle needs to drive along a straight line and avoid obstacles (including fixed obstacles and sudden pedestrians). ## Overtaking ego vehicle needs to complete overtaking. ## Turnobs The ego vehicle needs to turn and avoid oncoming vehicles. # Code Introduction Each folder contains the following files: ## MPC OUT.py This file is an MPC algorithm file. This file mainly includes the following contents: #### shift Function The shift function calculates the next state of the vehicle using the given dynamics and control inputs. It also handles the obstacle avoidance constraint. #### Define Parameters and Symbols This section defines various parameters such as sampling interval (T), prediction horizon (N), vehicle and lane dimensions, and symbolic variables for the vehicle's state and control inputs. #### Vehicle Dynamics The code models the vehicle's dynamics using state-space equations. It computes matrices (AA, BB, CC) that represent the continuous-time linearized dynamics. #### Define Cost Function The cost function is defined with weight matrices Q and R. It penalizes deviations from the reference trajectory and control inputs. #### Define Optimization Variables The decision variables for optimization (U, P, X) are defined along with constraints. #### MPC Optimization The code sets up an optimization problem using CasADi's nlpsol and solves it iteratively for each time step. It considers obstacle avoidance constraints and reference trajectories. #### Simulation The code simulates the vehicle's trajectory and obstacle avoidance behavior. It updates the control inputs based on the optimization results and iterates through time steps. #### Visualization The code includes visualization using matplotlib to plot the vehicle's trajectory, reference trajectory, and obstacle. ## MPC FUSION.py This file is an MPC fusion algorithm file. In addition to the above-mentioned functions, the code also realizes the real-time communication function through the socket. The code can be divided into the following main sections: #### Socket Initialization Create a socket server to communicate with a client (not included in this code snippet). #### shift Function Define the shift function, which computes the next state and control inputs based on the current state, controls, and the system dynamics. #### Model Parameters Define the parameters of the vehicle model, including time step T, prediction horizon N, and various physical parameters. #### State and Control Variables Define symbolic variables for states (Vx, Vy, x, y, theta, vtheta) and control inputs (ax, delta). #### Vehicle Dynamics Define the vehicle dynamics using symbolic expressions for the state derivatives. The dynamics model is represented by matrices AA, BB, and CC. #### Cost Function Define the cost function for MPC optimization, which includes state and control input weights (Q and R, respectively). #### Constraints Set constraints on the state and control variables, including velocity limits, position limits, and control input limits. #### Optimization Problem Set up the optimization problem using CasADi's NLP solver. Define the objective function, constraints, and initial conditions. #### MPC Loop Implement the main MPC control loop, which iteratively solves the optimization problem and updates the control inputs. #### Visualization Visualize the vehicle's trajectory, obstacles, and control inputs using Matplotlib.



Figure 1. Fusion theory plot.

## PPO TRAIN.py This file is used to train deep reinforcement learning neural network. The document uses the PPO algorithm to train an agent to control a vehicle in a simulated environment: It defines an Actor-Critic neural network architecture for the agent, which is used to learn both the policy and the value function. The agent interacts with the environment, collects experiences, and updates its policy using PPO. The code contains hyperparameters that you can adjust to customize the training process. During training, the code will print information about the training progress, including the rewards obtained by the agent.



Figure 2. Actor critic theory plot.

## PPO TEST.py The code can be divided into the following main sections: #### Memory Class The Memory class is defined to store information about actions, states, log probabilities, rewards, and whether the episode terminated. It is used to accumulate data for training the agent. #### ActorCritic Class The ActorCritic class defines the actor-critic neural network architecture. The actor network outputs action probabilities for continuous actions, and the critic network estimates state values. This class also includes methods for action selection (act) and policy evaluation (evaluate). #### PPO Class The PPO class encapsulates the PPO algorithm. It includes methods for selecting actions (select_action) and updating the policy using the PPO algorithm (update). It also handles loading a pretrained policy for evaluation purposes. #### Visualization Function The myplot function is defined for plotting the vehicle's trajectory and obstacles during evaluation. #### Main Function The main function contains the hyperparameters, training loop, and evaluation loop. It trains the agent to control the vehicle and evaluates the agent's performance in a simulated environment. ## PPO FUSION.py The code can be divided into the following main sections: ### ActorCritic and PPO Classes #### ActorCritic This class defines the actor-critic neural network model. It consists of an actor network for generating actions and a critic network for estimating state values. #### PPO This class implements the Proximal Policy Optimization algorithm. It contains methods for selecting actions, updating the policy, and performing evaluations. ### Memory Class #### Memory This class is responsible for storing and clearing the memory of the agent. It stores actions, states, log probabilities, rewards, and terminal flags. ### myplot Function myplot(xground, yground): This function is responsible for creating a 2D plot of the vehicle's trajectory in the environment. It is used to visualize the agent's actions and the vehicle's path. ### main Function main(): The main function orchestrates the training and testing of the reinforcement learning agent. It includes hyperparameters, environment setup, training loops, and testing loops. The agent learns to control the vehicle in the environment. ## ENV.py Run this file to see the test environment. ## Mydynamic.py It contains the vehicle dynamics model, which can be called by other files. In addition, each folder also contains the model files with the best training results so far for reference. # Model Dependencies ## overall Dependencies Python (>= 3.6) nvidia GPU ## MPC OUT.py Dependencies CasADi (https://web.casadi.org/) numpy matplotlib imageio ## MPC FUSION.py Dependencies CasADi numpy matplotlib imageio socket pickle ## PPO TRAIN.py Dependencies PyTorch (>= 1.0) numpy matplotlib mydynamic ## PPO TEST.py Dependencies PyTorch (>= 1.0) numPy matplotlib ## PPO FUSION.py Dependencies PyTorch (>= 1.0) numpy matplotlib math socket pickle The above dependencies can be installed through ```bash pip install models-you-need ``` # Useage ## Non-fused files After configuring the environment and installing dependencies, you can run it directly in pycharm or use the following command. For example if you run MPC OUT.py in turnobs file: ```bash cd .\final\turnobs\ python '.\MPC OUT.py' ``` The simulation result can be obtained which shows in img file(mpc turn.mp4) and the following is the screen shoot:



Figure 3. screen shoot of the mpc turn.

In the simulation results, the blue rectangle represents the ego vehicle. The blue lines represent the road boundaries and the green dashed line represents the tracked path. The units of the x-axis and y-axis in the figure are meters[m]. And you can also train the PPO model use ```bash cd .\final\turnobs\ python '.\PPO TRAIN.py' ``` After training you can get the PPO_continuous_solved_{}.pth model You can also test the model use ```bash cd .\final\turnobs\ python '.\PPO TEST.py' ``` ## Fusion file If it is a file with FUSION, you need to run PPO FUSION.py first, and then run MPC FUSION.py directly in pycharm or use the following command. ```bash cd .\final\turnobs\ python '.\MPC FUSION.py' ``` open a new terminal: ```bash cd .\final\turnobs\ python '.\PPO FUSION.py' ``` After waiting for the two files to establish a connection through the socket, the simulation result can be obtained which shows in img file(fusion turn.mp4) and the following is the screen shoot :



Figure 4. Screen shoot of fusion turn.

In the simulation results, the blue rectangle represents the ego vehicle. The blue lines represent the road boundaries and the green dashed line represents the tracked path. The units of the x-axis and y-axis in the figure are meters[m]. # result The video can be seen in img file. Following is the figure results. In the simulation results, the blue rectangle represents the ego vehicle. The blue lines represent the road boundaries and the green dashed line represents the tracked path. The units of the x-axis and y-axis in the figure are meters[m]. ## Obstacle avoidance ### PPO



Figure 5. Straight obstacle avoiding results with PPO algorithm.

### MPC



Figure 6. Straight obstacle avoiding results with mpc algorithm.

### fusion



Figure 7. Straight obstacle avoiding results with fusion algorithm.

## Overtaking ### PPO



Figure 8. Overtaking results with PPO algorithm.

### MPC



Figure 9. Overtaking results with mpc algorithm.

### fusion



Figure 10. Overtaking results with fusion algorithm.

## Turning and obstacle avoidance ### PPO



Figure 11. Turn and avoid oncoming vehicles forecasting results with PPO algorithm.

### MPC



Figure 12. Turn and avoid oncoming vehicles forecasting results with mpc algorithm.

### fusion The fusion result shows as below



Figure 13. Turn and avoid oncoming vehicles forecasting results with fusion algorithm.

# Author Idea: Wenjun Liu Code writing: Ziting Huang 2023/9/8