# TileSpTRSV

**Repository Path**: luzhengyang007/tilesptrsv

## Basic Information

- **Project Name**: TileSpTRSV
- **Description**: TileSpTRSV: A Tiled Algorithm for Parallel Sparse Triangular Solve on GPUs
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 0
- **Created**: 2023-08-14
- **Last Updated**: 2023-12-11

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# TileSpTRSV
**tilesptrsv** implements tiled algorithms for parallel SpTRSV on modern GPUs and provides an adaptive selection method that automatically chooses the best formats and kernels according to the sparsity structure of the input matrix.
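As a rough illustration of what "tiled" means here, the sketch below partitions an n x n lower-triangular matrix stored in CSR format into square tiles and counts the nonzeros that fall into each tile. The tile size `TILE = 16` and the function name `count_tile_nnz` are hypothetical choices made for this example and are not taken from the TileSpTRSV source. The sketch only shows the 2D partitioning step, not the format or kernel selection.

```c
#include <assert.h>

#define TILE 16 /* hypothetical tile size, chosen only for this illustration */

/* Count the nonzeros of an n x n lower-triangular CSR matrix per TILE x TILE tile.
 * counts must hold ntiles * ntiles ints (row-major), ntiles = ceil(n / TILE).
 * Returns the number of non-empty tiles. */
int count_tile_nnz(int n, const int *row_ptr, const int *col_idx, int *counts)
{
    int ntiles = (n + TILE - 1) / TILE;
    for (int t = 0; t < ntiles * ntiles; t++)
        counts[t] = 0;

    for (int i = 0; i < n; i++) {
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++) {
            assert(col_idx[k] <= i);        /* lower triangular: column <= row */
            int ti = i / TILE;              /* tile row */
            int tj = col_idx[k] / TILE;     /* tile column, so tj <= ti */
            counts[ti * ntiles + tj]++;
        }
    }

    int nonempty = 0;
    for (int t = 0; t < ntiles * ntiles; t++)
        if (counts[t] > 0)
            nonempty++;
    return nonempty;
}
```

Because every nonzero (i, j) of a lower-triangular matrix satisfies j <= i, its tile column never exceeds its tile row, so only tiles on or below the diagonal of the tile grid can be non-empty. For a 20 x 20 matrix with `TILE = 16`, for example, the tile grid is 2 x 2 and the upper-right tile is always empty.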
## Paper information
Zhengyang Lu, Weifeng Liu. "TileSpTRSV: a tiled algorithm for parallel sparse triangular solve on GPUs". CCF Transactions on High Performance Computing (CCF THPC), 2023.
## Contact us

If you have any questions about running the code, please contact Zhengyang Lu.    

E-mail: 2021211259@student.cup.edu.cn
## Introduction
Sparse triangular solve (SpTRSV) is one of the most important level-2 kernels in sparse basic linear algebra subprograms
(BLAS). Compared with sparse matrix–vector multiplication (SpMV), another level-2 sparse BLAS kernel, it is generally
more difficult to extract high parallelism from SpTRSV on many-core processors such as GPUs. Much recent work focuses on
reducing dependencies and synchronizations in the level-set and Sync-free algorithms for SpTRSV. However, comparatively
little work makes good use of the spatial structure of sparse matrices for SpTRSV on GPUs. In this paper, we propose a
tiled algorithm called TileSpTRSV that optimizes SpTRSV on GPUs by exploiting the 2D spatial structure of sparse
matrices. We design two implementations of TileSpTRSV, TileSpTRSV_level-set and TileSpTRSV_sync-free, built on top of
the level-set and Sync-free algorithms, respectively. On 16 representative matrices tested on a recent NVIDIA GPU, the
experimental results show that TileSpTRSV_level-set achieves on average 5.29× (up to 38.10×), 5.33× (up to 21.32×) and
2.62× (up to 12.87×) speedups over cuSPARSE, Sync-free and Recblock, respectively.
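The level-set algorithm mentioned above groups the rows of a lower-triangular matrix L into levels: row i depends on every row j < i with a nonzero L(i,j), so its level is one more than the largest level among those rows, and all rows in the same level can be solved in parallel. The sketch below is a sequential CPU illustration of that general idea, assuming CSR storage with a nonzero diagonal entry in every row; the function names `compute_levels` and `sptrsv_levelset` are hypothetical, and this is not the TileSpTRSV GPU kernel.

```c
#include <assert.h>
#include <stdlib.h>

/* Compute the level of every row of an n x n lower-triangular CSR matrix.
 * level[i] = 1 + max(level[j]) over off-diagonal nonzeros L(i,j), j < i;
 * rows without off-diagonal nonzeros are in level 0. Returns the number of levels. */
int compute_levels(int n, const int *row_ptr, const int *col_idx, int *level)
{
    int nlevels = 0;
    for (int i = 0; i < n; i++) {
        int lv = 0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++) {
            int j = col_idx[k];
            assert(j <= i);                   /* input must be lower triangular */
            if (j < i && level[j] + 1 > lv)   /* level[j] was computed earlier */
                lv = level[j] + 1;
        }
        level[i] = lv;
        if (lv + 1 > nlevels)
            nlevels = lv + 1;
    }
    return nlevels;
}

/* Solve L x = b level by level. Rows inside one level are independent, so a GPU
 * version could process them in parallel; here they are handled sequentially. */
void sptrsv_levelset(int n, const int *row_ptr, const int *col_idx,
                     const double *val, const double *b, double *x)
{
    int *level = malloc(n * sizeof *level);
    if (level == NULL)
        return;                               /* allocation failed; x is left unchanged */
    int nlevels = compute_levels(n, row_ptr, col_idx, level);

    for (int lv = 0; lv < nlevels; lv++) {
        for (int i = 0; i < n; i++) {         /* visit the rows belonging to level lv */
            if (level[i] != lv)
                continue;
            double sum = b[i], diag = 0.0;
            for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++) {
                if (col_idx[k] == i)
                    diag = val[k];
                else
                    sum -= val[k] * x[col_idx[k]]; /* x[j] was solved in an earlier level */
            }
            assert(diag != 0.0);
            x[i] = sum / diag;
        }
    }
    free(level);
}
```

For clarity, this sketch rescans all rows for every level, which costs O(n × levels). A practical implementation would instead bucket the rows by level once and launch one parallel step per level, synchronizing between levels. The Sync-free algorithm avoids those per-level synchronizations altogether.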
## Installation
- NVIDIA GPU with compute capability 3.5 or higher (tested on an NVIDIA RTX 4090)
- NVIDIA `nvcc` CUDA compiler and the cuSPARSE library, both included with the CUDA Toolkit (tested with CUDA v11.1)

The GPU test programs have been tested on Ubuntu 18.04/20.04 and are expected to run correctly under other Linux distributions.
## Execution of tilesptrsv
Our test programs currently support input files in the Matrix Market format. All Matrix Market datasets used in this evaluation are publicly available from the SuiteSparse Matrix Collection.
1. Set the CUDA path in the Makefile.
2. Run `make` to generate the executable `test` (double precision).
```
make
```
3. Run the SpTRSV code on a matrix with auto-tuning in double precision. The program accepts optional parameters, including `-d <gpu-device>` (e.g., `-d 0`), which selects the GPU device to use when multiple GPUs are available, and a `<forward/backward>` parameter, which specifies whether the input matrix is lower triangular (forward) or upper triangular (backward).
```
./test -d 0 Name.mtx
```