# AIALoopMode

**Repository Path**: xfluidsolid/aialoop-mode

## Basic Information

- **Project Name**: AIALoopMode
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-04-09
- **Last Updated**: 2021-12-19

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# OptimizationPlatform

The platform is used to optimize host code on the GPU with CUDA C. The code runs on a single GPU.

## 1. How to compile and run

### 1.1 Compile the code

```
[username]$ make
```

Running `make` builds the executable `OptimizeGPU`.

### 1.2 Run the code

```
[username]$ cd Example
[username]$ yhrun -n 1 ../OptimizeGPU
```

### 1.3 Precision

Both single (float) and double precision are supported. The precision is chosen in the makefile. For example, the following setting selects single precision:

> PRECISION = SINGLEPRECISION
> ##PRECISION = DOUBLEPRECISION

Similarly, double precision is chosen by:

> ##PRECISION = SINGLEPRECISION
> PRECISION = DOUBLEPRECISION

### 1.4 Simulation results

The test case is in `Example`, which contains the geometry topology of an unstructured mesh. All GPU functions and host functions in the `runXXXX` framework are executed, and the results are validated like:

```
setNcountQNodeTNodeZeroDevice
Validate CellLoopFinalNoAtomic
The maximum error is 3.996999740600585937500000000000e+00 on 2, 15039 term of qNode
The maximum error is 2.609254121780395507812500000000e+00 on 0, 2989 term of tNode
setNcountQNodeTNodeZeroDevice
Validate CellLoopFinal
The maximum error is 1.430511474609375000000000000000e-06 on 2, 115185 term of qNode
The maximum error is 9.536743164062500000000000000000e-07 on 0, 44647 term of tNode
...
...
```

The maximum error is the difference between the host and GPU functions. After the simulation, `performanceCPU` and `performanceGPU` hold the final results.
The execution times are recorded in `performanceCPU` and `performanceGPU`.

### 1.5 Explanation of performanceGPU and performanceCPU

```
No.  Name               Parent                                   Elapsed Time  frequency
1    GPUNodeLoopFinal1  CallGPUNodeLoopNCountQNodeTNodeCalFinal  0.038987      100
2    GPUNodeLoopFinal2  CallGPUNodeLoopNCountQNodeTNodeCalFinal  0.044714      1
```

- `No.` is the label of a timer.
- `Name` is the name of a timer.
- `Parent` is the name of the function containing the timed computing spot.
- `Elapsed Time` is the time spent in the computing spot.
- `frequency` is the number of times the computing spot is executed.

## 2. Code structure

```
Main
RunOptimizations
HostFunctions
GPUKernels
Validation
GlobalVariables
DeviceControl
Timer
```

- Main: chooses the optimization module to run; creates and outputs timers
- RunOptimizations: the optimization framework
- HostFunctions: functions to be ported to the GPU; the original face loop is often changed into a volume loop or node loop
- GPUKernels: GPU kernels corresponding to the host functions, plus some optimized variants
- Validation: test functions comparing results between GPUKernels and HostFunctions
- GlobalVariables: host variables and device variables
- DeviceControl: controls the GPU
- Timer: records the execution time of CPU and GPU code

## 3. Programmer guide

### 3.1 Main/OptimizationFrame.cu: main function

`runXXXX` is the framework for host and GPU functions.

```c++
//runBoundaryQlQrFix();           //Optimize GPUBoundaryQlQrFix
runQTNcountCompInterior();        //Optimize GPUInteriorFaceNCountQNodeTNodeCal
//runCompGradientGGNodeFaceCal(); //Optimize CompGradientGGNodeInteriorFaceCal
//runReconstructFaceValue();      //Optimize GPUReconstructFaceValue
```

Four frameworks have been established. In one simulation, it is best to run only one framework; running several at once produces too many results.

### 3.2 RunOptimizations/runXXXXX.cu: framework collecting GPU and host functions

Taking `runQTNcountCompInterior.cu` for instance: the entrance function is `runQTNcountCompInterior()`.
```c++
void runQTNcountCompInterior(){
    preProcessQTNcountCompInterior();
    int loopID;
    for (loopID = 0; loopID < LOOPNUM; loopID++){
        ... ...; //host and gpu functions, or validation functions
    }
}
```

The function first runs `preProcessQTNcountCompInterior()`. Then the GPU, host, and validation functions are run inside one loop, which is used to average the execution time. In the loop, the GPU and host functions are called as below:

```c++
for (loopID = 0; loopID < LOOPNUM; loopID++){
    CallHostCellLoopNCountQNodeTNodeCalFinal(loopID);
    CallGPUNodeLoopNCountQNodeTNodeCalFinal(loopID);
    CallHostNodeLoopNCountQNodeTNodeCalFinal(loopID);
    CallGPUFaceNCountQNodeTNodeCal(loopID);
    CallHostFaceNCountQNodeTNodeCal(loopID);
    CallGPUCellLoopNCountQNodeTNodeCalFinal(loopID);
}
```

- `CallHostXXX` is a host function in `HostFunctions/HostXXXX.cu`.
- `CallGPUXXX` is a GPU function in `GPUKernels/GPUXXXX.cu`.