# BOCA

**Repository Path**: leonexu/BOCA

## Basic Information

- **Project Name**: BOCA
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-05-23
- **Last Updated**: 2024-05-30

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

软件介绍： 

脚本自动找到最佳的编译选项的方法，论文中提到自动的贝叶斯优化算法，测试有限的变量空间，自动找寻最佳的编译选项。 


复现过程： 

一、安装必要的软件：sudo apt install clang opt llvm 

二、复现方法 

```bash
cd BOCA 
python -n venv ./venv 
source venv/bin/activate 
pip install numpy -i https://pypi.mirrors.ustc.edu.cn/simple/ 
pip install scikit-learn -i https://pypi.mirrors.ustc.edu.cn/simple/ 
cd examples 
python boca.py 
``` 

三、编译正常的gcc的原始执行程序的方法  

1、polybench 3mm复现：./test_polybench_3mm.sh  

2、cbench复现： ./test_cbench.sh

3、csmith复现： ./test_csmith.sh


# Efficient Compiler Autotuning via Bayesian Optimization(BOCA)

This is the implementation repository of our approach: **Efficient Compiler Autotuning via Bayesian Optimization**.

## Description

`BOCA` is  the first Bayesian optimization based approach for compiler autotuning. The goal is to tune compiler's optimization flags as efficient as possible in order to achieve the required runtime performance of the compiled programs. We further propose a searching strategy in `BOCA` to improve the efficiency of Bayesian optimization that can strike a balance between exploitation and exploration. We conduct extensive experiments to investigate the effectiveness of `BOCA` based on `GCC 4.7.7` , `LLVM 2.9 `and `GCC 8.3.1`. `BOCA` significantly outperforms the state-of-the-art compiler autotuning approaches and Bayesion optimization methods in terms of the time spent on achieving specified speedups, demonstrating the effectiveness of `BOCA`.

## Compilers/Benchmarks/Result

### Compilers/Benchmarks

We used two most popular C compilers (i.e., `GCC` and `LLVM`) and two widely-used C benchmarks (i.e., `CBench` and `PolyBench`), which have been widely used in many existing studies. Also 5 Csmith generated programs. 


| Compiler | Version | Optimization flags number |
| -------- | ------- | ------------------------- |
| GCC      | 4.7.7   | 71                        |
| LLVM     | 2.9     | 64                        |
| GCC      | 8.3.1   | 86                        |

Under the directory `flaglist`, there are two files: `gcc4flags.txt`,` llvmflags.txt` and  `gcc8flags.txt`, which contain flags used for `GCC` compiler and for `LLVM` compiler seperately.


| ID      | Program             | Number of Source Lines of Code |
| ------- | ------------------- | ------------------------------ |
| C1      | consumer_jpeg_c     | 26,950                         |
| C2      | security_sha        | 297                            |
| C3      | automotive_bitcount | 954                            |
| C4      | automotive_susan_e  | 2,129                          |
| C5      | automotive_susan_c  | 2,129                          |
| C6      | automotive_susan_s  | 2,129                          |
| C7      | bzip2e              | 7,200                          |
| C8      | consumer_tiff2rgba  | 22,321                         |
| C9      | telecom_adpcm_c     | 389                            |
| C10     | office_rsynth       | 5,412                          |
| P1      | 2mm                 | 252                            |
| P2      | 3mm                 | 267                            |
| P3      | cholesky            | 212                            |
| P4      | jacobi-2d           | 200                            |
| P5      | lu                  | 210                            |
| P6      | correlation         | 248                            |
| P7      | nussinov            | 569                            |
| P8      | symm                | 231                            |
| P9      | heat-3d             | 211                            |
| P10     | covariance          | 218                            |
| CSmith1 | trainprogram1.c     | 4160                           |
| CSmith2 | trainprogram2.c     | 11793                          |
| CSmith3 | trainprogram3.c     | 10049                          |
| CSmith4 | trainprogram4.c     | 8703                           |
| CSmith5 | trainprogram5.c     | 12246                          |

`CBench`, `Polybench` and `5 Csmith programs` are under the `cbench`, `polybench` and `Csmith` directory respectively.


### Result

Under this folder, a .pdf file shows the results from 20th iteration to 60th iteration. In this file, a cross means that the approaches timed out in the experiment.
Also, the optimal sequences of different programs are shown in important_flags.txt, where flags among the impactful ones are listed before those that are not impactful. The two kinds of flags are delimited by a "||".
Raw_data_for_results.txt shows the speedups and time of different methods. The .pdf file is generated from this. In the header of this file, there are instructions explaining how to read the file.


## Reproducibility

### Environment

Our study is conducted on a workstation with 16- core Intel(R) Xeon(R) CPU E5-2640 v3, 126G memory, and CentOS 6.10 operating system. (**Note: in docker environment**)

### Source code

Under `examples` directory, there are source codes written by us. 

| script     | Description                                                  |
| ---------- | ------------------------------------------------------------ |
| `boca.py`  | Compiler Autotuning via Bayesian Optimization.               |
| `bocas.py` | BOCA with the local search strategy used in SMAC.            |
| `ga.py`    | Genetic Algorithm based Iterative Optimization (initial size of 4). |
| `irace.py` | Irace-based Iterative Optimization.                          |
| `rio.py`   | Random Iterative Optimization.                               |
| `tpe.py`   | an advanced general Bayesian optimization.                   |


### Compile with GCC/LLVM

They are examples showing commands we used to compile benchmark programs with C compilers (i.e., `GCC` and `LLVM`) .

#### GCC

Compile `CBench` programs under current directory using `GCC` with optimization flag `-targetlibinfo` and `-tti` using command line.

```bash
gcc -fbranch-count-reg -fcaller-saves -c *.c 
gcc -fbranch-count-reg -fcaller-saves -lm *.o
time ./a.out # execute
```

Compile `Polybench` programs `3mm` using `GCC` with optimization flag `-fbranch-count-reg` and`-fcaller-saves `using command line.

```bash
gcc -O2 -fbranch-count-reg -fcaller-saves -I utilities -I linear-algebra/kernels/3mm utilities/polybench.c linear-algebra/kernels/3mm/3mm.c -lm -DPOLYBENCH_TIME -o 3mm_time
time ./3mm_time # execute
```

Compile `Csmith` program `trainprogram1.c` using `GCC` with optimization flag `-fbranch-count-reg` and`-fcaller-saves` using command line.

```bash
gcc -g -Wall -std=c99 -O2 -fbranch-count-reg -fcaller-saves trainprogram1.c -I csmith_home/runtime -o a.out
time ./a.out # execute
```


#### LLVM

There are more steps to compile program(s) with `LLVM` than with `GCC` since the front, middle and back ends of `LLVM` are controlled by three commands instead of one.

Compile `CBench` programs under current directory using LLVM with optimization flag `-targetlibinfo` and `-tti` using Python.

```python
for file in cur_dir:
	if file.endswith('.c') and file.startswith('._') != True:

		clangcmd = 'clang -O0 -scalarrepl  -emit-llvm -c -I./ ' + file + ' -o ' + file[:-1] + 'bc'	
		optcmd = 'opt -targetlibinfo -tti -S ' + file[:-1] + 'bc -o ' + file[:-1] + 'opt.bc'
		llccmd = 'llc  -O2 -filetype=obj ' + file[:-1] + 'opt.bc -o ' + file[:-1] + 'o'
    
		os.system(optcmd)
		os.system(clangcmd)
		os.system(llccmd)

cmd = 'clang   -O0 -scalarrepl -lm *.o'
os.system(cmd)

os.system('time ./a.out') # execute
```

Compile `Polybench` program `3mm` using `LLVM` with optimization flag `-targetlibinfo` and `-tti` using Python.

```python
files = os.listdir('./')
files.append('../utilities/polybench.c')

for file in 3mm_dir:
	if file.endswith('.c') and file.startswith('._') != True:
		file_name = file.split('/')[-1][:-1]
		clangcmd = 'clang -O0 -scalarrepl  -emit-llvm -I ../utilities -I ./ -c ' + file + ' -o ' + file_name + 'bc '
		optcmd = 'opt -targetlibinfo -tti -S ' + file_name + 'bc -o ' + file_name + 'opt.bc '
		llccmd = 'llc  -O2 -filetype=obj ' + file_name + 'opt.bc -o ' + file_name + 'o '
		
		os.system(optcmd)
		os.system(clangcmd)
		os.system(llccmd)
cmd = 'clang -O0 -scalarrepl -lm *.o '
os.system(cmd) 

os.system('time ./a.out') # execute
```

Compile `Csmith` program `test.c` using `LLVM` with optimization flag `-targetlibinfo` and `-tti` using command line.

```bash
clang -O0 -emit-llvm -c -I csmith_home/runtime trainprogram1.c -o trainprogram1.c.bc
opt -targetlibinfo -tti -S trainprogram1.c.bc -o trainprogram1.c.opt.bc
llc -O3 -filetype=obj trainprogram1.c.opt.bc -o trainprogram1.c.o
clang -O0 -lm *.o 

time ./a.out # execute
```


In `boca.py`, we compile and execute a program from Polybench by GCC compiler. In `ga.py`, we show how we test the approach on a program from CBench by GCC compiler. In  `rio.py` compiles and executes a program from Polybench by LLVM compiler. 

All combinations of programs and compilers can be used in the experiment in a similar way to these examples.