# RepairBoost
This is the implementation of RepairBoost, described in our paper "Boosting Full-Node Repair in Erasure-Coded Storage", which appeared in USENIX ATC'21. RepairBoost is a scheduling framework that assists existing linear erasure codes and repair algorithms in boosting full-node repair performance.
Please contact [linesyoo@gmail.com](mailto:linesyoo@gmail.com) if you have any questions.
## 1. Install
We have tested RepairBoost on Ubuntu 16.04 LTS.
### 1.1 Common
- g++ & make & cmake & libtool & autoconf & git
```shell
$ sudo apt-get install g++ make cmake libtool autoconf git
```
- gf-complete
```shell
$ git clone https://github.com/ceph/gf-complete.git
$ cd gf-complete
$ ./autogen.sh && ./configure && make && sudo make install
```
- jerasure2.0
```shell
$ git clone https://github.com/ceph/jerasure.git
$ cd jerasure
$ autoreconf -if && ./configure && make && sudo make install
```
- redis-3.2.8
Download redis-3.2.8 and install it.
```shell
$ sudo wget http://download.redis.io/releases/redis-3.2.8.tar.gz
$ tar -zxvf redis-3.2.8.tar.gz
$ cd redis-3.2.8
$ make && sudo make install
```
Install redis as a background daemon. You can just use the default settings.
```shell
$ cd utils
$ sudo ./install_server.sh
```
Configure redis to be remotely accessible.
```shell
$ sudo /etc/init.d/redis_6379 stop
```
Edit /etc/redis/6379.conf: find the line bind 127.0.0.1 and change it to bind 0.0.0.0.
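Assuming the stock configuration file, the sed one-liner below (a convenience we add here, equivalent to the manual edit) makes the change non-interactively:
```shell
$ sudo sed -i 's/^bind 127.0.0.1/bind 0.0.0.0/' /etc/redis/6379.conf
```
Then start redis.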
```shell
$ sudo /etc/init.d/redis_6379 start
```
- hiredis
```shell
$ cd redis-3.2.8/deps/hiredis
$ make && sudo make install
```
Append /usr/local/lib as a new line in /etc/ld.so.conf so the dynamic linker can find the libraries installed above, then refresh the linker cache.
```shell
$ echo "/usr/local/lib" | sudo tee -a /etc/ld.so.conf
$ sudo ldconfig
```
### 1.2 Compile RepairBoost
After finishing the preparations above, download and compile the source code.
```shell
$ cd repairboost-code
$ make
```
## 2. Standalone Test
### 2.1 Prerequisites
#### Configuration File
We configure RepairBoost via the XML configuration file config.xml, whose properties are listed below.
| Property | Description |
| ---- | ---- |
| erasure.code.type | Three types are implemented: RS, LRC, and BUTTERFLY (only n=6, k=4). |
| erasure.code.k | The number of data chunks. |
| erasure.code.n | Total number of chunks in a stripe. |
| lrc.code.l | The number of groups, only valid in LRC. Default is 0. |
| encode.matrix.file | The path of the encoding matrix file. Absolute path is required. |
| packet.size | The size of a packet in units of bytes. |
| packet.count | The number of packets in a chunk. A chunk is partitioned into many smaller packets. |
| repair.method | Three single-chunk repair methods are implemented: cr (i.e., conventional repair), ppr, and path (i.e., ECPipe). |
| file.system.type | Two types are supported: standalone and HDFS3. |
| meta.stripe.dir | The directory where the stripe metadata is stored. Absolute path is required. |
| block.directory | The directory where the coded chunks are stored. Absolute path is required. |
| coordinator.address| IP address of the coordinator. |
| helpers.address | IP addresses of all nodes in the system. |
| local.ip.address | IP address of the node itself.|
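For orientation only, the sketch below shows what a standalone RS(4, 3) configuration could look like. The `<attribute>/<name>/<value>` layout, the paths, and the IP addresses are all assumptions for illustration; use the config.xml shipped with the repository as the authoritative template and only adjust the values.
```shell
$ cat > config.xml << 'EOF'
<configuration>
  <attribute><name>erasure.code.type</name><value>RS</value></attribute>
  <attribute><name>erasure.code.k</name><value>3</value></attribute>
  <attribute><name>erasure.code.n</name><value>4</value></attribute>
  <attribute><name>lrc.code.l</name><value>0</value></attribute>
  <attribute><name>encode.matrix.file</name><value>/path/to/repairboost-code/conf/rsEncMat_3_4</value></attribute>
  <attribute><name>packet.size</name><value>1048576</value></attribute>
  <attribute><name>packet.count</name><value>64</value></attribute>
  <attribute><name>repair.method</name><value>cr</value></attribute>
  <attribute><name>file.system.type</name><value>standalone</value></attribute>
  <attribute><name>meta.stripe.dir</name><value>/path/to/meta</value></attribute>
  <attribute><name>block.directory</name><value>/path/to/blocks</value></attribute>
  <attribute><name>coordinator.address</name><value>192.168.0.201</value></attribute>
  <attribute><name>helpers.address</name><value>192.168.0.201 192.168.0.202 192.168.0.203 192.168.0.204 192.168.0.205</value></attribute>
  <attribute><name>local.ip.address</name><value>192.168.0.202</value></attribute>
</configuration>
EOF
```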
#### Encoding Matrix File
For RS codes and LRC codes, we read the encoding coefficients from the encoding matrix file. The file contains an (n-k)×k matrix that specifies the coefficients for generating n-k parity chunks from k data chunks.
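Concretely, if $a_{ij}$ is the entry in row $i$, column $j$ of this matrix and arithmetic is done over $\mathrm{GF}(2^8)$ (the $w=8$ Galois field commonly used with Jerasure; an assumption stated here for clarity), the $i$-th parity chunk is
$$
p_i = \bigoplus_{j=1}^{k} a_{ij} \cdot d_j, \qquad i = 1, \dots, n-k,
$$
where $d_j$ is the $j$-th data chunk, $\oplus$ is bitwise XOR, and $\cdot$ is Galois-field multiplication applied byte by byte.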
The file repairboost-code/conf/rsEncMat_6_9 shows an encoding matrix for RS(9, 6) (n=9, k=6):
$$
\begin{bmatrix}
122 & 186 & 71 & 167 & 142 & 244 \\
186 & 122 & 167 & 71 & 244 & 142 \\
173 & 157 & 221 & 152 & 61 & 170
\end{bmatrix}
$$
Our example uses the file repairboost-code/conf/rsEncMat_3_4, a simple encoding matrix for RS(4, 3) (n=4, k=3):
$$
\begin{bmatrix}
1 & 1 & 1
\end{bmatrix}
$$
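Since every coefficient is 1 and addition in $\mathrm{GF}(2^8)$ is bitwise XOR, this matrix makes the single parity chunk the XOR of the three data chunks:
$$
p_1 = d_1 \oplus d_2 \oplus d_3
$$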
#### Create Erasure-Coded Chunks
Before testing our standalone system, we need to create stripes of erasure-coded chunks and corresponding metadata. We have provided programs for generating chunks of different codes under repairboost-code/test/.
```shell
$ cd repairboost-code/test
$ make
$ dd if=/dev/urandom iflag=fullblock of=input.txt bs=64M count=3
$ ./createdata_rs ../conf/rsEncMat_3_4 input.txt 3 4
```
For more details, run ./createdata_rs (or ./createdata_lrc, etc.) without arguments to print the usage of each program.
Take RS(4, 3) (n=4, k=3) as an example. In repairboost-code/test, we create 3 files of uncoded chunks and 1 file of coded chunks. We can distribute the 4 chunks across 4 nodes, each of which stores one chunk under the path specified by block.directory. The coordinator stores the stripe metadata under the path specified by meta.stripe.dir.
The data files and metadata must follow the naming convention below. With 2 stripes, the chunk files are named as follows:
```
stripe_0_file_k1 stripe_0_file_k2 stripe_0_file_k3 stripe_0_file_m1
stripe_1_file_k1 stripe_1_file_k2 stripe_1_file_k3 stripe_1_file_m1
```
The coordinator stores 4 metadata files for each stripe; for stripe 0 these are rs:stripe_0_file_k1_1001, rs:stripe_0_file_k2_1001, rs:stripe_0_file_k3_1001, and rs:stripe_0_file_m1_1002, all of which have the same content:
```
stripe_0_file_k1_1001:stripe_0_file_k2_1001:stripe_0_file_k3_1001:stripe_0_file_m1_1002
```
In a file name such as rs:stripe_0_file_k1_1001, the prefix rs means that the chunk uses a Reed-Solomon (RS) code and stripe_0_file_k1 is the chunk name. The suffix 1001 (resp. 1002) marks an uncoded (resp. coded) chunk.
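As a purely illustrative sketch of this naming scheme (the path and the helper loop are hypothetical, not part of the repository), the metadata files for stripe 0 could be produced like this:
```shell
$ META="stripe_0_file_k1_1001:stripe_0_file_k2_1001:stripe_0_file_k3_1001:stripe_0_file_m1_1002"
$ for name in ${META//:/ }; do
>     # each metadata file holds the same colon-separated chunk list
>     echo "$META" > "/path/to/meta.stripe.dir/rs:$name"
> done
```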
### 2.2 Run
#### Start RepairBoost
The start script is in repairboost-code/scripts/.
```shell
$ python scripts/start.py
```
#### Full-node Repair Test
Use ssh to connect to any node (not the coordinator) to send a full-node repair request.
```shell
$ ./ECClient
```
#### Stop RepairBoost
The stop script is in repairboost-code/scripts/.
```shell
$ python scripts/stop.py
```
## 3. Hadoop-3 Integration
### 3.1 Prerequisites
- isa-l
```shell
$ git clone https://github.com/01org/isa-l.git
$ cd isa-l
$ ./autogen.sh && ./configure && make && sudo make install
```
- java8
```shell
$ sudo apt-get purge openjdk*
$ sudo apt-get install software-properties-common
$ sudo add-apt-repository ppa:ts.sch.gr/ppa
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer
$ sudo apt install oracle-java8-set-default
```
Then configure the environment variables for java.
```shell
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export PATH=$JAVA_HOME/bin:$PATH
```
Test
```shell
$ java -version
java version "1.8.0_212"
```
- maven
Download apache-maven-3.5.4 from an [available mirror](https://downloads.apache.org/maven/maven-3/3.5.4/binaries/apache-maven-3.5.4-bin.tar.gz).
```shell
$ tar -zxvf apache-maven-3.5.4-bin.tar.gz
$ sudo mv apache-maven-3.5.4 /opt/
$ sudo ln -s /opt/apache-maven-3.5.4 /opt/apache-maven
```
Then configure the environment variables for maven.
```shell
export MAVEN_HOME=/opt/apache-maven
export PATH=$MAVEN_HOME/bin:$PATH
```
Test
```shell
$ mvn help:system
BUILD SUCCESS
```
- protobuf-2.5.0
Download protobuf-2.5.0 (this exact version is required).
```shell
$ tar -zxvf protobuf-2.5.0.tar.gz
$ cd protobuf-2.5.0
$ sudo mkdir /usr/local/protoc-2.5.0/
$ ./configure --prefix=/usr/local/protoc-2.5.0/
$ make && sudo make install
```
Then configure the environment variables for protobuf.
```shell
export PROTOC_HOME=/usr/local/protoc-2.5.0
export PATH=$PROTOC_HOME/bin:$PATH
```
Test
```shell
$ protoc --version
libprotoc 2.5.0
```
- hadoop-3.1.4-src
Download hadoop-3.1.4-src from an [available mirror](https://apache.01link.hk/hadoop/common/hadoop-3.1.4/hadoop-3.1.4-src.tar.gz).
```shell
$ tar -zxvf hadoop-3.1.4-src.tar.gz
```
Then configure the environment variable.
```shell
export HADOOP_SRC_DIR=/path/to/hadoop-3.1.4-src
export HADOOP_HOME=$HADOOP_SRC_DIR/hadoop-dist/target/hadoop-3.1.4
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
```
Tip: the following native libraries are required to rebuild Hadoop.
```shell
$ sudo apt-get -y install build-essential autoconf automake libtool cmake zlib1g-dev pkg-config libssl-dev
```
Edit repairboost-code/hadoop-3.1.4-integrate/install.sh so that it points to the directory where you extracted hadoop-3.1.4-src, then execute the script to rebuild Hadoop.
```shell
$ ./install.sh
```
### 3.2 Hadoop Configuration
The following shows an example to configure Hadoop-3.1.4 with 6 nodes.
| IP Address | Role in Hadoop | Role in standalone |
| ------------- | -------------- | ------------------ |
| 192.168.0.201 | NameNode | Coordinator |
| 192.168.0.202 | DataNode | Helper |
| 192.168.0.203 | DataNode | Helper |
| 192.168.0.204 | DataNode | Helper |
| 192.168.0.205 | DataNode | Helper |
| 192.168.0.206 | DataNode | Helper |
You should modify the following configuration files in /path/to/hadoop-3.1.4-src/hadoop-dist/target/hadoop-3.1.4/etc/hadoop.
- core-site.xml
```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.0.201:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/path/to/hadoop-3.1.4</value>
  </property>
</configuration>
```
- hadoop-env.sh
```bash
export JAVA_HOME="/usr/lib/jvm/java-8-oracle"
```
- hdfs-site.xml
```xml
<configuration>
  <property>
    <name>dfs.client.use.datanode.hostname</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>67108864</value>
  </property>
  <property>
    <name>repairboost.coordinator</name>
    <value>192.168.0.201</value>
  </property>
  <property>
    <name>dfs.datanode.ec.reconstruction.stripedread.buffer.size</name>
    <value>1048576</value>
  </property>
  <property>
    <name>dfs.datanode.ec.repairboost</name>
    <value>true</value>
  </property>
  <property>
    <name>repairboost.packetsize</name>
    <value>1048576</value>
  </property>
  <property>
    <name>repairboost.packetcnt</name>
    <value>64</value>
  </property>
</configuration>
```
- user_ec_policies.xml
We provide a sample configuration for the RS-3-1-1024k erasure coding policy (see the sketch after this list).
- workers
This file contains multiple lines, each of which is a DataNode IP address.
```
192.168.0.202
192.168.0.203
192.168.0.204
192.168.0.205
192.168.0.206
```
### 3.3 Run
#### Start Hadoop
- Format the Hadoop cluster.
```shell
$ hdfs namenode -format
$ start-dfs.sh
$ hdfs dfsadmin -report
```
If the report shows 5 live DataNodes, the Hadoop cluster has started correctly.
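As a quick check, you can filter the report with grep; with the 6-node example above you should see:
```shell
$ hdfs dfsadmin -report | grep "Live datanodes"
Live datanodes (5):
```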
- Set erasure coding policy.
```shell
$ hdfs ec -addPolicies -policyFile /path/to/user_ec_policies.xml
$ hdfs ec -enablePolicy -policy RS-3-1-1024k
$ hdfs dfs -mkdir /ec_test
$ hdfs ec -setPolicy -path /ec_test -policy RS-3-1-1024k
```
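To verify that the policy was registered and applied, the standard HDFS erasure-coding subcommands can be used:
```shell
$ hdfs ec -listPolicies            # RS-3-1-1024k should appear as ENABLED
$ hdfs ec -getPolicy -path /ec_test
```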
- Write data into HDFS.
```shell
$ dd if=/dev/urandom iflag=fullblock of=file.txt bs=64M count=3
$ hdfs dfs -put file.txt /ec_test/testfile
```
Check the written data.
```shell
$ hdfs fsck / -files -blocks -locations
```
#### Start RepairBoost
- Get the directory where the erasure-coded data is stored in Hadoop.
```shell
$ ssh datanode
$ find . -name "finalized"
```
You can ssh to any DataNode to execute the command.
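On a stock deployment, the finalized directory sits under hadoop.tmp.dir; run from that directory, the output typically looks like the transcript below (the BP-... block-pool identifier varies per cluster, so this line is only illustrative):
```shell
$ find . -name "finalized"
./dfs/data/current/BP-1868323182-192.168.0.201-1620000000000/current/finalized
```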
- Modify the values of file.system.type (set it to HDFS3), block.directory (set it to the finalized directory found above), and helpers.address in the configuration file config.xml.
- Start.
```shell
$ cd repairboost-code
$ python scripts/start.py
```
#### Full-node Repair Test
- Stop a datanode in Hadoop.
```shell
$ ssh datanode
$ hdfs --daemon stop datanode
```
You can ssh to any DataNode to execute the command.
- Check the repaired data.
```shell
$ hdfs fsck / -files -blocks -locations
```
#### Stop
```shell
$ stop-dfs.sh
$ cd repairboost-code && python scripts/stop.py
```