# PLSC

**Repository Path**: yuyangup/PLSC

## Basic Information

- **Project Name**: PLSC
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-12-13
- **Last Updated**: 2021-12-13

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# PLSC

## 1. Introduction

[PLSC](https://github.com/PaddlePaddle/PLSC) is an open-source large-scale classification toolkit built on Paddle. It supports 60 million classes on a single node with 8 NVIDIA V100 (32G) GPUs.

## 2. Environment Preparation

### 2.1 Install Paddle from Source Code

```shell
# Fetch the Paddle source tree
git clone https://github.com/PaddlePaddle/Paddle.git
cd /path/to/Paddle/

# Configure an out-of-tree release build with GPU and distributed support
mkdir build && cd build
cmake .. -DWITH_TESTING=ON -DWITH_GPU=ON -DWITH_GOLANG=OFF -DWITH_STYLE_CHECK=ON -DCMAKE_INSTALL_PREFIX=$PWD/output -DWITH_DISTRIBUTE=ON -DCMAKE_BUILD_TYPE=Release -DPY_VERSION=3.7 -DCUDA_ARCH_NAME=All -DPADDLE_VERSION=2.2.0

# Build, install into build/output, then install the generated wheel
make -j20 && make install -j20
pip install output/opt/paddle/share/wheels/paddlepaddle_gpu-2.2.0-cp37-cp37m-linux_x86_64.whl
```

### 2.2 Install Paddle from PyPI

```shell
# Requires Python 3.x or later
# Requires paddlepaddle 2.2.0rc0 or later
pip install paddlepaddle-gpu==2.2.0rc0
```
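
To confirm which Paddle build ended up in the environment, a small check like the following may help. It is a minimal sketch: the distribution name `paddlepaddle-gpu` comes from the install command above, the helper name `installed_paddle_version` is only illustrative, and `importlib.metadata` assumes Python 3.8 or later.

```python
from importlib import metadata
from typing import Optional


def installed_paddle_version(dist_name: str = "paddlepaddle-gpu") -> Optional[str]:
    """Return the installed version of the given distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_paddle_version()
    if version is None:
        print("paddlepaddle-gpu is not installed in this environment")
    else:
        print(f"paddlepaddle-gpu {version} is installed")
```

Once the package is present, calling `paddle.utils.run_check()` in a Python session can additionally verify that Paddle runs on the available GPUs.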

### 2.3 Download PLSC

```shell
git clone https://github.com/PaddlePaddle/PLSC.git

cd /path/to/PLSC/
```


## 3. Data Preparation

### 3.1 Download Dataset

Download the dataset from [insightface datasets](https://github.com/deepinsight/insightface/tree/master/recognition/_datasets_).

* MS1M_v2: MS1M-ArcFace
* MS1M_v3: MS1M-RetinaFace

### 3.2 Extract MXNet Dataset to Images

```shell
python tools/mx_recordio_2_images.py --root_dir ms1m-retinaface-t1/ --output_dir MS1M_v3/
```

After the dataset has been extracted, the folder structure is as follows.

```
MS1M_v3
|_ images
|  |_ 00000001.jpg
|  |_ ...
|  |_ 05179510.jpg
|_ label.txt
|_ agedb_30.bin
|_ cfp_ff.bin
|_ cfp_fp.bin
|_ lfw.bin
```
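
Before starting a long training run, it can be worth checking that the extraction produced every entry in the listing above. The sketch below is one possible check; the helper name `missing_entries` is hypothetical, and the expected names are copied from the listing rather than from the training code.

```python
from pathlib import Path
from typing import List

# Entries copied from the folder listing above.
EXPECTED_ENTRIES = [
    "images",
    "label.txt",
    "agedb_30.bin",
    "cfp_ff.bin",
    "cfp_fp.bin",
    "lfw.bin",
]


def missing_entries(dataset_root: str) -> List[str]:
    """Return the expected entries that are absent under dataset_root."""
    root = Path(dataset_root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries("MS1M_v3")
    print("missing:", missing if missing else "nothing")
```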

Label file format is as follows.

```
# delimiter: "\t"
# the following is the content of label.txt
images/00000001.jpg 0
...
```

If you want to use a custom dataset, arrange your data according to the format above.
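
For a custom dataset, the label file could be produced with a few lines of Python. This is a minimal sketch that assumes only what is stated above: one image path and one integer class ID per line, separated by a tab. The function names `write_label_file` and `read_label_file` are illustrative and not part of PLSC.

```python
from typing import Iterable, List, Tuple


def write_label_file(path: str, samples: Iterable[Tuple[str, int]]) -> None:
    """Write one 'relative_image_path<TAB>class_id' line per sample."""
    with open(path, "w", encoding="utf-8") as f:
        for image_path, class_id in samples:
            f.write(f"{image_path}\t{class_id}\n")


def read_label_file(path: str) -> List[Tuple[str, int]]:
    """Parse a tab-delimited label file back into (image_path, class_id) pairs."""
    samples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # tolerate blank lines
            image_path, class_id = line.split("\t")
            samples.append((image_path, int(class_id)))
    return samples


if __name__ == "__main__":
    # Hypothetical two-image dataset with two identities.
    write_label_file("label.txt", [("images/00000001.jpg", 0), ("images/00000002.jpg", 1)])
    print(read_label_file("label.txt"))
```

Reading the file back right after writing it is a cheap way to catch a wrong delimiter or a non-integer label before training starts.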

### 3.3 Transform Between Original Image Files and Bin Files

To convert the original image files into `bin` files that can be used directly for training, run the following command.

```shell
python tools/convert_image_bin.py --image_path="your/input/image/path" --bin_path="your/output/bin/path" --mode="image2bin"
```

To convert `bin` files back into the original image files, run the following command.

```shell
python tools/convert_image_bin.py --image_path="your/input/bin/path" --bin_path="your/output/image/path" --mode="bin2image"
```

## 4. How to Train

### 4.1 Single Node, 8 GPUs

#### Static Mode

```bash
sh scripts/train_static.sh
```

#### Dynamic Mode

```bash
sh scripts/train_dynamic.sh
```


During training, you can monitor the loss in real time with `VisualDL`. For more information, please refer to [VisualDL](https://github.com/PaddlePaddle/VisualDL/).


## 5. Model Evaluation

The model evaluation process can be started as follows.

#### Static Mode

```bash
sh scripts/validation_static.sh
```

#### Dynamic Mode

```bash
sh scripts/validation_dynamic.sh
```

## 6. Export Model
PaddlePaddle supports inference using prediction engines. First, export the inference model.

#### Static Mode

```bash
sh scripts/export_static.sh
```

#### Dynamic Mode

```bash
sh scripts/export_dynamic.sh
```

Exporting to an ONNX model is also supported; you only need to set `--export_type onnx`.

## 7. Model Inference

Inference supports both the exported Paddle inference model and the ONNX model.

```bash
sh scripts/inference.sh
```

## 8. Model Performance

### 8.1 Accuracy on Verification Datasets

**Configuration:**
  * GPU: 8 NVIDIA Tesla V100 32G
  * Precision: FP16
  * BatchSize: 128/1024

| Mode    | Dataset  | Backbone | Sample ratio | agedb30 | cfp_fp | lfw  | Log  | Last checkpoint |
| ------- | :------: | :------- | ----- | :------ | :----- | :--- | :--- |  :--- |
| Static  |  MS1MV3  | r50      | 0.1   | 0.98317 | 0.98943| 0.99850 | [log](experiments/logs/static/ms1mv3_r50_static_128_fp16_0.1/training.log) | [checkpoint](https://paddle-model-ecology.bj.bcebos.com/model/insight-face/distributed/ms1mv3_r50_static_128_fp16_0.1_epoch_24.tgz) |
| Static  |  MS1MV3  | r50      | 1.0   | 0.98283 | 0.98843| 0.99850 | [log](experiments/logs/static/ms1mv3_r50_static_128_fp16_1.0/training.log) | [checkpoint](https://paddle-model-ecology.bj.bcebos.com/model/insight-face/distributed/ms1mv3_r50_static_128_fp16_1.0_epoch_24.tgz) |
| Dynamic |  MS1MV3  | r50      | 0.1   | 0.98367 | 0.98971| 0.99850 | [log](experiments/logs/dynamic/ms1mv3_r50_dynamic_128_fp16_0.1/training.log) | [checkpoint](https://plsc.bj.bcebos.com/pretrained_model/ms1mv3_r50_dynamic_128_fp16_0.1_eopch_24.tgz) |
| Dynamic |  MS1MV3  | r50      | 1.0   | 0.98333 | 0.99043| 0.99850 | [log](experiments/logs/dynamic/ms1mv3_r50_dynamic_128_fp16_1.0/training.log) | [checkpoint](https://plsc.bj.bcebos.com/pretrained_model/ms1mv3_r50_dynamic_128_fp16_1.0_eopch_24.tgz) |


### 8.2 Maximum Number of Identities

**Configuration:**
  * GPU: 8 NVIDIA Tesla V100 32G (32510MiB)
  * BatchSize: 64/512
  * SampleRatio: 0.1

| Mode                      | Precision | Res50    | Res100   |
| ------------------------- | --------- | -------- | -------- |
| Framework1 (static)       | AMP       | 42000000 (31792MiB)| 39000000 (31938MiB)|
| Framework2 (dynamic)      | AMP       | 30000000 (31702MiB)| 29000000 (32286MiB)|
| Paddle (static)           | FP16 | 60000000 (32018MiB)| 60000000 (32018MiB)|
| Paddle (dynamic)          | FP16 | 67000000 (31970MiB)| 67000000 (31970MiB)|

**Note:** Set the environment variable with ``export FLAGS_allocator_strategy=naive_best_fit``.
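
To get a feel for the scale of these numbers, the following back-of-the-envelope sketch estimates the memory taken by the classification-layer weights alone. It rests on assumptions not confirmed by the table: an embedding size of 512 (the value passed to `--embedding_size` in the demo section), 2 bytes per FP16 value, and class centers split evenly across the 8 GPUs. The function name is hypothetical.

```python
def class_center_mib_per_gpu(
    num_classes: int,
    embedding_size: int = 512,   # assumption: matches --embedding_size 512 in the demo
    bytes_per_value: int = 2,    # assumption: FP16 storage
    num_gpus: int = 8,           # assumption: even sharding across the node's GPUs
) -> float:
    """Estimate MiB per GPU for the classification-layer weights only."""
    total_bytes = num_classes * embedding_size * bytes_per_value
    return total_bytes / num_gpus / (1024 ** 2)


if __name__ == "__main__":
    for n in (60_000_000, 67_000_000):
        print(f"{n} classes: about {class_center_mib_per_gpu(n):.0f} MiB per GPU")
```

The estimate deliberately ignores gradients, optimizer state, activations, and the backbone, so it is expected to be well below the totals reported in the table.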

### 8.3 Throughput

**Configuration:**
  * BatchSize: 128/1024
  * SampleRatio: 0.1
  * Datasets: MS1MV3
  
![insightface_throughtput](experiments/images/throughtput.png)

## 9. Demo

Combined with a face detection model, PLSC can run the complete face recognition pipeline.

First, use the following commands to download the models.

```bash
# Create models directory
mkdir -p models

# Download blazeface face detection model and extract it
wget https://paddle-model-ecology.bj.bcebos.com/model/insight-face/blazeface_fpn_ssh_1000e_v1.0_infer.tar -P models/
tar -xf models/blazeface_fpn_ssh_1000e_v1.0_infer.tar -C models/
rm -rf models/blazeface_fpn_ssh_1000e_v1.0_infer.tar

# Download static ResNet50 PartialFC 0.1 model and extract it
wget https://paddle-model-ecology.bj.bcebos.com/model/insight-face/distributed/ms1mv3_r50_static_128_fp16_0.1_epoch_24.tgz -P models/
tar -xf models/ms1mv3_r50_static_128_fp16_0.1_epoch_24.tgz -C models/
rm -rf models/ms1mv3_r50_static_128_fp16_0.1_epoch_24.tgz

# Export static save inference model
python tools/export.py --is_static True --export_type paddle --backbone FresResNet50 --embedding_size 512 --checkpoint_dir models/ms1mv3_r50_static_128_fp16_0.1_epoch_24 --output_dir models/ms1mv3_r50_static_128_fp16_0.1_epoch_24_infer
rm -rf models/ms1mv3_r50_static_128_fp16_0.1_epoch_24
```

Then, use the following commands to download the gallery, the demo image, and the font file used for visualization, and to generate the gallery features.

```bash
# Download gallery, query and font file
mkdir -p images/
git clone https://github.com/littletomatodonkey/insight-face-paddle /tmp/insight-face-paddle
cp -r /tmp/insight-face-paddle/demo/friends/gallery/ images/
cp -r /tmp/insight-face-paddle/demo/friends/query/ images/
mkdir -p assets
cp /tmp/insight-face-paddle/SourceHanSansCN-Medium.otf assets/
rm -rf /tmp/insight-face-paddle

# Build index file
python tools/test_recognition.py \
    --rec \
    --rec_model_file_path models/ms1mv3_r50_static_128_fp16_0.1_epoch_24_infer/FresResNet50.pdmodel \
    --rec_params_file_path models/ms1mv3_r50_static_128_fp16_0.1_epoch_24_infer/FresResNet50.pdiparams \
    --build_index=images/gallery/index.bin \
    --img_dir=images/gallery \
    --label=images/gallery/label.txt
```

Use the following command to run the whole face recognition demo.

```bash
# detection + recognition process
python tools/test_recognition.py \
    --det \
    --det_model_file_path models/blazeface_fpn_ssh_1000e_v1.0_infer/inference.pdmodel \
    --det_params_file_path models/blazeface_fpn_ssh_1000e_v1.0_infer/inference.pdiparams \
    --rec \
    --rec_model_file_path models/ms1mv3_r50_static_128_fp16_0.1_epoch_24_infer/FresResNet50.pdmodel \
    --rec_params_file_path models/ms1mv3_r50_static_128_fp16_0.1_epoch_24_infer/FresResNet50.pdiparams \
    --index=images/gallery/index.bin \
    --input=images/query/friends2.jpg \
    --cdd_num 10 \
    --rec_thresh 0.4 \
    --output="./output"
```
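
The sketch below illustrates how a query embedding might be matched against gallery features. It is not taken from `tools/test_recognition.py`: it assumes that `--cdd_num` limits the number of candidates considered and that `--rec_thresh` is a minimum cosine similarity, and the names `cosine_similarity` and `match` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match(
    query: Sequence[float],
    gallery: Dict[str, Sequence[float]],
    cdd_num: int = 10,        # assumption: number of top candidates to keep
    rec_thresh: float = 0.4,  # assumption: minimum similarity to accept a match
) -> List[Tuple[str, float]]:
    """Return up to cdd_num gallery labels whose similarity reaches rec_thresh."""
    scored = [(label, cosine_similarity(query, feat)) for label, feat in gallery.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(label, score) for label, score in scored[:cdd_num] if score >= rec_thresh]


if __name__ == "__main__":
    # Toy 2-D embeddings; real embeddings would have 512 dimensions.
    gallery = {"alice": [1.0, 0.0], "bob": [0.0, 1.0], "carol": [1.0, 1.0]}
    print(match([1.0, 0.0], gallery, cdd_num=2))
```

Under these assumptions, raising the threshold trades missed matches for fewer false identifications, and an empty result would mean the face is treated as unknown.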

The final result is saved in the `output/` folder and looks as follows.

<div align="center">
<img src="experiments/images/friends2.jpg"  width = "800" />
</div>