# BlazeFace-PyTorch

**Repository Path**: senior_cu/BlazeFace-PyTorch

## Basic Information

- **Project Name**: BlazeFace-PyTorch
- **Description**: No description available
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2023-12-27
- **Last Updated**: 2023-12-27

## README

# BlazeFace in Python

BlazeFace is a fast, lightweight face detector from Google Research. [Read more](https://sites.google.com/view/perception-cv4arvr/blazeface) or see the [paper on arXiv](https://arxiv.org/abs/1907.05047).

A pretrained model is available as part of Google's [MediaPipe](https://github.com/google/mediapipe/blob/master/mediapipe/docs/face_detection_mobile_gpu.md) framework.

![](https://raw.githubusercontent.com/google/mediapipe/master/mediapipe/docs/images/realtime_face_detection.gif)

Besides a bounding box, BlazeFace also predicts 6 keypoints for face landmarks (2x eyes, 2x ears, nose, mouth).

Because BlazeFace is designed for use on mobile devices, the pretrained model is in TFLite format. However, I wanted to use it from PyTorch and so I converted it.

> **NOTE:** The MediaPipe model is slightly different from the model described in the BlazeFace paper. It uses depthwise convolutions with a 3x3 kernel, not 5x5. And it only uses "single" BlazeBlocks, not "double" ones.

The BlazeFace paper mentions that there are two versions of the model, one for the front-facing camera and one for the back-facing camera. This repo includes only the front-facing camera model, as that is the only one I was able to find an official trained version for. The difference between the two models is the dataset they were trained on. As the paper says,

> For the frontal camera model, only faces that occupy more than 20% of the image area were considered due to the intended use case (the threshold for the rear-facing camera model was 5%).

This means the included model will not be able to detect faces that are relatively small. It's really intended for selfies, not for general-purpose face detection.
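To make the 20% figure concrete, here is a minimal sketch that computes the fraction of the image a normalized bounding box covers and compares it with that threshold. The function names and the constant are hypothetical and not part of this repo. Note that the threshold describes how the training data was selected; it is not a hard cutoff that the model applies at inference time.

```python
# Hypothetical helpers, not part of blazeface.py.
# The 0.20 value is the training-data threshold quoted from the paper.
FRONT_CAMERA_MIN_AREA = 0.20


def box_area_fraction(ymin, xmin, ymax, xmax):
    """Fraction of the image covered by a box given in normalized coordinates."""
    height = max(0.0, ymax - ymin)
    width = max(0.0, xmax - xmin)
    return height * width


def likely_in_training_range(box, min_area=FRONT_CAMERA_MIN_AREA):
    """True if the box is larger than the area the front-camera model was trained on."""
    return box_area_fraction(*box) > min_area


# A face filling the central half of the image in each dimension covers 25%.
selfie_box = (0.25, 0.25, 0.75, 0.75)
# A face spanning a fifth of each dimension covers only about 4%.
distant_box = (0.4, 0.4, 0.6, 0.6)

print(likely_in_training_range(selfie_box))
print(likely_in_training_range(distant_box))
```

A face like `distant_box` is the kind this model is likely to miss.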

## Inside this repo

Essential files:

- **blazeface.py**: defines the `BlazeFace` class that does all the work

- **blazeface.pth**: the weights for the trained model

- **anchors.npy**: lookup table with anchor boxes

Notebooks:

- **Anchors.ipynb**: creates anchor boxes and saves them as a binary file (anchors.npy)

- **Convert.ipynb**: loads the weights from the TFLite model and converts them to PyTorch format (blazeface.pth)

- **Inference.ipynb**: shows how to use the `BlazeFace` class to make face detections

## Detections

Each face detection is a PyTorch `Tensor` consisting of 17 numbers:

- The first 4 numbers describe the bounding box corners: 
    - `ymin, xmin, ymax, xmax`
    - These are normalized coordinates (between 0 and 1).

- The next 12 numbers are the x,y-coordinates of the 6 facial landmark keypoints:
    - `right_eye_x, right_eye_y`
    - `left_eye_x, left_eye_y`
    - `nose_x, nose_y`
    - `mouth_x, mouth_y`
    - `right_ear_x, right_ear_y`
    - `left_ear_x, left_ear_y`
    - Tip: these are labeled as seen from the perspective of the person, so their right is your left.

- The final number is the confidence score that this detection really is a face.
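The layout above can be unpacked as in the following sketch. It assumes the 17 values have already been converted to plain Python floats (for example with `detection.tolist()` on the tensor). `Face`, `parse_detection` and `box_to_pixels` are hypothetical names used for illustration, not part of `blazeface.py`, and the sample values are made up.

```python
from typing import NamedTuple, Sequence

# Keypoint order as listed above (from the person's own perspective).
KEYPOINT_NAMES = ("right_eye", "left_eye", "nose", "mouth", "right_ear", "left_ear")


class Face(NamedTuple):
    box: tuple        # (ymin, xmin, ymax, xmax), normalized to [0, 1]
    keypoints: dict   # name -> (x, y), normalized to [0, 1]
    score: float      # confidence that this is a face


def parse_detection(values: Sequence[float]) -> Face:
    """Split one 17-number detection into box, keypoints and score."""
    if len(values) != 17:
        raise ValueError(f"expected 17 numbers, got {len(values)}")
    ymin, xmin, ymax, xmax = values[0:4]
    keypoints = {
        name: (values[4 + 2 * i], values[5 + 2 * i])
        for i, name in enumerate(KEYPOINT_NAMES)
    }
    return Face((ymin, xmin, ymax, xmax), keypoints, values[16])


def box_to_pixels(face: Face, width: int, height: int) -> tuple:
    """Convert the normalized box to (left, top, right, bottom) in pixels."""
    ymin, xmin, ymax, xmax = face.box
    return (xmin * width, ymin * height, xmax * width, ymax * height)


# Made-up detection values, for illustration only.
example = [0.25, 0.25, 0.75, 0.75,                 # ymin, xmin, ymax, xmax
           0.4, 0.4, 0.6, 0.4, 0.5, 0.5, 0.5, 0.6,  # eyes, nose, mouth
           0.3, 0.45, 0.7, 0.45,                    # ears
           0.5]                                     # confidence score
face = parse_detection(example)
print(box_to_pixels(face, 128, 128))
```

Because the coordinates are normalized, the same detection can be scaled to the size of the original image rather than the 128x128 model input by passing that image's width and height.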

## Image credits

Included for testing are the following images:

- **1face.png**. Fei-Fei Li by [ITU Pictures](https://www.flickr.com/photos/itupictures/35011409612/), CC BY 2.0

- **3faces.png**. Geoffrey Hinton, Yoshua Bengio, Yann LeCun. Found at [AIBuilders](https://aibuilders.ai/le-prix-turing-recompense-trois-pionniers-de-lintelligence-artificielle-yann-lecun-yoshua-bengio-et-geoffrey-hinton/)

- **4faces.png** from Andrew Ng’s Facebook page / [KDnuggets](https://www.kdnuggets.com/2015/03/talking-machine-deep-learning-gurus-p1.html)

These images were scaled down to 128x128 pixels as that is the expected input size of the model.
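When preparing your own images, one way to reach the square 128x128 input without distorting the aspect ratio is to crop a centered square first and then resize it. The sketch below only computes the crop box. This README does not say how the test images were produced, so treat the approach, along with `center_square_crop` and `MODEL_INPUT_SIZE`, as assumptions made for illustration.

```python
MODEL_INPUT_SIZE = 128  # expected input width and height, as stated above


def center_square_crop(width: int, height: int) -> tuple:
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


# A 640x480 landscape photo loses 80 pixels on each side.
print(center_square_crop(640, 480))
```

With an imaging library such as Pillow, the returned box could be passed to `Image.crop` and the result to `Image.resize((MODEL_INPUT_SIZE, MODEL_INPUT_SIZE))`.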