# lwnn

**Repository Path**: parai/lwnn

## Basic Information

- **Project Name**: lwnn
- **Description**: Lightweight Neural Network
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1
- **Created**: 2022-09-16
- **Last Updated**: 2023-09-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# LWNN - Lightweight Neural Network

[![Build Status](https://travis-ci.org/lwnn/lwnn.svg?branch=master)](https://travis-ci.org/lwnn/lwnn)

Inspired mostly by [NNOM](https://github.com/majianjia/nnom) and [CMSIS-NN](https://github.com/ARM-software/CMSIS_5/tree/develop/CMSIS/NN), I wanted to do something for Edge AI. But NNOM is not well designed for different runtimes (CPU/DSP/GPU/NPU etc.) and has no clear path to handle them. Nowadays I also really want to study OpenCL, and when I came across [MACE](https://github.com/XiaoMi/mace/tree/master/mace/ops/opencl/cl) I found a bunch of CL kernels that can be used directly. So I decided to do something meaningful: study OpenCL and, at the same time, create a Lightweight Neural Network that is suitable for devices such as PCs, mobiles and MCUs.

## Architecture

To support the various deep learning frameworks such as TensorFlow/Keras/Caffe2, PyTorch, etc., lwnn uses [onnx](https://onnx.ai/) as its model format; older frameworks such as Caffe and Darknet that don't support onnx are handled specially.

![arch](docs/arch.png)

| Layers/Runtime | cpu float | cpu s8 | cpu q8 | cpu q16 | opencl | comments |
| - | - | - | - | - | - | - |
| Conv1D | Y d | Y | Y | Y | Y | based on Conv2D |
| Conv2D | Y d | Y | Y | Y | Y | |
| DeConv2D | Y | Y | Y | Y | Y | |
| DepthwiseConv2D | Y | Y | Y | Y | Y | |
| DilatedConv2D | Y | N | N | N | Y | |
| ElementWise Max | Y d | Y | Y | Y | Y | |
| ReLU | Y d | Y | Y | Y | Y | |
| PReLU | Y d | N | N | N | Y | |
| MaxPool1D | Y d | Y | Y | Y | Y | based on MaxPool2D |
| MaxPool2D | Y d | Y | Y | Y | Y | |
| Dense | Y | Y | Y | Y | Y | |
| Softmax | Y d | Y | Y | Y | Y | |
| Reshape | Y d | Y | Y | Y | Y | |
| Pad | Y | Y | Y | Y | Y | |
| BatchNorm | Y | Y | Y | Y | Y | |
| Concat | Y | Y | Y | Y | Y | |
| AvgPool1D | Y d | Y | Y | Y | Y | based on AvgPool2D |
| AvgPool2D | Y d | Y | Y | Y | Y | |
| Add | Y d | Y | Y | Y | Y | |
| PriorBox | Y | N | N | N | F | |
| DetectionOutput | Y | F | F | F | F | |
| Upsample | Y | Y | Y | Y | Y | |
| Yolo | Y | F | F | F | F | |
| YoloOutput | Y | F | F | F | F | |
| Mfcc | Y | F | F | F | F | |
| LSTM | Y | N | Y | N | F | |
| Proposal | Y | N | N | N | N | |
| Mul | Y d | N | N | N | Y | |

* F means fallback to another runtime that supports that layer.
* d means dynamic shape support.
* s8/q8/q16: all are in Q format (see the sketch below).
* s8: 8-bit symmetric quantization with a zero offset, very similar to [tflite quantization](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/g3doc/performance/quantization_spec.md).
* q8/q16: 8/16-bit symmetric quantization, no zero offset.
* q8/s8/q16 activations (ReLU/Clip) reuse their input layer's buffer, so the activation layer's input layer must have only one consumer: the activation itself.
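To make the q8/q16 notes above concrete, here is a minimal numpy sketch of symmetric Q-format quantization with no zero offset. The function names and the way the number of fractional bits is chosen are illustrative assumptions, not lwnn's actual implementation.

```python
import numpy as np

def q_quantize(x, n_bits=8):
    """Symmetric Q-format quantization, no zero offset (illustrative only).

    Chooses the number of fractional bits Q so that max|x| still fits in a
    signed n_bits integer, then rounds x * 2**Q.
    """
    int_max = 2 ** (n_bits - 1) - 1
    # Bits needed for the integer part; the remaining bits are fractional.
    int_bits = max(0, int(np.ceil(np.log2(np.max(np.abs(x)) + 1e-12))))
    Q = (n_bits - 1) - int_bits
    xq = np.clip(np.round(x * (2.0 ** Q)), -int_max - 1, int_max)
    return xq.astype(np.int32), Q

def q_dequantize(xq, Q):
    # Dequantization is just a division by 2**Q.
    return xq.astype(np.float32) / (2.0 ** Q)

x = np.array([0.5, -1.25, 3.7], dtype=np.float32)
xq, Q = q_quantize(x)              # Q = 5 for this data
print(xq, Q, q_dequantize(xq, Q))  # [16 -40 118] 5 [0.5 -1.25 3.6875]
```

Because the representation is symmetric around zero, dequantization is a single scale by 2**-Q, which keeps the integer kernels simple; the s8 scheme additionally carries a zero offset as in tflite.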
## Supported Famous Models

* [MobileNet-SSD](https://github.com/chuanqi305/MobileNet-SSD): [README](gtest/models/ssd/README.md)
* [YOLOv3](https://github.com/pjreddie/darknet): [README](gtest/models/yolov3/README.md)
* [ENET](https://github.com/TimoSaemann/ENet): [README](gtest/models/enet/README.md)
* [DeepSpeech](https://github.com/mozilla/DeepSpeech): [README](gtest/models/deepspeech/README.md)
* [Mask-RCNN](https://github.com/matterport/Mask_RCNN): [README](gtest/models/maskrcnn/README.md)

Below is a list of commands to run the above models on the OPENCL or CPU runtime.

```sh
# object detection
lwnn_gtest --gtest_filter=*CL*SSDFloat -i images/dog.jpg
lwnn_gtest --gtest_filter=*CPU*SSDFloat -i images/dog.jpg
lwnn_gtest --gtest_filter=*CL*YOLOV3Float -i images/dog.jpg
lwnn_gtest --gtest_filter=*CPU*YOLOV3Float -i images/dog.jpg
lwnn_gtest --gtest_filter=*CPU*MASKRCNNFloat -i images/dog.jpg
# semantic segmentation
lwnn_gtest --gtest_filter=*CL*ENETFloat -i ENet/example_image/munich_000000_000019_leftImg8bit.png
lwnn_gtest --gtest_filter=*CPU*ENETFloat -i ENet/example_image/munich_000000_000019_leftImg8bit.png
# speech to text
lwnn_gtest --gtest_filter=*CPU*DSFloat -i speech_dataset/bird/042ea76c_nohash_0.wav
stt 49/29: b irr d
```

Note: these models show a big accuracy drop when quantized; I think quantization-aware training or something like TensorRT calibration is necessary.

## Development

### prepare environment

```sh
conda create -n lwnn python=3.6
source activate lwnn
conda install scons
pip install tensorflow keras keras2onnx onnxruntime
sudo apt install nvidia-opencl-dev
```

### build

```sh
scons
```
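Since lwnn takes models in through onnx, one quick way to sanity-check the Python packages installed above is to export a toy Keras model with keras2onnx and run it with onnxruntime. This is only an illustrative sketch of that toolchain; the model, layer sizes and file name are made up and none of this is an lwnn API.

```python
import numpy as np
import keras
import keras2onnx
import onnxruntime

# A toy Keras model standing in for a real network (purely illustrative).
model = keras.models.Sequential([
    keras.layers.Dense(8, activation='relu', input_shape=(4,)),
    keras.layers.Dense(2, activation='softmax'),
])

# Export to onnx, the interchange format lwnn consumes.
onnx_model = keras2onnx.convert_keras(model, model.name)
keras2onnx.save_model(onnx_model, 'toy.onnx')

# Run the exported graph with onnxruntime to confirm the environment works.
sess = onnxruntime.InferenceSession('toy.onnx')
x = np.random.rand(1, 4).astype(np.float32)
print(sess.run(None, {sess.get_inputs()[0].name: x}))
```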