# DSQ

Unofficial PyTorch implementation of "Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks".

This repository follows Algorithm 1 in the paper (a minimal sketch of the soft quantization function is given after the Training section). It uses the maximum value of int32 as the initial value. This should not affect the value range, because the parameters of a deep model should not be that large, and most edge devices support ranges up to int32.

----
# Training

Training with quantization. The training script is modified from the PyTorch ImageNet example.

Uniform/DSQ quantization is now supported, with the following added arguments:

- `-q` : quantization type, default is None
- `--quantize_input` : whether to also quantize the input
- `--quan_bit` : number of quantization bits
- `--log_path` : TensorBoard log path; the default folder is `./log`

Examples

Training DSQ with 8 bit (no quantized input):

```
python train.py -a resnet18 -q DSQ --quan_bit 8 {Path to data}
```

Training DSQ with 8 bit (quantized input):

```
python train.py -a resnet18 -q DSQ --quan_bit 8 --quantize_input {Path to data}
```

Evaluating (evaluate directly, resuming from model_best.pth.tar):

```
python train.py -a resnet18 -q DSQ --quan_bit 8 --quantize_input --resume {path to model_best.pth.tar} --evaluate {Path to data}
```
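For orientation, below is a minimal sketch of the soft quantization function described by Algorithm 1, assuming per-tensor `lower`/`upper` clipping bounds. The function name `dsq_quantize` and its signature are made up for this illustration and do not mirror the repository's interface; as the update note at the end explains, the repository additionally applies a hard sgn with an STE backward on top of the soft value.

```python
import torch

def dsq_quantize(x, alpha, lower, upper, num_bits=4):
    """Soft-quantize `x` onto 2**num_bits levels spread over [lower, upper].

    `alpha` is the learnable shape parameter (0 < alpha < 1) and must be a tensor
    (e.g. an nn.Parameter); a smaller alpha makes the tanh curve steeper, i.e.
    closer to hard uniform quantization. Illustrative sketch only.
    """
    n_intervals = 2 ** num_bits - 1                    # number of quantization intervals
    delta = (upper - lower) / n_intervals              # width of one interval
    x_c = torch.clamp(x, lower, upper)                 # clip to the quantization range
    i = torch.clamp(torch.floor((x_c - lower) / delta), 0, n_intervals - 1)  # interval index
    m_i = lower + (i + 0.5) * delta                    # interval midpoint
    k = torch.log(2.0 / alpha - 1.0) / delta           # curve sharpness derived from alpha
    s = 1.0 / (1.0 - alpha)                            # scale so the curve spans [-1, 1] per interval
    phi = s * torch.tanh(k * (x_c - m_i))              # differentiable soft value
    return lower + delta * (i + (phi + 1.0) / 2.0)     # map back to the original range
```

In training, `alpha` would be a per-layer `nn.Parameter`, which is what the learned-α table in the Experiments section reports; at inference time the soft value would typically be replaced by ordinary rounding to the nearest level.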
# Experiments

The results are based on fake quantization (only the convolution layers are quantized). As mentioned in the paper, the final Linear layer is not quantized (a sketch of this selective replacement follows the results table below).

| model    | QuanType    | W/A bit | top1   | top5   |
|----------|-------------|---------|--------|--------|
| resnet18 | UniformQuan | 4/32    | 69.486 | 89.004 |
|          | DSQ         | 4/32    | 69.328 | 88.872 |
|          | UniformQuan | 4/4     | 69.306 | 88.780 |
|          | DSQ         | 4/4     | 69.542 | 88.884 |
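For illustration, here is a minimal sketch of the selective replacement described above: every `nn.Conv2d` is swapped for a fake-quantized wrapper, while everything else (including the final `nn.Linear`) stays in full precision. The class `FakeQuantConv2d`, the `quantize_convs` helper, and the α initialization are assumptions made for this sketch (it reuses the hypothetical `dsq_quantize` function shown earlier); the 4/4 rows also quantize activations via `--quantize_input`, which this sketch omits for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18

class FakeQuantConv2d(nn.Conv2d):
    """Conv2d whose weights are fake-quantized on the fly (illustrative wrapper,
    not the repository's actual module). Assumes dsq_quantize from the earlier sketch."""

    def __init__(self, conv: nn.Conv2d, num_bits: int = 4):
        super().__init__(conv.in_channels, conv.out_channels, conv.kernel_size,
                         conv.stride, conv.padding, conv.dilation, conv.groups,
                         conv.bias is not None)
        self.load_state_dict(conv.state_dict())       # keep the original weights
        self.num_bits = num_bits
        self.alpha = nn.Parameter(torch.tensor(0.5))  # learnable per-layer alpha (arbitrary init)

    def forward(self, x):
        w_min = float(self.weight.min())
        w_max = float(self.weight.max())
        w_q = dsq_quantize(self.weight, self.alpha, w_min, w_max, self.num_bits)
        return F.conv2d(x, w_q, self.bias, self.stride, self.padding,
                        self.dilation, self.groups)


def quantize_convs(module: nn.Module, num_bits: int = 4) -> nn.Module:
    """Recursively replace every nn.Conv2d; BatchNorm and the final Linear are kept as-is."""
    for name, child in module.named_children():
        if isinstance(child, nn.Conv2d):
            setattr(module, name, FakeQuantConv2d(child, num_bits))
        else:
            quantize_convs(child, num_bits)
    return module


model = quantize_convs(resnet18(), num_bits=4)   # model.fc stays in full precision
```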
Learned α for 4-bit DSQ (quantized weights and inputs):

| layer             | weight     | activation |
|-------------------|:----------:|-----------:|
| layer1.0.conv1    | 0.4832     | 0.5661     |
| layer1.0.conv2    | 0.3730     | 0.2953     |
| layer1.1.conv1    | 0.4405     | 0.2975     |
| layer1.1.conv2    | 0.3427     | 0.1959     |
| layer2.0.conv1    | 0.3966     | 0.1653     |
| layer2.0.conv2    | 0.4140     | 0.2014     |
| layer2.downsample | **0.3275** | **0.1779** |
| layer2.1.conv1    | 0.4303     | 0.1675     |
| layer2.1.conv2    | 0.4207     | 0.1570     |
| layer3.0.conv1    | 0.4590     | 0.2774     |
| layer3.0.conv2    | **0.4838** | **0.2569** |
| layer3.downsample | **0.2305** | **0.1073** |
| layer3.1.conv1    | 0.4523     | 0.1775     |
| layer3.1.conv2    | 0.4382     | 0.1792     |

#### Results:

As with Table 2 in the paper, the learned α indeed show that

```
Second, different layers show different sensitivity to the quantization.
For example, the downsampling convolution layers can be quantized much (a small α),
while some layers such as layer3.0.conv2 are not suitable for quantization (a large α).
```

#### Issue:

It seems that the α of the weights is larger than that of the activations. Maybe the un-quantized BatchNorm restricts the activations and causes the difference from the paper (or someone can tell why).

### Update Note

> 20191218:
> Update uniform quantization results. It seems that the sgn function still needs an STE backward (a sketch is given below), or the loss becomes NaN.

> 20191231:
> Update Experiments.
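To make the 20191218 note concrete, here is a generic straight-through estimator (STE) for the sgn function written as a custom autograd Function. The class name `SgnSTE` is invented for this sketch and is not the repository's code.

```python
import torch

class SgnSTE(torch.autograd.Function):
    """sgn() in the forward pass, identity (straight-through) gradient in the backward pass.

    The gradient of sgn is zero almost everywhere, so without the STE no gradient reaches
    the layers below; as observed in the update note above, training without it makes the
    loss go to NaN.
    """

    @staticmethod
    def forward(ctx, x):
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output   # pretend sgn was the identity when propagating gradients


# Quick check: gradients flow through the sign function.
x = torch.randn(4, requires_grad=True)
SgnSTE.apply(x).sum().backward()
print(x.grad)   # all ones, instead of all zeros
```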