# Bag_of_Tricks_for_Image_Classification_with_Convolutional_Neural_Networks

**Repository Path**: jane-one/Bag_of_Tricks_for_Image_Classification_with_Convolutional_Neural_Networks

## Basic Information

- **Project Name**: Bag_of_Tricks_for_Image_Classification_with_Convolutional_Neural_Networks
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-06-26
- **Last Updated**: 2021-06-26

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Bag of Tricks for Image Classification with Convolutional Neural Networks 


This repo was inspired by Paper [Bag of Tricks for Image Classification with Convolutional Neural Networks](https://arxiv.org/abs/1812.01187)

I would test popular training tricks as many as I can for improving image classification accuarcy, feel
free to leave a comment about the tricks you want me to test(please write the referenced paper along with
the tricks)

## hardware
Using 4 Tesla P40 to run the experiments

## dataset

I will use [CUB_200_2011](http://www.vision.caltech.edu/visipedia/CUB-200-2011.html) dataset instead of ImageNet,
just for simplicity, this is a fine-grained image classification dataset, which contains 200 birds categlories, 
5K+ training images, and 5K+ test images.The state of the art acc on vgg16 is around 73%(please correct me if 
I was wrong).You could easily change it to the ones you like: [Stanford Dogs](http://vision.stanford.edu/aditya86/ImageNetDogs/), [Stanford Cars](http://vision.stanford.edu/aditya86/ImageNetDogs/).
Or even ImageNet.

## network

Use a VGG16 network to test my tricks, also for simplicity reasons, since VGG16 is easy to implement. I'm considering
switch to AlexNet, to see how powerful these tricks are.

## tricks

tricks I've tested, some of them were from the Paper [Bag of Tricks for Image Classification with Convolutional Neural Networks](https://arxiv.org/abs/1812.01187) :

|trick|referenced paper|
|:---:|:---:|
|xavier init|[Understanding the difficulty of training deep feedforward neural networks](http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf)|
|warmup training|[Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour](https://arxiv.org/abs/1706.02677v2)|
|no bias decay|[Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes](https://arxiv.org/abs/1807.11205vx)|
|label smoothing|[Rethinking the inception architecture for computer vision](https://arxiv.org/abs/1512.00567v3))|
|random erasing|[Random Erasing Data Augmentation](https://arxiv.org/abs/1708.04896v2)|
|cutout|[Improved Regularization of Convolutional Neural Networks with Cutout](https://arxiv.org/abs/1708.04552v2)|
|linear scaling learning rate|[Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour](https://arxiv.org/abs/1706.02677v2)|
|cosine learning rate decay|[SGDR: Stochastic Gradient Descent with Warm Restarts](https://arxiv.org/abs/1608.03983)|

**and more to come......**


## result

baseline(training from sctrach, no ImageNet pretrain weights are used): 

vgg16 64.60% on [CUB_200_2011](http://www.vision.caltech.edu/visipedia/CUB-200-2011.html) dataset, lr=0.01, batchsize=64

effects of stacking tricks 

|trick|acc|
|:---:|:---:|
|baseline|64.60%|
|+xavier init and warmup training|66.07%|
|+no bias decay|70.14%|
|+label smoothing|71.20%|
|+random erasing|does not work, drops about 4 points|
|+linear scaling learning rate(batchsize 256, lr 0.04)|71.21%|
|+cutout|does not work, drops about 1 point|
|+cosine learning rate decay|does not work, drops about 1 point|