# Semantic_Human_Matting **Repository Path**: StevenLiuwen/Semantic_Human_Matting ## Basic Information - **Project Name**: Semantic_Human_Matting - **Description**: Semantic Human Matting - **Primary Language**: Unknown - **License**: Not specified - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2020-02-23 - **Last Updated**: 2020-12-19 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # Semantic_Human_Matting The project is my reimplement of paper ([Semantatic Human Matting](https://arxiv.org/abs/1809.01354)) from Alibaba, it proposes a new end-to-end scheme to predict human alpha from image. SHM is the first algorithm that learns to jointly fit both semantic information and high quality details with deep networks. One of the main contributions of the paper is that: ***A large scale high quality human matting dataset is created. It contains 35,513 unique human images with corresponding alpha mattes***. But, the dataset is not avaiable. I collected 6k+ images as my dataset of the project. Worth noting that, the architecture of my network, which builded with mobilenet and shallow encoder-decoder net, is a light version compaired to original implement. # update 2019/04/08 :thumbsup: :thumbsup: The company [***爱分割***](http://www.aisegment.com) shared their [dataset](https://github.com/aisegmentcn/matting_human_datasets) recently ! # Requirements - python3.5 / 3.6 - pytorch >= 0.4 - opencv-python # Usage Directory structure of the project: ``` Semantic_Human_Matting │ README.md │ train.py │ train.sh | test_camera.py | test_camera.sh └───model │ │ M_Net.py │ │ T_Net.py │ │ network.py └───data │ dataset.py │ gen_trimap.py | gen_trimap.sh | knn_matting.py | knn_matting.sh └───image └───mask └───trimap └───alpha ``` ## Step 1: prepare dataset ```./data/train.txt``` contain image names according to 6k+ images(```./data/image```) and corresponding masks(```./data/mask```). Use ```./data/gen_trimap.sh``` to get trimaps of the masks. Use ```./data/knn_matting.sh``` to get alpha mattes(it will take long time...). ## Step 2: build network ![SHM](https://github.com/lizhengwei1992/Semantic_Human_Matting/raw/master/network.png) - ***Trimap generation: T-Net*** The T-Net plays the role of semantic segmentation. I use mobilenet_v2+unet as T-Net to predict trimap. - ***Matting network: M-Net*** The M-Net aims to capture detail information and generate alpha matte. I build M-Net same as the paper, but reduce channels of the original net. - ***Fusion Module*** Probabilistic estimation of alpha matte can be written as $\alpha _{p} = F_{s} + U_{s}\alpha _{r}$ ## Step 3: build loss The overall prediction loss for alpha_p at each pixel is $L_{p} = \gamma\left \| \alpha _{p} - \alpha _{g} \right \|_{1} + \left ( 1-\gamma \right )\left \| c_{p} - c_{g} \right \|_{1}$ The total loss is $L = L_{p} + \lambda L_{t}$ Read papers for more details, and my codes for two loss functions: ``` # ------------------------------------- # classification loss L_t # ------------------------ criterion = nn.CrossEntropyLoss() L_t = criterion(trimap_pre, trimap_gt[:,0,:,:].long()) # ------------------------------------- # prediction loss L_p # ------------------------ eps = 1e-6 # l_alpha L_alpha = torch.sqrt(torch.pow(alpha_pre - alpha_gt, 2.) + eps).mean() # L_composition fg = torch.cat((alpha_gt, alpha_gt, alpha_gt), 1) * img fg_pre = torch.cat((alpha_pre, alpha_pre, alpha_pre), 1) * img L_composition = torch.sqrt(torch.pow(fg - fg_pre, 2.) + eps).mean() L_p = 0.5*L_alpha + 0.5*L_composition ``` ## Step 4: train Firstly, pre_train T-Net, use ```./train.sh``` as : ``` python3 train.py \ --dataDir='./data' \ --saveDir='./ckpt' \ --trainData='human_matting_data' \ --trainList='./data/train.txt' \ --load='human_matting' \ --nThreads=4 \ --patch_size=320 \ --train_batch=8 \ --lr=1e-3 \ --lrdecayType='keep' \ --nEpochs=1000 \ --save_epoch=1 \ --train_phase='pre_train_t_net' ``` Then, train end to end, use ```./train.sh``` as: ``` python3 train.py \ --dataDir='./data' \ --saveDir='./ckpt' \ --trainData='human_matting_data' \ --trainList='./data/train.txt' \ --load='human_matting' \ --nThreads=4 \ --patch_size=320 \ --train_batch=8 \ --lr=1e-4 \ --lrdecayType='keep' \ --nEpochs=2000 \ --save_epoch=1 \ --finetuning \ --train_phase='end_to_end' ``` # Test run ```./test_camera.sh```