# caffe_rtpose

**Repository Path**: ltlgyr/caffe_rtpose

## Basic Information

- **Project Name**: caffe_rtpose
- **Description**: Realtime C++ code for multi-person pose estimation
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-09-01
- **Last Updated**: 2021-09-01

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

Realtime Multiperson Pose Estimation
====================================
## New version released as library!!!
### Includes hands and face keypoints, Windows version, and it is faster!
### https://github.com/CMU-Perceptual-Computing-Lab/openpose
### This repository is not maintained anymore and it will eventually be closed. Please, move to OpenPose!


## Introduction
C++ code repo for the ECCV 2016 demo, "Realtime Multiperson Pose Estimation", Zhe Cao, Shih-En Wei, Tomas Simon, Yaser Sheikh. Thanks Ginés Hidalgo Martínez for restructuring the code. 

The [full project repo](https://github.com/ZheC/Multi-Person-Pose-Estimation) includes matlab and python version, and training code.

This project is under the terms of the [license](LICENSE).

## Quick Start
1. Required: CUDA & cuDNN installed on your machine.
2. If you have installed OpenCV 2.4 in your system, go to step 3. If you are using OpenCV 3, uncomment the line `# OPENCV_VERSION := 3` on the file `Makefile.config.Ubuntu14.example` (for Ubuntu 14) and/or `Makefile.config.Ubuntu16.example` (for Ubuntu 15 or 16). In addition, OpenCV 3 does not incorporate the `opencv_contrib` module by default. Assuming you have manually installed it and you need to use it, append `opencv_contrib` at the end of the line `LIBRARIES += opencv_core opencv_highgui opencv_imgproc` in the `Makefile` file.
3. Build `caffe` & `rtpose.bin` + download the required caffe models (script tested on Ubuntu 14.04 & 16.04, it uses all the available cores in your machine):**
```
chmod u+x install_caffe_and_cpm.sh
./install_caffe_and_cpm.sh
```

## Running on a video:
```
./build/examples/rtpose/rtpose.bin --video video_file.mp4
```

## Running on your webcam:
```
./build/examples/rtpose/rtpose.bin
```

## Important options:
`--help` <--- It displays all the available options.

`--video input.mp4` <--- Input video. If omitted, will use webcam.

`--camera #` <--- Choose webcam number (default: 0).

`--image_dir path_to_images/` <--- Run on all jpg, png, or bmp images in `path_to_images/`. If omitted, will use webcam.

`--write_frames path/`  <--- Render images with this prefix: path/frame%06d.jpg

`--write_json path/`  <--- Output JSON file with joints with this prefix: path/frame%06d.json

`--no_frame_drops` <--- Don't drop frames. Important for making offline results.

`--no_display` <--- Don't open a display window. Useful if there's no X server.

`--num_gpu 4` <--- Parallelize over this number of GPUs. Default is 1.

`--num_scales 3 --scale_gap 0.15`  <--- Use 3 scales, 1, (1-0.15), (1-0.15*2). Default is one scale=1.

(HD)
`--net_resolution 656x368 --resolution 1280x720` (These are the default values.)

(VGA)
`--net_resolution 496x368 --resolution 640x480`

`--logtostderr` <--- Log messages to standard error.

## Example:
Run on a video `vid.mp4`, render image frames as `output/frame%06d.jpg` and output JSON files as `output/frame%06d.json`, using 3 scales (1.00, 0.85, and 0.70), parallelized over 2 GPUs:
```
./build/examples/rtpose/rtpose.bin --video vid.mp4 --num_gpu 2 --no_frame_drops --write_frames output/ --write_json output/ --num_scales 3 --scale_gap 0.15
```

## Output format:
Each JSON file has a `bodies` array of objects, where each object has an array `joints` containing the joint locations and detection confidence formatted as `x1,y1,c1,x2,y2,c2,...`, where `c` is the confidence in [0,1].

```
{
"version":0.1,
"bodies":[
{"joints":[1114.15,160.396,0.846207,...]},
{"joints":[...]},
]
}
```

where the joint order of the COCO parts is: (see src/rtpose/modelDescriptorFactory.cpp )
```
	part2name {
		{0,  "Nose"},
		{1,  "Neck"},
		{2,  "RShoulder"},
		{3,  "RElbow"},
		{4,  "RWrist"},
		{5,  "LShoulder"},
		{6,  "LElbow"},
		{7,  "LWrist"},
		{8,  "RHip"},
		{9,  "RKnee"},
		{10, "RAnkle"},
		{11, "LHip"},
		{12, "LKnee"},
		{13, "LAnkle"},
		{14, "REye"},
		{15, "LEye"},
		{16, "REar"},
		{17, "LEar"},
		{18, "Bkg"},
	}
```

## Custom Caffe:
We modified and added several Caffe files in `include/caffe` and `src/caffe`. In case you want to use your own Caffe distribution, these are the files we added and modified:

1. Added folders in `include/caffe` and `src/caffe`: `include/caffe/cpm` and `src/caffe/cpm`.
2. Modified files in `include/caffe` (search for `// CPM extra code:` to find the modified code): `data_transformer.hpp`.
3. Modified files in `src/caffe` (search for `// CPM extra code:` to find the modified code): `data_transformer.cpp`, `proto/caffe.proto` and `util/blocking_queue.cpp`.
4. Replaced files: `README.md`.
5. Added files: `install_caffe_and_cpm.sh`, `Makefile.config.Ubuntu14.example` (extracted from `Makefile.config.example`) and `Makefile.config.Ubuntu16.example` (extracted from `Makefile.config.example`).
6. Other added folders: `model/`, `examples/rtpose`, `/include/rtpose` and `/src/rtpose`.
7. Other modified files: `Makefile`.
8. Optional - deleted Caffe files and folders (only to save space): `Makefile.config.example`, `data/`, `examples/` (do not delete `examples/rtpose`) and `models/`.


## Custom Caffe layers:
We created a few Caffe layers (located in `include/caffe/cpm/layers` and `src/caffe/cpm/layers`):

1. ImResizeLayer: Only used for testing (backward pass not implemented). This layer performs 2-D resize over the 4-D data. I.e., given a 4-D input of size (`num` x `channels` x `height_input` x `width_input`), the layer returns a 4-D output of size (`num` x `channels` x `height_output` x `width_output`). It is independently applied to each dimension of `num` and `channels`. Its parameters are:
	1. `factor`: Scaling factor with respect to the input width and height. `factor` is the alternative to the pair of variables [`target_spatial_width`, `target_spatial_height`]. If `factor != 0`, the latter are ignored.
	2. `scale_gap` and `start_scale`: These parameters are related and used for doing scale search in testing mode. If `start_scale = 1` (default), the CNN input patch size is the net resolution (set with `--net_resolution`). `scale_gap` is used to calculate the scale difference between scales. This parameters are related with the flag `--num_scales`. For instance, using `--start_scale 1 --num_scales 3 --scale_gap 0.1` means using 3 scales: 1, 1-0.1, 1-2*0.1, hence the different patch sizes correspond to the net resolution multiplied by these scales values.
	3. `target_spatial_height`: Alternative to `factor`. It sets the output height. Ignored if `factor != 0`.
	4. `target_spatial_width`: Alternative to `factor`. It sets the output width. Ignored if `factor != 0`.
2. NmsLayer: Only used for testing (backward pass not implemented). This layer performs 3-D Non-Maximum Suppression over the 4-D data. I.e., given a 4-D input of size (`num` x `channels` x `height` x `width`), it returns a 4-D output of size (`num` x `num_parts` x `max_peaks+1` x `3`). It is independently applied to each dimension of `num`. The seconds dimension corresponds to the number of limbs (`num_parts`). The third dimension indicates the maximum number of peaks to be analyzed (`max_peaks+1`). Finally, the last one corresponds to the `x`, `y` and `score` values (`3`). Its parameters are:
	1. `max_peaks`: The number of peaks to be considered. The last `total_peaks` - `max_peaks` peaks are discarded.
	2. `num_parts`: The number of limbs to detect (e.g. 15 for MPI and 18 for COCO).
	3. `threshold`: Any input value smaller than this threshold is set to 0.


## Citation
Please cite the paper in your publications if it helps your research:


    @article{cao2016realtime,
	  title={Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields},
	  author={Zhe Cao and Tomas Simon and Shih-En Wei and Yaser Sheikh},
	  journal={arXiv preprint arXiv:1611.08050},
	  year={2016}
	  }

    @inproceedings{wei2016cpm,
      author = {Shih-En Wei and Varun Ramakrishna and Takeo Kanade and Yaser Sheikh},
      booktitle = {CVPR},
      title = {Convolutional pose machines},
      year = {2016}
      }