# crossview_3d_pose_tracking

**Repository Path**: AI52CV/crossview_3d_pose_tracking

## Basic Information

- **Project Name**: crossview_3d_pose_tracking
- **Description**: Dataset of the paper "Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS"
Original repository: https://github.com/longcw/crossview_3d_pose_tracking
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 0
- **Created**: 2021-04-06
- **Last Updated**: 2021-04-06

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Dataset of "Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS"

>**Note**: This repo contains the datasets used in the paper:
Campus, Shelf, StoreLayout1, and StoreLayout2.
Along with the data,
we provide scripts to visualize the data in both 2D and 3D
and to evaluate the results.
The source code is not included
because this is a commercial project;
see http://aifi.io if you are interested.


## Dataset
We provide four datasets:
+ Campus: https://www.epfl.ch/labs/cvlab/data/data-pom-index-php/
+ Shelf: http://campar.in.tum.de/Chair/MultiHumanPose
+ StoreLayout1: proposed by AiFi Inc.
+ StoreLayout2: proposed by AiFi Inc.


For convenience, you can download all of them from [Google Drive](https://drive.google.com/drive/folders/1LJGcP2v0aQDmetnCzO2PiRP1v4jU6sFC?usp=sharing).

### Data Structure
For each dataset, the directory is organized as follows:
```
Campus_Seq1
├── annotation_2d.json
├── annotation_3d.json
├── calibration.json
├── detection.json
├── frames
│   ├── Camera0
│   ├── Camera1
│   └── Camera2
│       ├── 0060.720.jpg
│       ├── 0060.760.jpg
│       ├── 0060.800.jpg
│       └── xxxxxxxx.jpg
└── result_3d.json
```
The annotations (`annotation_2d.json`, `annotation_3d.json`) are provided only for the Campus and Shelf datasets,
and `detection.json` is generated using the
Cascaded Pyramid Network (CPN) from https://github.com/zju3dv/mvpose.
The `frames` are renamed using timestamps, i.e. the name
of each file is the timestamp of that frame in seconds.
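
Because the file name carries the timestamp, it can be recovered without opening the image. The sketch below is not part of the repo; `frame_timestamp` and `frame_key` are hypothetical helpers, and the 8-character zero-padded layout with three decimals is an assumption inferred from names such as `0060.720.jpg`.

```python
from pathlib import Path


def frame_timestamp(frame_path: str) -> float:
    """Return the timestamp in seconds encoded in a frame file name."""
    # "Camera2/0060.720.jpg" -> stem "0060.720" -> 60.72
    return float(Path(frame_path).stem)


def frame_key(camera: str, timestamp: float) -> str:
    """Build a name like "Camera0/0002.320.jpg".

    Assumes zero padding to 8 characters with 3 decimals, as in the listing above.
    """
    return f"{camera}/{timestamp:08.3f}.jpg"


print(frame_timestamp("Camera2/0060.720.jpg"))
print(frame_key("Camera0", 2.32))
```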


### Data Format
2D (2D annotation and detection) and 3D (3D annotation and tracking result) 
data have their own unified data format as follows.
#### 2D Data Format
The 2D data is organized by frames:
```json
{
  "image_wh": [360, 288],
  "frames": {
    "Camera0/0002.320.jpg": {
      "camera": "Camera0",
      "timestamp": 2.32,
      "poses": []
    },
    "Camera0/0002.360.jpg": {
      "camera": "Camera0",
      "timestamp": 2.36,
      "poses": [
        {
          "id": -1,
          "points_2d": Nx2 Array,
          "scores": N Array
        },
        ...
      ]
    },
    ...
  }
}
```
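
As a rough illustration of reading this layout, the sketch below walks the frames in time order and filters keypoints by score. It uses a tiny inline stand-in (`SAMPLE_2D`) instead of a real file; the helper names and the `0.5` score threshold are assumptions for the example, not part of the provided scripts.

```python
import json

# Tiny stand-in for annotation_2d.json / detection.json (2 keypoints instead of 14 or 17).
SAMPLE_2D = json.loads("""
{
  "image_wh": [360, 288],
  "frames": {
    "Camera0/0002.320.jpg": {"camera": "Camera0", "timestamp": 2.32, "poses": []},
    "Camera0/0002.360.jpg": {
      "camera": "Camera0",
      "timestamp": 2.36,
      "poses": [
        {"id": -1, "points_2d": [[10.0, 20.0], [30.0, 40.0]], "scores": [0.9, 0.2]}
      ]
    }
  }
}
""")


def iter_poses_2d(data):
    """Yield (frame_name, camera, timestamp, pose) for every pose, ordered by timestamp."""
    frames = sorted(data["frames"].items(), key=lambda kv: kv[1]["timestamp"])
    for name, frame in frames:
        for pose in frame["poses"]:
            yield name, frame["camera"], frame["timestamp"], pose


def confident_points(pose, min_score=0.5):
    """Keep only the keypoints whose score reaches min_score (arbitrary threshold)."""
    return [pt for pt, s in zip(pose["points_2d"], pose["scores"]) if s >= min_score]


for name, camera, timestamp, pose in iter_poses_2d(SAMPLE_2D):
    print(name, camera, timestamp, confident_points(pose))
```

For a real file, replace `SAMPLE_2D` with the result of `json.load` on `annotation_2d.json` or `detection.json`.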

#### 3D Data Format
The 3D data is organized by timestamps:
```json
[
  {
    "timestamp": 6.08,
    "poses": [
      {
        "id": 10159970873491820000,
        "points_3d": Nx3 Array,
        "scores": N Array
      },
      ...
    ]
  },
  ...
]
```
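
Since each entry holds all poses at one timestamp, a common first step is to regroup them into per-person tracks keyed by `id`. The following is a minimal sketch on an inline stand-in (`SAMPLE_3D`, with made-up ids and coordinates); `tracks_by_id` is a hypothetical helper, not a function from this repo.

```python
import json
from collections import defaultdict

# Tiny stand-in for annotation_3d.json / result_3d.json (2 keypoints per pose).
SAMPLE_3D = json.loads("""
[
  {"timestamp": 6.08, "poses": [
    {"id": 7, "points_3d": [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]], "scores": [1.0, 1.0]}
  ]},
  {"timestamp": 6.12, "poses": [
    {"id": 7, "points_3d": [[0.1, 0.0, 0.0], [0.1, 0.0, 1.0]], "scores": [1.0, 1.0]},
    {"id": 8, "points_3d": [[2.0, 2.0, 0.0], [2.0, 2.0, 1.0]], "scores": [1.0, 0.5]}
  ]}
]
""")


def tracks_by_id(entries):
    """Return {id: [(timestamp, points_3d), ...]} with each track sorted by time."""
    tracks = defaultdict(list)
    for entry in sorted(entries, key=lambda e: e["timestamp"]):
        for pose in entry["poses"]:
            tracks[pose["id"]].append((entry["timestamp"], pose["points_3d"]))
    return dict(tracks)


for pose_id, track in tracks_by_id(SAMPLE_3D).items():
    print(pose_id, [t for t, _ in track])
```

Python integers have arbitrary precision, so the standard `json` module keeps large ids such as `10159970873491820000` exact; parsers that store numbers as 64-bit floats may round them.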

### Human Pose Format
In the annotations, each human pose has 14 keypoints:
```json
0: 'r-ankle',
1: 'r-knee',
2: 'r-hip',
3: 'l-hip',
4: 'l-knee',
5: 'l-ankle',
6: 'r-wrist',
7: 'r-elbow',
8: 'r-shoulder',
9: 'l-shoulder',
10: 'l-elbow',
11: 'l-wrist',
12: 'bottom-head',
13: 'top-head'
```
In detection and result, the human pose has 17 keypoints:
```json
0: 'nose',
1: 'l-eye',
2: 'r-eye',
3: 'l-ear',
4: 'r-ear',
5: 'l-shoulder',
6: 'r-shoulder',
7: 'l-elbow',
8: 'r-elbow',
9: 'l-wrist',
10: 'r-wrist',
11: 'l-hip',
12: 'r-hip',
13: 'l-knee',
14: 'r-knee',
15: 'l-ankle',
16: 'r-ankle'
```
```
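
The two layouts share the twelve limb joints (shoulders, elbows, wrists, hips, knees, ankles), so these can be matched by name. The sketch below builds that index mapping from the two lists above; the two head keypoints of the annotation layout have no same-named counterpart in the 17-keypoint layout and are left as `None`. How `evaluate.py` actually aligns the two formats is not described here, so treat this only as an illustration.

```python
ANNOTATION_KEYPOINTS = [
    "r-ankle", "r-knee", "r-hip", "l-hip", "l-knee", "l-ankle",
    "r-wrist", "r-elbow", "r-shoulder", "l-shoulder", "l-elbow", "l-wrist",
    "bottom-head", "top-head",
]

DETECTION_KEYPOINTS = [
    "nose", "l-eye", "r-eye", "l-ear", "r-ear",
    "l-shoulder", "r-shoulder", "l-elbow", "r-elbow", "l-wrist", "r-wrist",
    "l-hip", "r-hip", "l-knee", "r-knee", "l-ankle", "r-ankle",
]

# For each annotation joint, the index of the same-named detection joint
# (None where the name does not exist in the 17-keypoint layout).
ANNOTATION_TO_DETECTION = [
    DETECTION_KEYPOINTS.index(name) if name in DETECTION_KEYPOINTS else None
    for name in ANNOTATION_KEYPOINTS
]


def to_annotation_order(points17):
    """Reorder a 17-keypoint pose into the 14-keypoint order; head joints become None."""
    return [None if i is None else points17[i] for i in ANNOTATION_TO_DETECTION]


print(ANNOTATION_TO_DETECTION)
```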

## Demo
Along with the data, we provide tools
to load the data and calibration, and to visualize and evaluate the results.
### Visualize Annotation
```bash
DATA_ROOT=/data/3DPose_pub/Campus_Seq1

# 2D
python display.py --frame-root ${DATA_ROOT}/frames --calibration ${DATA_ROOT}/calibration.json --pose-file ${DATA_ROOT}/annotation_2d.json --pose-type 2d

# 3D (only tested on Linux)
python display.py --frame-root ${DATA_ROOT}/frames --calibration ${DATA_ROOT}/calibration.json --pose-file ${DATA_ROOT}/annotation_3d.json --pose-type 3d
```

### Visualize Detection and Result
```bash
DATA_ROOT=/data/3DPose_pub/Campus_Seq1

# 2D detection
python display.py --frame-root ${DATA_ROOT}/frames --calibration ${DATA_ROOT}/calibration.json --pose-file ${DATA_ROOT}/detection.json --pose-type 2d

# 3D result
python display.py --frame-root ${DATA_ROOT}/frames --calibration ${DATA_ROOT}/calibration.json --pose-file ${DATA_ROOT}/result_3d.json --pose-type 3d
```

### 3D visualization with Docker
Setting up the environment for vispy can be difficult.
We provide a Dockerfile that supports OpenGL and CUDA applications (from https://medium.com/@benjamin.botto/opengl-and-cuda-applications-in-docker-af0eece000f1).

1. To use it you will need `nvidia-container-runtime`: https://github.com/NVIDIA/nvidia-container-runtime#installation

2. Build the docker image 
   ```bash
   docker build -t glvnd-x-vispy:latest .
   ```
3. Start the container
   ```bash
   # Connecting to the Host’s X Server
   xhost +local:root

   docker run \
   --rm \
   -it \
   --gpus all \
   -v /tmp/.X11-unix:/tmp/.X11-unix \
   -e DISPLAY=$DISPLAY \
   -e QT_X11_NO_MITSHM=1 \
   -v /PATH-TO-DATA/3DPose_pub:/data/3DPose_pub \
   -v /PATH-TO-CODE/crossview_3d_pose_tracking:/app \
   glvnd-x-vispy bash
   ```
4. Run the demo in a docker container
   ```bash
   cd /app
   pip3 install -r requirements.txt

   DATA_ROOT=/data/3DPose_pub/Campus_Seq1

   # 2D detection
   python3 display.py --frame-root ${DATA_ROOT}/frames --calibration ${DATA_ROOT}/calibration.json --pose-file ${DATA_ROOT}/detection.json --pose-type 2d

   # 3D result
   python3 display.py --frame-root ${DATA_ROOT}/frames --calibration ${DATA_ROOT}/calibration.json --pose-file ${DATA_ROOT}/result_3d.json --pose-type 3d
   ```

### Evaluate

```bash
DATA_ROOT=/data/3DPose_pub/Campus_Seq1

python evaluate.py --annotation ${DATA_ROOT}/annotation_3d.json --result ${DATA_ROOT}/result_3d.json 
```
You will then get output like:
```
+------------+---------+---------+---------+---------+
| Bone Group | Actor 0 | Actor 1 | Actor 2 | Average |
+------------+---------+---------+---------+---------+
|    Head    |  1.0000 |  1.0000 |  0.9928 |  0.9976 |
|   Torso    |  1.0000 |  1.0000 |  1.0000 |  1.0000 |
| Upper arms |  0.9592 |  1.0000 |  1.0000 |  0.9864 |
| Lower arms |  0.8980 |  0.7063 |  0.9348 |  0.8464 |
| Upper legs |  1.0000 |  1.0000 |  1.0000 |  1.0000 |
| Lower legs |  1.0000 |  1.0000 |  1.0000 |  1.0000 |
|   Total    |  0.9714 |  0.9413 |  0.9862 |  0.9663 |
+------------+---------+---------+---------+---------+
```
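
The per-bone-group scores resemble the percentage of correct parts (PCP) metric that is commonly reported on Campus and Shelf. This README does not document the exact rule used by `evaluate.py`, so the sketch below is only an assumption: it implements the usual PCP criterion, where a bone counts as correct if the mean error of its two endpoints is at most `alpha` times the ground-truth bone length. The function names and the default `alpha=0.5` are illustrative.

```python
import math


def bone_is_correct(gt_start, gt_end, pred_start, pred_end, alpha=0.5):
    """PCP criterion: mean endpoint error <= alpha * ground-truth bone length."""
    bone_length = math.dist(gt_start, gt_end)
    mean_error = (math.dist(gt_start, pred_start) + math.dist(gt_end, pred_end)) / 2
    return mean_error <= alpha * bone_length


def pcp(bones, alpha=0.5):
    """Fraction of (gt_start, gt_end, pred_start, pred_end) tuples that pass the criterion."""
    results = [bone_is_correct(*bone, alpha=alpha) for bone in bones]
    return sum(results) / len(results)


example_bones = [
    # Prediction shifted by 0.1 along the bone: well within half the bone length.
    ((0, 0, 0), (0, 0, 1), (0, 0, 0.1), (0, 0, 1.1)),
    # Prediction shifted sideways by a full bone length: counted as incorrect.
    ((0, 0, 0), (0, 0, 1), (1, 0, 0), (1, 0, 1)),
]
print(pcp(example_bones))
```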

## Citation
```
@InProceedings{Chen_2020_CVPR,
author = {Chen, Long and Ai, Haizhou and Chen, Rui and Zhuang, Zijie and Liu, Shuang},
title = {Cross-View Tracking for Multi-Human 3D Pose Estimation at Over 100 FPS},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
```