# KeypointNeRF

KeypointNeRF: Generalizing Image-based Volumetric Avatars using Relative Spatial Encoding of Keypoints

Marko Mihajlovic · Aayush Bansal · Michael Zollhoefer · Siyu Tang · Shunsuke Saito

ECCV 2022

KeypointNeRF leverages human keypoints to instantly generate a volumetric radiance representation from 2-3 input images without retraining or fine-tuning. It can represent human faces and full bodies.

## News :new:

- [2022/10/01] Combine [ICON](https://github.com/YuliangXiu/ICON) with our relative spatial keypoint encoding for fast and convenient monocular reconstruction, without requiring the expensive SMPL feature. More details are [here](#Reconstruction-from-a-Single-Image).

## Installation

Please install the Python dependencies specified in `environment.yml`:

```bash
conda env create -f environment.yml
conda activate KeypointNeRF
```

## Data preparation

Please see [DATA_PREP.md](DATA_PREP.md) to set up the ZJU-MoCap dataset. After this step, the data directory follows the structure:

```bash
./data/zju_mocap
├── CoreView_313
├── CoreView_315
├── CoreView_377
├── CoreView_386
├── CoreView_387
├── CoreView_390
├── CoreView_392
├── CoreView_393
├── CoreView_394
└── CoreView_396
```

## Train your own model on the ZJU dataset

Execute the `train.py` script to train the model on the ZJU dataset.

```shell script
python train.py --config ./configs/zju.json --data_root ./data/zju_mocap
```

After training, the model checkpoint will be stored under `./EXPERIMENTS/zju/ckpts/last.ckpt`, which is equivalent to the one provided [here](https://drive.google.com/file/d/1rsMb3DFFXaFw0iK7yoUmoDEaCW_XqfaN/view?usp=sharing).

## Evaluation

To render and evaluate images, execute:

```shell script
python train.py --config ./configs/zju.json --data_root ./data/zju_mocap --run_val
python eval_zju.py --src_dir ./EXPERIMENTS/zju/images_v3
```

To visualize the dynamic results, execute:

```shell
python render_dynamic.py --config ./configs/zju.json --data_root ./data/zju_mocap --model_ckpt ./EXPERIMENTS/zju/ckpts/last.ckpt
```

(The first three views of an unseen subject are the input to KeypointNeRF; the last image is a rendered novel view)
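Before running the training or evaluation commands above, it can help to confirm that the data directory matches the layout listed under Data preparation. The following is a minimal sketch of such a check; the helper name `missing_subjects` and the constant `ZJU_SUBJECTS` are illustrative and not part of this repository.

```python
from pathlib import Path
from typing import List

# Subject folders as listed in the directory structure under "Data preparation".
ZJU_SUBJECTS = [
    "CoreView_313", "CoreView_315", "CoreView_377", "CoreView_386",
    "CoreView_387", "CoreView_390", "CoreView_392", "CoreView_393",
    "CoreView_394", "CoreView_396",
]


def missing_subjects(data_root: str) -> List[str]:
    """Return the expected subject folders that are absent under data_root.

    Hypothetical helper for illustration; it only checks that the folders
    exist, not that their contents are complete.
    """
    root = Path(data_root)
    return [name for name in ZJU_SUBJECTS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_subjects("./data/zju_mocap")
    if missing:
        print("Missing subject folders:", ", ".join(missing))
    else:
        print("All expected subject folders found.")
```

An empty result only means the top-level folders are present; see [DATA_PREP.md](DATA_PREP.md) for what each folder should contain.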

We compare KeypointNeRF with recent state-of-the-art methods. The evaluation metrics are PSNR and SSIM.

| Models | PSNR ↑ | SSIM ↑ |
|---|---|---|
| pixelNeRF (Yu et al., CVPR'21) | 23.17 | 86.93 |
| PVA (Raj et al., CVPR'21) | 23.15 | 86.63 |
| NHP (Kwon et al., NeurIPS'21) | 24.75 | 90.58 |
| KeypointNeRF* (Mihajlovic et al., ECCV'22) | **25.86** | **91.07** |

(*Note that the results of KeypointNeRF are slightly higher than the numbers reported in the original paper because training views were not shuffled during training.)
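For readers unfamiliar with the PSNR column in the table above, the metric is conventionally derived from the mean squared error between a rendered image and its ground truth. The sketch below shows that standard definition in plain Python, assuming flattened pixel values in the range [0, 1]. The function name `psnr` is illustrative, and the repository's `eval_zju.py` may differ in details such as masking or cropping.

```python
import math
from typing import Sequence


def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two flattened images.

    Assumes both inputs hold pixel values in [0, max_val] and have the
    same, non-zero length.
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


if __name__ == "__main__":
    # A uniform error of 0.1 on a [0, 1] scale gives roughly 20 dB.
    print(round(psnr([0.0, 0.0, 0.0], [0.1, 0.1, 0.1]), 2))
```

Higher values indicate a smaller pixel-wise error, which is why the table marks PSNR with an upward arrow.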

## Reconstruction from a Single Image

Our relative spatial encoding can be used to reconstruct humans from a single image. As an example, we leverage ICON and replace its expensive SDF feature with our relative spatial encoding.

While it achieves comparable quality to ICON, it's much faster and more convenient to use (*displayed image taken from pinterest.com).
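This README does not spell out the encoding itself, so the following is only an illustrative sketch of how a keypoint-relative encoding of a query point might be built. It assumes, per our reading of the paper, that each keypoint contributes a sinusoidal encoding of its depth difference to the query point in camera coordinates, weighted by a Gaussian of their Euclidean distance. The function names and the `alpha` and `num_freqs` values are hypothetical and not taken from this repository; consult the paper and the code for the actual formulation.

```python
import math
from typing import List, Sequence

Vec3 = Sequence[float]  # (x, y, z) in camera coordinates


def positional_encoding(x: float, num_freqs: int = 4) -> List[float]:
    """Sinusoidal encoding [sin(2^i x), cos(2^i x)] for i in 0..num_freqs-1."""
    out: List[float] = []
    for i in range(num_freqs):
        out.append(math.sin((2.0 ** i) * x))
        out.append(math.cos((2.0 ** i) * x))
    return out


def relative_keypoint_encoding(
    query_cam: Vec3,
    keypoints_cam: Sequence[Vec3],
    alpha: float = 0.05,   # assumed Gaussian bandwidth, illustrative value
    num_freqs: int = 4,    # assumed number of frequency bands
) -> List[float]:
    """Encode a query point relative to each keypoint (illustrative sketch)."""
    features: List[float] = []
    for kp in keypoints_cam:
        # Depth of the keypoint relative to the query point along the camera z-axis.
        depth_diff = kp[2] - query_cam[2]
        # The Gaussian weight attenuates the influence of distant keypoints.
        dist_sq = sum((k - q) ** 2 for k, q in zip(kp, query_cam))
        weight = math.exp(-dist_sq / (2.0 * alpha ** 2))
        features.extend(weight * v for v in positional_encoding(depth_diff, num_freqs))
    return features
```

In this sketch the feature length is `2 * num_freqs` per keypoint, and keypoints far from the query point contribute values close to zero, so the encoding stays local to nearby body or face landmarks.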

### 3D Human Reconstruction on CAPE

| Models | Chamfer ↓ (cm) | P2S ↓ (cm) |
|---|---|---|
| PIFu (Saito et al., ICCV'19) | 3.573 | 1.483 |
| ICON (Xiu et al., CVPR'22) | 1.424 | 1.351 |
| KeypointICON (Mihajlovic et al., ECCV'22; Xiu et al., CVPR'22) | 1.539 | 1.358 |

Check the benchmark [here](https://paperswithcode.com/sota/3d-human-reconstruction-on-cape) and more details [here](https://github.com/YuliangXiu/ICON/blob/master/docs/evaluation.md).

## Publication

If you find our code or paper useful, please consider citing:

```bibtex
@inproceedings{Mihajlovic:ECCV2022,
  title = {{KeypointNeRF}: Generalizing Image-based Volumetric Avatars using Relative Spatial Encoding of Keypoints},
  author = {Mihajlovic, Marko and Bansal, Aayush and Zollhoefer, Michael and Tang, Siyu and Saito, Shunsuke},
  booktitle={European conference on computer vision},
  year={2022},
}
```

## License

[CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/legalcode). See the [LICENSE](LICENSE) file.