# Audio2X SDK

The Audio2X SDK is a comprehensive toolkit for fast audio-driven animation and emotion detection. It consists of two main components:

- **Audio2Emotion SDK**: Analyzes audio (speech) input to detect and classify emotional states
- **Audio2Face SDK**: Generates facial animations from audio (speech) input, supporting both regression and diffusion-based approaches

The SDK is designed for high-performance applications requiring fast audio processing and animation generation. It leverages NVIDIA's GPU computing capabilities through CUDA and TensorRT for optimal performance.

### Key Features

- **Faster Than Real-time Processing**: Frame generation at more than 60 FPS
- **Multi-track Support**: Process multiple audio streams simultaneously
- **GPU Acceleration**: Full CUDA/TensorRT integration
- **Flexible Architecture**: Support for both batch and interactive processing modes
- **Cross-platform**: Windows and Linux support

# Build

## Prerequisites

### System Requirements

- **Operating System**: Windows 10/11 or Linux (Ubuntu 20.04+)
- **GPU**: NVIDIA GPU with CUDA 12.8.0+ support
- **Memory**: 8 GB+ RAM; 4 GB+ GPU memory recommended
- **Storage**: 10 GB+ free space for the SDK and models

**Build Tools**

To build the Audio2X SDK, you will first need the following software packages.

1. ### Windows
   - MSBuild (Visual Studio 2022+)
2. ### Linux
   - g++
   - make
3. ### Common
   Note: You can use your own dependencies or use the pre-fetched versions, which will be downloaded to `_deps\build-deps` by running `.\fetch_deps.{bat|sh}`.
   - CMake
   - Ninja (Optional)

**System Packages**

- [CUDA](https://developer.nvidia.com/cuda-toolkit) >= 12.8, < 13.0 (12.9 is recommended)
- [git](https://git-scm.com/)
- [git-lfs](https://github.com/git-lfs/git-lfs)
- [python](https://www.python.org/downloads/) >= v3.8, <= v3.10.x
- [pip](https://pypi.org/project/pip/#history) >= v19.0
- [TensorRT](https://developer.nvidia.com/tensorrt) >= 10.13, < 11.0
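Before starting a build, it can save time to confirm that the required tools are actually visible from your shell. The following is a minimal sanity check on Linux (adapt the paths for Windows); the TensorRT location is whatever directory you plan to pass via `TENSORRT_ROOT_DIR`, and the `lib/` layout shown is the standard TensorRT tarball layout:

```shell
# Verify the CUDA toolkit is installed and on PATH (expect release 12.8 or 12.9)
nvcc --version

# Verify CMake, git-lfs, and Python meet the version requirements above
cmake --version
git lfs version
python --version   # Expect 3.8.x - 3.10.x

# Verify the TensorRT directory you intend to use contains the runtime libraries
export TENSORRT_ROOT_DIR="path/to/tensorrt"
ls "$TENSORRT_ROOT_DIR/lib"   # Expect libnvinfer.so.* (version 10.13+)
```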
## Building Audio2X SDK

1. ### Using the default build script

   (Note: Use `debug` instead of `release` below for a debug build.)

   #### Windows

   ```shell
   git clone https://github.com/NVIDIA/Audio2Face-3D-SDK.git
   cd Audio2Face-3D-SDK
   git lfs pull                                  # Pull large files in sample-data
   .\fetch_deps.bat release
   $env:TENSORRT_ROOT_DIR="C:\path\to\tensorrt"
   $env:CUDA_PATH="C:\path\to\cuda"              # Usually not needed if the CUDA Toolkit installer has already set it
   .\build.bat clean release                     # Optional: Remove previous build
   .\build.bat all release                       # Uses CMake and Ninja from `_deps\build-deps` and builds to `_build` by default
   ```

   #### Linux

   ```shell
   git clone https://github.com/NVIDIA/Audio2Face-3D-SDK.git
   cd Audio2Face-3D-SDK
   git lfs pull                                  # Pull large files in sample-data
   ./fetch_deps.sh release
   export TENSORRT_ROOT_DIR="path/to/tensorrt"
   ./build.sh clean release                      # Optional: Remove previous build
   ./build.sh all release                        # Uses CMake and Ninja from `_deps/build-deps` and builds to `_build` by default
   ```

2. ### Using CMake

   #### Windows

   ```shell
   $env:TENSORRT_ROOT_DIR="C:\path\to\tensorrt"
   $env:CUDA_PATH="C:\path\to\cuda"              # Usually not needed if the CUDA Toolkit installer has already set it
   cmake -B _build -G "Visual Studio 17 2022" -S . -DTENSORRT_ROOT_DIR="$env:TENSORRT_ROOT_DIR" -DCUDA_PATH="$env:CUDA_PATH" -DCMAKE_GENERATOR_TOOLSET="cuda=$env:CUDA_PATH"
   cmake --build _build --target ALL_BUILD --config Release --parallel
   ```

   #### Linux

   ```shell
   export TENSORRT_ROOT_DIR="path/to/tensorrt"
   cmake -B _build -G "Unix Makefiles" -S . -DCMAKE_BUILD_TYPE=Release
   cmake --build _build --target all --config Release --parallel
   ```

## Build Output Structure

After a successful build, you should see a directory structure like this:

```
_build/
└── release/                 # Release (or debug) build artifacts
    ├── audio2emotion-sdk/
    │   ├── bin/             # Audio2Emotion samples and unit test executables
    │   └── lib/             # Audio2Emotion static libraries
    ├── audio2face-sdk/
    │   ├── bin/             # Audio2Face samples and unit test executables
    │   └── lib/             # Audio2Face static libraries
    ├── audio2x-common/
    │   ├── bin/             # Audio2X Common unit test executables
    │   └── lib/             # Audio2X Common static libraries
    └── audio2x-sdk/         # Combines A2E + A2F + A2X Common into a single shared library
        ├── bin/             # audio2x.dll (on Windows)
        ├── include/         # Header files
        └── lib/             # Import libraries (on Windows) or libaudio2x.so (on Linux)
```

The `audio2x-sdk` directory contains the unified SDK, which combines both Audio2Emotion and Audio2Face functionality into a single shared library for easy integration.

# Downloading Models and Generating Test Data

## License-protected models (Gated Models)

`Audio2Emotion` models are gated on Hugging Face and require a license click-through tied to your Hugging Face account. To download them, you must:

- Accept the model's license on its Hugging Face page (click `Agree and access repository`).
  - The default model is https://huggingface.co/nvidia/Audio2Emotion-v2.2. Visit the page with your Hugging Face account and accept the license; you should see the license prompt the first time you visit.
- Authenticate the CLI so the script can use your credentials.
  - Generate a [user access token](https://huggingface.co/docs/hub/security-tokens) from your Hugging Face account.
  - Ensure the `Read access to contents of all public gated repos you can access` permission is enabled for this token.
  - Log in via the CLI using: `hf auth login`

Here is a complete example of the whole process:

1. #### Windows

   ```shell
   # Create venv
   python -m venv venv                   # Requires python >= v3.8, <= v3.10.x
   .\venv\Scripts\activate
   pip install -r deps\requirements.txt  # If this step fails, verify your Python version (python --version)

   # Run these scripts in venv
   hf auth login                         # One-time setup: when prompted, paste the user access token you generated on Hugging Face
   .\download_models.bat                 # Download all the Audio2Face & Audio2Emotion models

   # Generate unit test data
   # Convert downloaded models to TensorRT format
   .\gen_testdata.bat
   ```

2. #### Linux

   ```shell
   # Create venv
   python -m venv venv                   # Requires python >= v3.8, <= v3.10.x
   source ./venv/bin/activate
   pip install -r deps/requirements.txt  # If this step fails, verify your Python version (python --version)

   # Run these scripts in venv
   hf auth login                         # One-time setup: when prompted, paste the user access token you generated on Hugging Face
   ./download_models.sh                  # Download all the Audio2Face & Audio2Emotion models

   # Generate unit test data
   # Convert downloaded models to TensorRT format
   ./gen_testdata.sh
   ```
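If `download_models` fails with an authorization error, the usual causes are a missing license click-through or a token without gated-repo read access. Assuming a recent `huggingface_hub` release that provides the `hf` CLI used above, you can quickly check the authentication side, and after both scripts finish, the generated data directory mentioned in the troubleshooting section should exist:

```shell
# Confirm which account the CLI is authenticated as (should print your username)
hf auth whoami

# After download_models and gen_testdata complete, the generated test data
# should be present (see the troubleshooting section below)
ls _data/generated
```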
# Running Audio2X SDK Unit Tests and Samples

To verify that your setup is correct, run the provided samples and unit tests. This process involves several steps:

1. **Install Python dependencies** - Required packages for downloading models from Hugging Face and generating test data
2. **Download models and generate test data** - Using the provided scripts
3. **Run samples using the wrapper script** - The `run_sample.{bat|sh}` script is necessary because the SDK is a single shared library that depends on CUDA and TensorRT libraries, which must be discoverable on the system path (a manual alternative is sketched after the platform-specific instructions below)

Below are the platform-specific instructions for Windows and Linux:

1. #### Windows

   ```shell
   # Run samples (ensure that the environment variables CUDA_PATH and TENSORRT_ROOT_DIR are set)
   .\run_sample.bat .\_build\release\audio2face-sdk\bin\audio2face-unit-tests.exe
   .\run_sample.bat .\_build\release\audio2face-sdk\bin\sample-a2f-executor.exe

   # By default, the script runs a release build. To run a debug build, pass the debug argument.
   .\run_sample.bat debug .\_build\debug\audio2face-sdk\bin\sample-a2f-executor.exe

   # Run benchmarks
   .\run_sample.bat .\_build\release\audio2face-sdk\bin\audio2face-benchmarks.exe --benchmark_filter=

   # Run benchmarks with a default set of filters
   .\run_sample.bat .\audio2face-sdk\source\benchmarks\test_benchmark.bat .\_build\release\audio2face-sdk\bin\audio2face-benchmarks.exe
   ```

2. #### Linux

   ```shell
   # Run samples (ensure that the environment variables CUDA_PATH and TENSORRT_ROOT_DIR are set)
   ./run_sample.sh ./_build/release/audio2face-sdk/bin/audio2face-unit-tests
   ./run_sample.sh ./_build/release/audio2face-sdk/bin/sample-a2f-executor

   # By default, the script runs a release build. To run a debug build, pass the debug argument.
   ./run_sample.sh debug ./_build/debug/audio2face-sdk/bin/sample-a2f-executor

   # Run benchmarks
   ./run_sample.sh ./_build/release/audio2face-sdk/bin/audio2face-benchmarks --benchmark_filter=

   # Run benchmarks with a default set of filters
   ./run_sample.sh ./audio2face-sdk/source/benchmarks/test_benchmark.sh ./_build/release/audio2face-sdk/bin/audio2face-benchmarks
   ```
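The wrapper script's job is to make the CUDA and TensorRT runtime libraries discoverable before launching the executable. If you prefer to run the binaries directly, a manual equivalent on Linux might look like the sketch below; this is an assumption based on the wrapper's stated purpose, so check `run_sample.sh` for the exact paths it sets:

```shell
# Make the TensorRT and CUDA runtime libraries discoverable, then run a
# sample directly (the exact directories may differ on your system)
export TENSORRT_ROOT_DIR="path/to/tensorrt"
export LD_LIBRARY_PATH="$TENSORRT_ROOT_DIR/lib:/usr/local/cuda/lib64:$LD_LIBRARY_PATH"
./_build/release/audio2face-sdk/bin/sample-a2f-executor
```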
## ✅ What to Expect After Running the Samples

- All unit tests should pass
- Sample executables should complete without any errors

The samples do not include a GUI, so there is no visualization of the generated vertex positions. To view the results, you can either:

- Export the data to a `.bin` file and visualize it in a DCC (Digital Content Creation) tool of your choice, or
- Use [Maya-ACE](https://github.com/NVIDIA/Maya-ACE) for direct integration with Autodesk Maya.

## ⚠️ Common Issues and Troubleshooting

If you encounter errors, here are some common causes:

1. `build.bat` shows "Visual Studio installation not found"

   Visual Studio with the C++ compiler toolchain was not found. Install it, or set the `VS_PATH` variable manually in `build.bat`.

2. `build.bat` shows "TENSORRT_ROOT_DIR is not defined"

   Make sure the `TENSORRT_ROOT_DIR` environment variable points to your TensorRT directory.

3. Samples print `[A2F SDK] [ERROR] Unable to parse file...`

   Run the download_models and gen_testdata scripts before running the samples or unit tests. These scripts create the required `_data\generated` directory.

4. `.\venv\Scripts\Activate.ps1` cannot be loaded because running scripts is disabled on this system

   You need to allow PowerShell to run local scripts. To fix this, open PowerShell and run:

   ```shell
   Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
   ```

   See [about_Execution_Policies - PowerShell | Microsoft Learn](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_execution_policies) for more details.

# Getting Started with Development

Now that you've set up the SDK, here are some recommended next steps to explore and get the most out of it:

### Read the High-Level Overview

Check out the [high-level overview document](docs/) to understand the architecture, core concepts, and key components of the SDK.

### Explore the Samples

Check out the provided samples to see example code and typical use cases for the SDK:

- [Audio2Face Samples](audio2face-sdk/source/samples/)
- [Audio2Emotion Samples](audio2emotion-sdk/source/samples/)

### Try the Maya Plugin

[Maya-ACE](https://github.com/NVIDIA/Maya-ACE) includes a local inference player node that demonstrates direct SDK integration. Setup is more involved than the samples, but it provides visual results.

### Hugging Face Pretrained Models & Custom Training

Browse the [Audio2Face-3D Hugging Face Collection](https://huggingface.co/collections/nvidia/audio2face-3d-6865d22d6daec4ac85887b17) for available models compatible with this SDK, or use the [Audio2Face-3D Training Framework](https://github.com/NVIDIA/Audio2Face-3D-Training-Framework) to customize and train your own models!

# Citation

If you use the Audio2Face-3D Training Framework or Audio2Face-3D models in publications or other outputs, please cite them in the following format (BibTeX entry for LaTeX):

```bibtex
@misc{nvidia2025audio2face3d,
  title={Audio2Face-3D: Audio-driven Realistic Facial Animation For Digital Avatars},
  author={Chaeyeon Chung and Ilya Fedorov and Michael Huang and Aleksey Karmanov and Dmitry Korobchenko and Roger Ribera and Yeongho Seol},
  year={2025},
  eprint={2508.16401},
  archivePrefix={arXiv},
  primaryClass={cs.GR},
  url={https://arxiv.org/abs/2508.16401},
  note={Authors listed in alphabetical order}
}
```