# mcmc-bnn-example **Repository Path**: mirrors_NVIDIA/mcmc-bnn-example ## Basic Information - **Project Name**: mcmc-bnn-example - **Description**: Reference CUDA implementation of training a small Bayesian neural network (BNN) using MCMC - **Primary Language**: Unknown - **License**: MIT - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2021-01-24 - **Last Updated**: 2026-03-21 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README ## Bayesian neural network (BNN) training using MCMC This repository is a CUDA implementation of training a Bayesian neural network using MCMC, specifically Hamiltonian Monte Carlo (HMC). The following image shows the architecture of the neural network used here. ![BNN architecture: a small 2-layer MLP](bnn_arch.png "BNN architecture") It is a small MLP with two layers, 22 features in the input, 40 hidden features and a single output feature. The dataset size is 58368, used in its entirety as the batch for estimating gradients. ## Performance measurements The following table shows measurements of performance obtained using this implementation. The numbers are multiplied by a factor of 200, imitating 200 windows on which the training would be applied. | Setting | #Samples | Time (min) | | ----------- | ----------- | ----------- | | 1xA100, TF32 | 800 | 42 | | 4xA100, TF32 | 800 | 14 | | 1xA100, TF32 | 1600 | 84 | | 4xA100, TF32 | 1600 | 23 | ## How to get started with this repo ### Initial setup 1. Clone and `cd path/to/repo` 1. `wget "https://s3-us-west-1.amazonaws.com/gc-demo-resources/returns_and_features_for_mcmc.tar.gz"` 1. `tar xf returns_and_features_for_mcmc.tar.gz` 1. `rm ._returns_and_features_for_mcmc.txt returns_and_features_for_mcmc.tar.gz` ### Compile/run 1. Requires: 1. G++ 1. CTK 1. CUDA run-time 1. Use `python compile_run.py --arch sm_70` for V100 (FP32 only) 1. Use `python compile_run.py --arch sm_80` for A100 (FP32 only) 1. Use `python compile_run.py --arch sm_80 --tc` for A100 (using TF32 through tensor cores)