# fused-ssim **Repository Path**: huanghone/fused-ssim ## Basic Information - **Project Name**: fused-ssim - **Description**: No description available - **Primary Language**: Unknown - **License**: MIT - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2025-09-30 - **Last Updated**: 2025-09-30 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # Fully Fused Differentiable SSIM This repository contains an efficient fully-fused implementation of [SSIM](https://en.wikipedia.org/wiki/Structural_similarity_index_measure) which is differentiable in nature. There are several factors that contribute to an efficient implementation: - Convolutions in SSIM are spatially localized leading to fully-fused implementation without touching global memory for intermediate steps. - Backpropagation through Gaussian Convolution is simply another Gaussian Convolution itself. - Gaussian Convolutions are separable leading to reduced computation. - Gaussians are symmetric in nature leading to fewer computations. - Single convolution pass for multiple statistics. As per the original SSIM paper, this implementation uses `11x11` sized convolution kernel. The weights for it have been hardcoded and this is another reason for it's speed. This implementation currently only supports **2D images** but with **variable number of channels** and **batch size**. ## PyTorch Installation Instructions - You must have CUDA and PyTorch+CUDA installed in you Python 3.X environment. This project has currently been tested with: - PyTorch `2.3.1+cu118` and CUDA `11.8` on Ubuntu 24.04 LTS. - PyTorch `2.4.1+cu124` and CUDA `12.4` on Ubuntu 24.04 LTS. - PyTorch `2.5.1+cu124` and CUDA `12.6` on Windows 11. - Run `pip install git+https://github.com/rahul-goel/fused-ssim/ --no-build-isolation` or clone the repository and run `pip install . --no-build-isolation` from the root of this project. - setup.py should detect your GPU architecture automatically. If you want to see the output, run `pip install git+https://github.com/rahul-goel/fused-ssim/ -v --no-build-isolation` or clone the repository and run `pip install . -v --no-build-isolation` from the root of this project. - If the previous command does not work, run `python setup.py install` from the root of this project. ## Usage ```python import torch from fused_ssim import fused_ssim # predicted_image, gt_image: [BS, CH, H, W] # predicted_image is differentiable gt_image = torch.rand(2, 3, 1080, 1920) predicted_image = torch.nn.Parameter(torch.rand_like(gt_image)) ssim_value = fused_ssim(predicted_image, gt_image) ``` By default, `same` padding is used. To use `valid` padding which is the kind of padding used by [pytorch-mssim](https://github.com/VainF/pytorch-msssim): ```python ssim_value = fused_ssim(predicted_image, gt_image, padding="valid") ``` If you don't want to train and use this only for inference, use the following for even faster speed: ```python with torch.no_grad(): ssim_value = fused_ssim(predicted_image, gt_image, train=False) ``` ## Constraints - Currently, only one of the images is allowed to be differentiable i.e. only the first image can be `nn.Parameter`. - Limited to 2D images. - Images must be normalized to range `[0, 1]`. - Standard `11x11` convolutions supported. ## Performance This implementation is 5-8x faster than the previous fastest (to the best of my knowledge) differentiable SSIM implementation [pytorch-mssim](https://github.com/VainF/pytorch-msssim). ## BibTeX If you leverage fused SSIM for your research work, please cite our main paper: ``` @inproceedings{taming3dgs, author = {Mallick, Saswat Subhajyoti and Goel, Rahul and Kerbl, Bernhard and Steinberger, Markus and Carrasco, Francisco Vicente and De La Torre, Fernando}, title = {Taming 3DGS: High-Quality Radiance Fields with Limited Resources}, year = {2024}, url = {https://doi.org/10.1145/3680528.3687694}, doi = {10.1145/3680528.3687694}, booktitle = {SIGGRAPH Asia 2024 Conference Papers}, series = {SA '24} } ``` ## Acknowledgements Thanks to [Bernhard](https://snosixtyboo.github.io) for the idea. Thanks to [Janusch](https://github.com/MrNeRF) for further optimizations. Thanks to [Florian](https://fhahlbohm.github.io/) and [Ishaan](https://ishaanshah.xyz) for testing.