# Stream-DiffVSR

**Repository Path**: stfocus/Stream-DiffVSR

## Basic Information

- **Project Name**: Stream-DiffVSR
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-01-05
- **Last Updated**: 2026-01-05

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion

**Authors:** Hau-Shiang Shiu, Chin-Yang Lin, Zhixiang Wang, Chi-Wei Hsiao, Po-Fan Yu, Yu-Chih Chen, Yu-Lun Liu

### TODO

- ✅ Release inference code and model weights
- ⬜ Release training code

## Abstract

Diffusion-based video super-resolution (VSR) methods achieve strong perceptual quality but remain impractical for latency-sensitive settings due to their reliance on future frames and expensive multi-step denoising. We propose Stream-DiffVSR, a causally conditioned diffusion framework for efficient online VSR. Operating strictly on past frames, it combines a four-step distilled denoiser for fast inference, an Auto-regressive Temporal Guidance (ARTG) module that injects motion-aligned cues during latent denoising, and a lightweight temporal-aware decoder with a Temporal Processor Module (TPM) that enhances detail and temporal coherence. Stream-DiffVSR processes 720p frames in 0.328 seconds on an RTX 4090 GPU and significantly outperforms prior diffusion-based methods. Compared with the online SOTA TMP (Zhang et al., 2024), it boosts perceptual quality (LPIPS +0.095) while reducing latency by over 130×. Stream-DiffVSR achieves the lowest latency reported for diffusion-based VSR, reducing initial delay from over 4600 seconds to 0.328 seconds and thereby making it the first diffusion VSR method suitable for low-latency online deployment.

## Usage

### Environment

The code is based on Python 3.8, CUDA 11, and [diffusers](https://github.com/huggingface/diffusers).

### Conda setup

```
git clone https://github.com/jamichss/Stream-DiffVSR.git
cd Stream-DiffVSR
conda env create -f requirements.yml
conda activate stream-diffvsr
```

### Pretrained models

Pretrained models are available [here](https://huggingface.co/Jamichsu/Stream-DiffVSR). You do not need to download them manually; they are fetched automatically by the inference code.

### Inference

You can run inference directly with the following command. No manual download of checkpoints is required, as the inference script automatically fetches the necessary files.

```
python inference.py \
    --model_id 'Jamichsu/Stream-DiffVSR' \
    --out_path 'YOUR_OUTPUT_PATH' \
    --in_path 'YOUR_INPUT_PATH' \
    --num_inference_steps 4
```

The expected file structure for the inference input data is outlined below. The model processes individual video sequences contained within subdirectories.

```
YOUR_INPUT_PATH/
├── seq1/
│   ├── frame_0001.png
│   ├── frame_0002.png
│   └── ...
├── seq2/
│   ├── frame_0001.png
│   ├── frame_0002.png
│   └── ...
```
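If your source material is a video file rather than pre-extracted frames, a small helper along the following lines can produce the layout above. This is a sketch, not part of the repository: the OpenCV dependency and the `frame_%04d.png` naming are assumptions chosen to match the example structure.

```python
# Sketch: split a video into the per-sequence frame layout expected above.
# Assumes OpenCV (pip install opencv-python); not part of the Stream-DiffVSR repository.
import os
import cv2

def video_to_frames(video_path: str, out_root: str, seq_name: str) -> int:
    """Write video frames as PNGs under out_root/seq_name/frame_XXXX.png."""
    seq_dir = os.path.join(out_root, seq_name)
    os.makedirs(seq_dir, exist_ok=True)

    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        idx += 1
        cv2.imwrite(os.path.join(seq_dir, f"frame_{idx:04d}.png"), frame)
    cap.release()
    return idx

if __name__ == "__main__":
    # Example: place the frames of input.mp4 under YOUR_INPUT_PATH/seq1/.
    n = video_to_frames("input.mp4", "YOUR_INPUT_PATH", "seq1")
    print(f"Wrote {n} frames")
```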
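The checkpoints are fetched on demand at inference time, but if you plan to run on a machine without internet access you can cache them in advance with `huggingface_hub` (installed alongside diffusers). The snippet below is a sketch; the local directory name is an assumption, and whether `--model_id` accepts a local path depends on how the inference script resolves it.

```python
# Sketch: pre-download the Stream-DiffVSR weights so inference can run offline.
# The local_dir value is an assumption; adjust to your own layout.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="Jamichsu/Stream-DiffVSR",      # same ID passed to --model_id
    local_dir="./checkpoints/Stream-DiffVSR",
)
print(f"Weights cached at {local_path}")
```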
For additional acceleration with NVIDIA TensorRT, run the following command. Note that TensorRT may introduce a slight degradation in output quality in exchange for significant performance gains.

```
python inference.py \
    --model_id 'Jamichsu/Stream-DiffVSR' \
    --out_path 'YOUR_OUTPUT_PATH' \
    --in_path 'YOUR_INPUT_PATH' \
    --num_inference_steps 4 \
    --enable_tensorrt \
    --image_height YOUR_IMAGE_HEIGHT \
    --image_width YOUR_IMAGE_WIDTH
```

When running the TensorRT command for the first time at a new input resolution, the process may take a while to build the dedicated TensorRT engine. This engine compilation is a one-time setup step for that specific resolution and enables subsequent accelerated inference at the same setting.

## Citation

If you find this work useful, please consider citing:

```bibtex
@article{shiu2025stream,
  title={Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion},
  author={Shiu, Hau-Shiang and Lin, Chin-Yang and Wang, Zhixiang and Hsiao, Chi-Wei and Yu, Po-Fan and Chen, Yu-Chih and Liu, Yu-Lun},
  journal={arXiv preprint arXiv:2512.23709},
  year={2025}
}
```