# NovaSR

**Repository Path**: baxtax/NovaSR

## Basic Information

- **Project Name**: NovaSR
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1
- **Created**: 2026-05-16
- **Last Updated**: 2026-05-16

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

## NovaSR: Pushing the Limits of Extreme Efficiency in Audio Super-Resolution

This is the repository for NovaSR, a tiny (~52 KB) audio upsampling model that upscales muffled 16 kHz audio into clear, crisp 48 kHz audio at speeds over 3500x realtime.

https://github.com/user-attachments/assets/c81f87eb-f6de-4bf9-85bd-dfc9a223a865

### Key benefits

* Speed: Reaches 3600x realtime on a single A100 GPU.
* Quality: On par with models 5,000x larger.
* Size: Just 52 KB, several thousand times smaller than most other audio super-resolution models.

### Why is this even useful?

* Enhancing models: NovaSR can considerably improve TTS output quality at nearly zero computational cost.
* Real-time enhancement: NovaSR enables on-device enhancement of low-quality calls and other audio while using almost no memory.
* Restoring datasets: NovaSR can improve the audio quality of any audio dataset.

### Comparisons

Comparisons were run on an A100 GPU. A higher realtime factor means faster processing.

| Model         | Speed (Real-Time)  | Model Size |
| :------------ | :----------------- | :--------- |
| **NovaSR**    | **3600x realtime** | **~52 KB** |
| FlowHigh      | 20x realtime       | ~450 MB    |
| FlashSR       | 14x realtime       | ~1000 MB   |
| AudioSR       | 0.6x realtime      | ~2000 MB   |

### Examples

Please check the [Hugging Face model](https://huggingface.co/YatharthS/NovaSR) for a few examples.

### Usage

You can try it on [Hugging Face Spaces](https://huggingface.co/spaces/YatharthS/NovaSR) or run it locally.

Simple one-line installation:

```
pip install git+https://github.com/ysharma3501/NovaSR.git
```

Load model

```python
from NovaSR import FastSR

upsampler = FastSR()  # downloads the model from Hugging Face

# Use this instead on CPUs, as it leads to a 3-4x speedup.
# upsampler = FastSR(half=False)
```

Run model

```python
from IPython.display import Audio, display

# Replace audio_path.wav with your wav/mp3 file.
lowres_audio = upsampler.load_audio('audio_path.wav')

# Run inference with the model.
highres_audio = upsampler.infer(lowres_audio).cpu()

display(Audio(highres_audio, rate=48000))
```

### Training

Please check out the Kaggle notebook for training the model further on custom datasets: https://www.kaggle.com/code/yatharthsharma888/novasr-training

### Info

Q: How much data was this trained on?

A: Just 100 hours of data (mls_sidon along with VCTK).

Q: How is it so small?

A: It uses fewer than 10 tiny Conv1d layers along with snake activations based on BigVGAN, which keeps quality high and the model small.

Q: Will benchmarks come?

A: Yes, I am still training it further and will benchmark it later.

## Final Notes

Repo stars and model likes would be appreciated if you find this helpful. Thank you.

Email: yatharthsharma3501@gmail.com
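
To put the realtime factors from the comparison table in perspective, here is a minimal sketch that converts them into wall-clock processing time. It assumes a realtime factor means "seconds of audio processed per second of compute" and that throughput stays constant, ignoring model loading and file I/O. The names `REALTIME_FACTORS` and `processing_seconds` are hypothetical helpers for illustration and are not part of NovaSR.

```python
# Realtime factors copied from the comparison table (A100 GPU).
REALTIME_FACTORS = {
    "NovaSR": 3600.0,
    "FlowHigh": 20.0,
    "FlashSR": 14.0,
    "AudioSR": 0.6,
}


def processing_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Estimate wall-clock seconds needed to process `audio_seconds` of audio.

    Assumes realtime_factor = audio duration / processing time.
    """
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


# Example: upsampling a 100-hour dataset with each model.
dataset_seconds = 100 * 3600
for name, factor in REALTIME_FACTORS.items():
    seconds = processing_seconds(dataset_seconds, factor)
    print(f"{name}: {seconds / 3600:.2f} hours")
```

Under these assumptions, a 100-hour dataset takes about 100 seconds at 3600x realtime and about 5 hours at 20x realtime.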