
# Moonshine

[[Blog]](https://petewarden.com/2024/10/21/introducing-moonshine-the-new-state-of-the-art-for-speech-to-text/) [[Paper]](https://arxiv.org/abs/2410.15608) [[Model Card]](https://github.com/usefulsensors/moonshine/blob/main/model-card.md) [[Podcast]](https://notebooklm.google.com/notebook/d787d6c2-7d7b-478c-b7d5-a0be4c74ae19/audio)

Moonshine is a family of speech-to-text models optimized for fast and accurate automatic speech recognition (ASR) on resource-constrained devices. It is well-suited to real-time, on-device applications like live transcription and voice command recognition. Moonshine obtains word-error rates (WER) better than similarly-sized Whisper models from OpenAI on the datasets used in the [OpenASR leaderboard](https://huggingface.co/spaces/hf-audio/open_asr_leaderboard) maintained by HuggingFace:
**Tiny**

| WER        | Moonshine | Whisper |
| ---------- | --------- | ------- |
| Average    | **12.66** | 12.81   |
| AMI        | 22.77     | 24.24   |
| Earnings22 | 21.25     | 19.12   |
| Gigaspeech | 14.41     | 14.08   |
| LS Clean   | 4.52      | 5.66    |
| LS Other   | 11.71     | 15.45   |
| SPGISpeech | 7.70      | 5.93    |
| Tedlium    | 5.64      | 5.97    |
| Voxpopuli  | 13.27     | 12.00   |

**Base**

| WER        | Moonshine | Whisper |
| ---------- | --------- | ------- |
| Average    | **10.07** | 10.32   |
| AMI        | 17.79     | 21.13   |
| Earnings22 | 17.65     | 15.09   |
| Gigaspeech | 12.19     | 12.83   |
| LS Clean   | 3.23      | 4.25    |
| LS Other   | 8.18      | 10.35   |
| SPGISpeech | 5.46      | 4.26    |
| Tedlium    | 5.22      | 4.87    |
| Voxpopuli  | 10.81     | 9.76    |
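The WER figures above are standard word error rates: the word-level edit distance between the hypothesis and the reference transcript, divided by the number of reference words (the leaderboard additionally applies text normalization before scoring). A minimal sketch of the metric, assuming plain whitespace tokenization:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            # bool inequality counts as 0/1 substitution cost
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(substitution, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("ever tried ever failed", "ever tried ever failed"))   # 0.0
print(wer("ever tried ever failed", "ever tried never failed"))  # 0.25
```

A lower WER is better; an "Average" row like the ones above is the mean over the per-dataset scores.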
Moonshine's compute requirements scale with the length of input audio. This means that shorter input audio is processed faster, unlike existing Whisper models that process everything as 30-second chunks. To give you an idea of the benefits: Moonshine processes 10-second audio segments _5x faster_ than Whisper while maintaining the same (or better!) WER.

This repo hosts the inference code for Moonshine.

## Installation

We like `uv` for managing Python environments, so we use it here. If you don't want to use it, simply skip the first step and leave `uv` off of your shell commands.

### 1. Create a virtual environment

First, [install](https://github.com/astral-sh/uv) `uv` for Python environment management. Then create and activate a virtual environment:

```shell
uv venv env_moonshine
source env_moonshine/bin/activate
```

### 2. Install the Moonshine package

The `moonshine` inference code is written in Keras and can run with each of the backends that Keras supports: Torch, TensorFlow, and JAX. The backend you choose will determine which flavor of the `moonshine` package to install. If you're just getting started, we suggest installing the (default) Torch backend:

```shell
uv pip install useful-moonshine@git+https://github.com/usefulsensors/moonshine.git
```

To run the provided inference code, you have to instruct Keras to use the PyTorch backend by setting an environment variable:

```shell
export KERAS_BACKEND=torch
```

To run with the TensorFlow backend, run the following to install Moonshine and set the environment variable:

```shell
uv pip install useful-moonshine[tensorflow]@git+https://github.com/usefulsensors/moonshine.git
export KERAS_BACKEND=tensorflow
```

To run with the JAX backend, run the following:

```shell
uv pip install useful-moonshine[jax]@git+https://github.com/usefulsensors/moonshine.git
export KERAS_BACKEND=jax
# Use useful-moonshine[jax-cuda] for jax on GPU
```

### 3. Try it out

You can test Moonshine by transcribing the provided example audio file with the `.transcribe` function:

```shell
python
>>> import moonshine
>>> moonshine.transcribe(moonshine.ASSETS_DIR / 'beckett.wav', 'moonshine/tiny')
['Ever tried ever failed, no matter try again, fail again, fail better.']
```

The first argument is a path to an audio file and the second is the name of a Moonshine model. `moonshine/tiny` and `moonshine/base` are the currently available models.

## TODO

* [ ] Live transcription demo
* [ ] ONNX model
* [ ] CTranslate2 support
* [ ] MLX support

## Citation

If you benefit from our work, please cite us:

```
@misc{jeffries2024moonshinespeechrecognitionlive,
      title={Moonshine: Speech Recognition for Live Transcription and Voice Commands},
      author={Nat Jeffries and Evan King and Manjunath Kudlur and Guy Nicholson and James Wang and Pete Warden},
      year={2024},
      eprint={2410.15608},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2410.15608},
}
```
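The `.transcribe` example in step 3 takes a path to a WAV file, and Keras reads `KERAS_BACKEND` once at import time. As a small pre-flight sketch (not part of the package), the snippet below selects the Torch backend from Python instead of the shell and checks that an input file is mono 16-bit PCM at 16 kHz; `is_transcribable` is a hypothetical helper name, and the 16 kHz assumption comes from the model card:

```python
import os
import wave

# Set the backend before importing keras (or moonshine, which imports
# keras under the hood); later imports will pick this value up.
os.environ.setdefault("KERAS_BACKEND", "torch")

def is_transcribable(path, expected_rate=16000):
    """Return True if `path` is a mono, 16-bit PCM WAV at `expected_rate` Hz.

    The 16 kHz mono format is an assumption based on the model card;
    adjust if your build expects something else.
    """
    with wave.open(path, "rb") as w:
        return (
            w.getnchannels() == 1
            and w.getsampwidth() == 2
            and w.getframerate() == expected_rate
        )

# Demo: write half a second of 16 kHz silence and validate it.
with wave.open("silence.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 8000)

print(is_transcribable("silence.wav"))  # True
```

A check like this is cheap insurance for batch jobs: it fails fast on resampled or stereo files before any model weights are loaded.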