# moonshine
**Repository Path**: my_forks/moonshine
## Basic Information
- **Project Name**: moonshine
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-02-23
- **Last Updated**: 2025-02-23
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
Moonshine
[[Blog]](https://petewarden.com/2024/10/21/introducing-moonshine-the-new-state-of-the-art-for-speech-to-text/) [[Paper]](https://arxiv.org/abs/2410.15608) [[Model Card]](https://github.com/usefulsensors/moonshine/blob/main/model-card.md) [[Podcast]](https://notebooklm.google.com/notebook/d787d6c2-7d7b-478c-b7d5-a0be4c74ae19/audio)
Moonshine is a family of speech-to-text models optimized for fast and accurate automatic speech recognition (ASR) on resource-constrained devices. It is well-suited to real-time, on-device applications like live transcription and voice command recognition. Averaged over the datasets used in the [OpenASR leaderboard](https://huggingface.co/spaces/hf-audio/open_asr_leaderboard) maintained by Hugging Face, Moonshine obtains a lower word error rate (WER) than the similarly sized Whisper models from OpenAI, although Whisper remains ahead on some individual datasets:

**Tiny**

| WER        | Moonshine | Whisper |
| ---------- | --------- | ------- |
| Average    | **12.66** | 12.81   |
| AMI        | 22.77     | 24.24   |
| Earnings22 | 21.25     | 19.12   |
| Gigaspeech | 14.41     | 14.08   |
| LS Clean   | 4.52      | 5.66    |
| LS Other   | 11.71     | 15.45   |
| SPGISpeech | 7.70      | 5.93    |
| Tedlium    | 5.64      | 5.97    |
| Voxpopuli  | 13.27     | 12.00   |

**Base**

| WER        | Moonshine | Whisper |
| ---------- | --------- | ------- |
| Average    | **10.07** | 10.32   |
| AMI        | 17.79     | 21.13   |
| Earnings22 | 17.65     | 15.09   |
| Gigaspeech | 12.19     | 12.83   |
| LS Clean   | 3.23      | 4.25    |
| LS Other   | 8.18      | 10.35   |
| SPGISpeech | 5.46      | 4.26    |
| Tedlium    | 5.22      | 4.87    |
| Voxpopuli  | 10.81     | 9.76    |

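
For readers new to the metric, here is a minimal sketch of how a word error rate can be computed: the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` is ours, and the sketch applies no text normalization, so it should not be expected to reproduce leaderboard numbers. It returns a fraction, whereas the figures above read as percentages.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("turn on the lights", "turn off the lights"))  # 0.25
```
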
Moonshine's compute requirements scale with the length of input audio. This means that shorter input audio is processed faster, unlike existing Whisper models that process everything as 30-second chunks. To give you an idea of the benefits: Moonshine processes 10-second audio segments _5x faster_ than Whisper while maintaining the same (or better!) WER.
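
To make the difference in input size concrete, the sketch below counts how many audio samples a model receives for one clip under each scheme. It assumes 16 kHz input audio and zero-padding up to a multiple of 30 seconds for the fixed-window case. The helper `samples_processed` is purely illustrative and models input length only, not compute time, so the ratio it prints is not the measured speedup quoted above.

```python
import math

SAMPLE_RATE_HZ = 16_000       # assumed input sample rate
FIXED_WINDOW_SECONDS = 30     # fixed chunk length described above


def samples_processed(duration_seconds: float, fixed_window: bool) -> int:
    """Number of audio samples fed to the model for one clip."""
    if fixed_window:
        # Round the clip up to a whole number of fixed-length chunks.
        chunks = math.ceil(duration_seconds / FIXED_WINDOW_SECONDS)
        duration_seconds = chunks * FIXED_WINDOW_SECONDS
    return int(duration_seconds * SAMPLE_RATE_HZ)


variable = samples_processed(10, fixed_window=False)
fixed = samples_processed(10, fixed_window=True)
print(variable, fixed, fixed / variable)  # 160000 480000 3.0
```
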
This repo hosts the inference code for Moonshine.
## Installation
We like `uv` for managing Python environments, so we use it here. If you don't want to use it, skip the first step and drop the `uv` prefix from the shell commands below.
### 1. Create a virtual environment
First, [install](https://github.com/astral-sh/uv) `uv` for Python environment management.
Then create and activate a virtual environment:
```shell
uv venv env_moonshine
source env_moonshine/bin/activate
```
### 2. Install the Moonshine package
The `moonshine` inference code is written in Keras and can run with each of the backends that Keras supports: Torch, TensorFlow, and JAX. The backend you choose will determine which flavor of the `moonshine` package to install. If you're just getting started, we suggest installing the (default) Torch backend:
```shell
uv pip install useful-moonshine@git+https://github.com/usefulsensors/moonshine.git
```
To run the provided inference code, you have to instruct Keras to use the PyTorch backend by setting an environment variable:
```shell
export KERAS_BACKEND=torch
```
To run with the TensorFlow backend, run the following to install Moonshine and set the environment variable:
```shell
uv pip install useful-moonshine[tensorflow]@git+https://github.com/usefulsensors/moonshine.git
export KERAS_BACKEND=tensorflow
```
To run with the JAX backend, run the following:
```shell
uv pip install useful-moonshine[jax]@git+https://github.com/usefulsensors/moonshine.git
export KERAS_BACKEND=jax
# Use useful-moonshine[jax-cuda] for jax on GPU
```
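
If exporting a shell variable is inconvenient (in a notebook, for example), the same setting can be applied from Python, provided it happens before `keras` or `moonshine` is first imported, since Keras reads `KERAS_BACKEND` at import time. The helper `select_keras_backend` below is a hypothetical convenience, not part of the `moonshine` package:

```python
import os

# The three backends named in this section.
SUPPORTED_BACKENDS = ("torch", "tensorflow", "jax")


def select_keras_backend(name: str = "torch") -> str:
    """Set KERAS_BACKEND for this process; call before importing keras or moonshine."""
    if name not in SUPPORTED_BACKENDS:
        raise ValueError(f"unknown backend {name!r}; expected one of {SUPPORTED_BACKENDS}")
    os.environ["KERAS_BACKEND"] = name
    return name


select_keras_backend("torch")
# import moonshine  # import only after the backend has been selected
```

The matching backend package still has to be installed with one of the commands above; setting the variable alone does not install anything.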
### 3. Try it out
You can test Moonshine by transcribing the provided example audio file with the `moonshine.transcribe` function:
```shell
python
>>> import moonshine
>>> moonshine.transcribe(moonshine.ASSETS_DIR / 'beckett.wav', 'moonshine/tiny')
['Ever tried ever failed, no matter try again, fail again, fail better.']
```
The first argument is a path to an audio file and the second is the name of a Moonshine model. `moonshine/tiny` and `moonshine/base` are the currently available models.
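
For more than one file, a thin wrapper around the call shown above may be handy. This is a sketch under stated assumptions: `transcribe_files` and its `transcribe_fn` parameter are hypothetical names introduced here, and the only `moonshine` API relied on is `moonshine.transcribe(path, model_name)` returning a list of strings, as in the example output.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Optional

# Model names listed in this README.
AVAILABLE_MODELS = ("moonshine/tiny", "moonshine/base")


def transcribe_files(
    paths: Iterable[str],
    model: str = "moonshine/tiny",
    transcribe_fn: Optional[Callable] = None,
) -> Dict[str, List[str]]:
    """Map each audio path to the list of strings returned for it."""
    if model not in AVAILABLE_MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {AVAILABLE_MODELS}")
    if transcribe_fn is None:
        # Deferred import so KERAS_BACKEND can be set before Keras loads.
        import moonshine

        transcribe_fn = moonshine.transcribe
    return {str(path): transcribe_fn(Path(path), model) for path in paths}
```

With the package installed, `transcribe_files([moonshine.ASSETS_DIR / "beckett.wav"])` would transcribe the bundled example. Passing a stand-in `transcribe_fn` lets the wrapper be exercised without loading a model.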
## TODO
* [ ] Live transcription demo
* [ ] ONNX model
* [ ] CTranslate2 support
* [ ] MLX support
## Citation
If you benefit from our work, please cite us:
```
@misc{jeffries2024moonshinespeechrecognitionlive,
      title={Moonshine: Speech Recognition for Live Transcription and Voice Commands},
      author={Nat Jeffries and Evan King and Manjunath Kudlur and Guy Nicholson and James Wang and Pete Warden},
      year={2024},
      eprint={2410.15608},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2410.15608},
}
```