# FCPE
**Repository Path**: s-water/FCPE
## Basic Information
- **Project Name**: FCPE
- **Description**: TorchFCPE (Fast Context-based Pitch Estimation) is a PyTorch-based library for audio pitch extraction and MIDI conversion
- **Primary Language**: Python
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-03-31
- **Last Updated**: 2025-03-31
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# TorchFCPE
## Overview
TorchFCPE (Fast Context-based Pitch Estimation) is a PyTorch-based library for audio pitch extraction and MIDI conversion. This README provides a quick guide to using the library for audio pitch inference and MIDI extraction.
Note: the MIDI extractor in FCPE quantizes MIDI from f0 using non-neural-network methods.
Note: I won't be updating FCPE (or the benchmark) anytime soon, but I will release a version with cleaned-up code no later than next year.
## Installation
Before using the library, make sure you have the necessary dependencies installed:
```bash
pip install torchfcpe
```
## Usage
### 1. Audio Pitch Inference
```python
from torchfcpe import spawn_bundled_infer_model
import torch
import librosa

# Configure device and target hop size
device = 'cpu'  # or 'cuda' if using a GPU
sr = 16000      # Sample rate
hop_size = 160  # Hop size for processing

# Load and preprocess audio
audio, sr = librosa.load('test.wav', sr=sr)
audio = librosa.to_mono(audio)
audio_length = len(audio)
f0_target_length = (audio_length // hop_size) + 1
audio = torch.from_numpy(audio).float().unsqueeze(0).unsqueeze(-1).to(device)

# Load the model
model = spawn_bundled_infer_model(device=device)

# Perform pitch inference
f0 = model.infer(
    audio,
    sr=sr,
    decoder_mode='local_argmax',  # Recommended mode
    threshold=0.006,              # Threshold for V/UV decision
    f0_min=80,                    # Minimum pitch in Hz
    f0_max=880,                   # Maximum pitch in Hz
    interp_uv=False,              # Whether to interpolate unvoiced frames
    output_interp_target_length=f0_target_length,  # Interpolate output to this length
)

print(f0)
```
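The call returns the pitch curve as a tensor. As a minimal post-processing sketch (assuming the output comes back as a `(batch, frames, 1)`-shaped tensor on the chosen device, which may differ), you can flatten it to NumPy and pair each frame with its timestamp:

```python
import numpy as np

# Post-processing sketch (not part of torchfcpe itself): flatten the returned
# tensor to a 1-D NumPy array and build a matching time axis from the hop size.
f0_hz = f0.squeeze().cpu().numpy()             # shape: roughly (f0_target_length,)
times = np.arange(len(f0_hz)) * hop_size / sr  # frame times in seconds

for t, f in zip(times[:5], f0_hz[:5]):
    # Unvoiced frames are typically reported as 0 Hz when interp_uv=False.
    print(f"{t:.3f}s  {f:.1f} Hz")
```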
### 2. MIDI Extraction
```python
# Extract MIDI from audio (reuses `audio` and `sr` from the previous example)
midi = model.extact_midi(
    audio,
    sr=sr,
    decoder_mode='local_argmax',  # Recommended mode
    threshold=0.006,              # Threshold for V/UV decision
    f0_min=80,                    # Minimum pitch in Hz
    f0_max=880,                   # Maximum pitch in Hz
    output_path="test.mid",       # Save MIDI to this file
)

print(midi)
```
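As noted in the overview, the MIDI extractor quantizes MIDI from the f0 curve with non-neural methods. The sketch below shows only the general principle, the standard Hz-to-MIDI-note formula `69 + 12 * log2(f0 / 440)`; it is not the library's actual implementation, which also handles note segmentation, unvoiced gaps, and tempo:

```python
import numpy as np

# Conceptual sketch of non-neural f0-to-MIDI quantization (illustration only,
# not torchfcpe's own extractor).
def hz_to_midi_note(f0_hz) -> np.ndarray:
    f0_hz = np.asarray(f0_hz, dtype=np.float64)
    notes = np.full_like(f0_hz, np.nan)
    voiced = f0_hz > 0                                       # treat 0 Hz as unvoiced
    notes[voiced] = 69.0 + 12.0 * np.log2(f0_hz[voiced] / 440.0)
    return np.round(notes)                                   # quantize to the nearest semitone

print(hz_to_midi_note([440.0, 261.63, 0.0]))                 # -> [69. 60. nan]
```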
### Notes
- **Inference Parameters:**
  - `audio`: Input audio as a `torch.Tensor`.
  - `sr`: Sample rate of the audio.
  - `decoder_mode` (Optional): Decoding mode; `'local_argmax'` is recommended.
  - `threshold` (Optional): Threshold for the voiced/unvoiced (V/UV) decision; default is 0.006.
  - `f0_min` (Optional): Minimum pitch value; default is 80 Hz.
  - `f0_max` (Optional): Maximum pitch value; default is 880 Hz.
  - `interp_uv` (Optional): Whether to interpolate unvoiced frames; default is `False` (see the sketch after this list).
  - `output_interp_target_length` (Optional): Length to which the output pitch curve is interpolated.
- **MIDI Extraction Parameters:**
  - `audio`: Input audio as a `torch.Tensor`.
  - `sr`: Sample rate of the audio.
  - `decoder_mode` (Optional): Decoding mode; `'local_argmax'` is recommended.
  - `threshold` (Optional): Threshold for the voiced/unvoiced (V/UV) decision; default is 0.006.
  - `f0_min` (Optional): Minimum pitch value; default is 80 Hz.
  - `f0_max` (Optional): Maximum pitch value; default is 880 Hz.
  - `output_path` (Optional): File path for saving the MIDI file. If not provided, only the MIDI structure is returned.
  - `tempo` (Optional): BPM for the MIDI file. If `None`, the BPM is predicted automatically.
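To make `interp_uv` more concrete, here is a conceptual sketch of unvoiced-frame interpolation. It reflects an assumption about the general behavior (unvoiced frames reported as 0 Hz are filled by interpolating between neighboring voiced frames), not the library's own code:

```python
import numpy as np

# Conceptual sketch only: fill unvoiced frames (0 Hz) by linear interpolation
# between the surrounding voiced frames, yielding a continuous pitch curve.
def interp_unvoiced(f0_hz: np.ndarray) -> np.ndarray:
    f0_hz = np.asarray(f0_hz, dtype=np.float64)
    voiced = f0_hz > 0
    if not voiced.any():
        return f0_hz                                      # nothing to interpolate from
    idx = np.arange(len(f0_hz))
    return np.interp(idx, idx[voiced], f0_hz[voiced])     # interpolate over unvoiced gaps

print(interp_unvoiced(np.array([0.0, 220.0, 0.0, 0.0, 440.0, 0.0])))
```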
## Additional Features
- **Model as a PyTorch Module:**
You can use the model as a standard PyTorch module. For example:
```python
# Change device
model = model.to(device)
# Compile model
model = torch.compile(model)
```
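For example, the inference call shown earlier works unchanged after moving the model and the audio tensor to a GPU. This is a sketch assuming a CUDA device is available; the optional parameters are left at their defaults:

```python
import torch

# Sketch: run the bundled model on a GPU if one is available.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)

with torch.no_grad():                  # inference only, no gradients needed
    f0 = model.infer(audio.to(device), sr=sr)
print(f0.shape)
```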