# steermusic

**Repository Path**: mirrors_sony/steermusic

## Basic Information

- **Project Name**: steermusic
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-11-16
- **Last Updated**: 2026-05-17

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Implementation of SteerMusic: Enhanced Musical Consistency for Zero-shot Text-Guided and Personalized Music Editing

This repository contains the implementation code for the [paper](https://doi.org/10.48550/arXiv.2504.10826): SteerMusic: Enhanced Musical Consistency for Zero-shot Text-Guided and Personalized Music Editing.

Our demonstration page is available at [Demo](https://steermusic.pages.dev/).

## Environment setting

This code was tested with Python 3.8.10 and PyTorch 2.2.0+cu121. SteerMusic relies on a pretrained [AudioLDM2](https://github.com/haoheliu/AudioLDM2).

You can install the Python dependencies with:
```
pip3 install -r requirements.txt
```

If you encounter an error such as `ImportError: cannot import name 'cached_download' from 'huggingface_hub'`, manually change `cached_download` to `hf_hub_download` in `diffusers/utils/dynamic_modules_utils.py`.
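
The manual edit above can also be scripted. The following is a minimal sketch, not part of this repository: the helper names `patch_cached_download`, `locate_dynamic_modules_utils`, and `patch_file` are hypothetical, and the sketch assumes that a whole-word textual rename inside the installed `diffusers` file is all that is needed.

```python
import importlib.util
import re
from pathlib import Path
from typing import Optional

# Match the removed name only as a whole identifier.
_OLD_NAME = re.compile(r"\bcached_download\b")


def patch_cached_download(source: str) -> str:
    """Return `source` with `cached_download` renamed to `hf_hub_download`."""
    return _OLD_NAME.sub("hf_hub_download", source)


def locate_dynamic_modules_utils() -> Optional[Path]:
    """Find the installed diffusers file without importing diffusers itself."""
    spec = importlib.util.find_spec("diffusers")
    if spec is None or not spec.submodule_search_locations:
        return None
    package_dir = Path(list(spec.submodule_search_locations)[0])
    candidate = package_dir / "utils" / "dynamic_modules_utils.py"
    return candidate if candidate.is_file() else None


def patch_file(path: Path) -> bool:
    """Rewrite the file in place; return True if its content changed."""
    original = path.read_text(encoding="utf-8")
    patched = patch_cached_download(original)
    if patched != original:
        path.write_text(patched, encoding="utf-8")
    return patched != original
```

Calling `patch_file(locate_dynamic_modules_utils())` once inside the active environment would apply the rename. Keep a backup of the file, since a later reinstall of `diffusers` would overwrite it anyway.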

## SteerMusic for Zero-shot Text-guided Music Editing

To perform coarse-grained text-to-music editing, run:

```
python SteerMusic_edit.py --audio_path '/path/to/source/music/' --prompt 'target prompt' --prompt_ref 'source prompt' --output_dir '/output/path/' --guidance_scale 30
```

Example:
```
python SteerMusic_edit.py --audio_path "./audios/bach_anh114.wav" --prompt "Energetic guitar cover with a groovy, reverberant melody." --prompt_ref "Energetic piano cover with a groovy, reverberant melody." --guidance_scale 30  --weight_aug 3
```
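
When launching several edits from Python, it can help to assemble the argument list programmatically instead of hand-quoting prompts in a shell. The sketch below is illustrative only: `build_edit_command` is a hypothetical helper, and it uses just the flags that appear in the commands above (`--audio_path`, `--prompt`, `--prompt_ref`, `--output_dir`, `--guidance_scale`, `--weight_aug`).

```python
import shlex
from typing import List, Optional


def build_edit_command(
    audio_path: str,
    prompt: str,
    prompt_ref: str,
    guidance_scale: float = 30,
    output_dir: Optional[str] = None,
    weight_aug: Optional[float] = None,
) -> List[str]:
    """Build an argv list for SteerMusic_edit.py (hypothetical helper)."""
    cmd = [
        "python", "SteerMusic_edit.py",
        "--audio_path", audio_path,
        "--prompt", prompt,          # target prompt
        "--prompt_ref", prompt_ref,  # source prompt
        "--guidance_scale", str(guidance_scale),
    ]
    # Optional flags are only added when the caller sets them.
    if output_dir is not None:
        cmd += ["--output_dir", output_dir]
    if weight_aug is not None:
        cmd += ["--weight_aug", str(weight_aug)]
    return cmd


example = build_edit_command(
    "./audios/bach_anh114.wav",
    "Energetic guitar cover with a groovy, reverberant melody.",
    "Energetic piano cover with a groovy, reverberant melody.",
    guidance_scale=30,
    weight_aug=3,
)
# shlex.join gives a copy-pasteable shell string with safe quoting.
printable = shlex.join(example)
```

The list can be passed to `subprocess.run(example, check=True)` from the repository root, which avoids shell quoting issues with prompts that contain commas or apostrophes.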

## SteerMusic+ for Personalized Music Editing

SteerMusic+ relies on a fine-tuned personalized diffusion model. To fine-tune such a model, please refer to [DreamSound](https://github.com/zelaki/DreamSound). In this implementation, we plug SteerMusic+ into a DreamSound model fine-tuned from an **AudioLDM2 checkpoint**. To personalize your music editing, follow the fine-tuning instructions provided in [DreamSound](https://github.com/zelaki/DreamSound) and obtain a checkpoint that captures the desired musical concept token.


To perform fine-grained personalized music editing, run:

```
python SteerMusic_personalized.py --audio_path '/path/to/source/music/' --prompt_ref 'source prompt with [emphasized] edit area, e.g., a recording of [piano] music' --concept 'target concept' --personalized_ckpt '/path/to/personalized/diffusion/ckpt/' --guidance_scale 15
```

The command above is a template. We provide an example DreamSound checkpoint fine-tuned on the [bouzouki] concept, which can be downloaded via this [link](https://zenodo.org/records/15226658). The reference audio examples for bouzouki are available in the `audios` folder. Unzip the downloaded checkpoint file, place it under `./DreamSound/outputs_bouzouki/`, then run:

```
python SteerMusic_personalized.py --audio_path "./audios/bach_anh114.wav" --prompt_ref "Energetic [piano] cover with a groovy, reverberant melody." --concept 'bouzouki' --personalized_ckpt './DreamSound/outputs_bouzouki/pipeline_step_100' --guidance_scale 20
```
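
Since `--prompt_ref` marks the edit area with square brackets, a quick pre-flight check can catch a prompt that has no marked span before a long run starts. This is a sketch under assumptions: `emphasized_spans` and `require_edit_area` are hypothetical helpers, and how `SteerMusic_personalized.py` itself parses the brackets is not described here, so the check only previews what is marked.

```python
import re
from typing import List

# A bracketed span such as "[piano]"; nested brackets are not handled.
_BRACKETED = re.compile(r"\[([^\[\]]+)\]")


def emphasized_spans(prompt_ref: str) -> List[str]:
    """Return the text of every [bracketed] span in the source prompt."""
    return _BRACKETED.findall(prompt_ref)


def require_edit_area(prompt_ref: str) -> str:
    """Raise early if the source prompt marks no edit area."""
    if not emphasized_spans(prompt_ref):
        raise ValueError("prompt_ref has no [emphasized] edit area")
    return prompt_ref


spans = emphasized_spans(
    "Energetic [piano] cover with a groovy, reverberant melody."
)
```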

## Evaluation Metrics

- For CLAP and LPAPS scores, please refer to [CLAP](./eval/clap_score.py) and [LPAPS](./eval/lpaps_score.py). These scripts are adapted from [AudioEditingCode](https://github.com/HilaManor/AudioEditingCode).

- For FAD scores, please refer to [fadtk](https://github.com/microsoft/fadtk).

- For the CDPAM score, please refer to [CDPAM](./eval/CDPAM.py), which is adapted from [CDPAM_repo](https://github.com/pranaymanocha/PerceptualAudio).

- For the CQT-1 PCC score, please refer to [CQT-1](./eval/CQT1_PCC.py).
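
For readers unfamiliar with the last metric, PCC stands for Pearson correlation coefficient. The sketch below shows only that correlation step in pure Python. It is not the code in `./eval/CQT1_PCC.py`: the function name `pearson_correlation` is hypothetical, and the assumption that the inputs are two equal-length per-frame feature sequences (one from the source audio, one from the edited audio) is ours.

```python
import math
from typing import Sequence


def pearson_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    if len(x) < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Unnormalized covariance and variances; the 1/n factors cancel.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

A value near 1 means the two sequences rise and fall together, which is the sense in which a high score indicates that the edit preserved the source content.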

## Q&A

### 1. The edits are barely noticeable. What should I do?

If the editing effects are not noticeable, consider increasing the guidance scale. A higher guidance scale can strengthen the influence of the editing instructions, leading to more pronounced changes in the output. See Section 5 in our paper and our supplementary materials for the discussion of the trade-off between editing effects and original music content preservation.
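
One practical way to explore this trade-off is to render the same edit at a few guidance scales and compare the outputs by ear. The sketch below only generates the command lines. `guidance_sweep` is a hypothetical helper, the scale values are arbitrary illustrations rather than recommended settings, and the per-scale output folder naming is our own convention.

```python
from typing import List, Sequence


def guidance_sweep(
    audio_path: str,
    prompt: str,
    prompt_ref: str,
    scales: Sequence[float],
    output_root: str,
) -> List[List[str]]:
    """One SteerMusic_edit.py argv list per guidance scale."""
    commands = []
    for scale in scales:
        commands.append([
            "python", "SteerMusic_edit.py",
            "--audio_path", audio_path,
            "--prompt", prompt,
            "--prompt_ref", prompt_ref,
            # Separate folder per scale so results are not overwritten.
            "--output_dir", f"{output_root}/gs_{scale}",
            "--guidance_scale", str(scale),
        ])
    return commands


sweep = guidance_sweep(
    "./audios/bach_anh114.wav",
    "Energetic guitar cover with a groovy, reverberant melody.",
    "Energetic piano cover with a groovy, reverberant melody.",
    scales=[20, 30, 40],  # illustrative values only
    output_root="./outputs",
)
```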

### 2.  The edited results of SteerMusic+ significantly distort the source melody. What should I do?

One possible reason is that the personalized diffusion model may be overfitted. To mitigate this, try adjusting the number of fine-tuning steps used during personalization. For more details, please refer to Figure 11 in our paper.
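
If the fine-tuning run saved several intermediate checkpoints, trying an earlier one is a cheap way to test for overfitting. The sketch below orders checkpoint folder names by step count. It assumes folders are named `pipeline_step_<N>`, which is inferred solely from the `pipeline_step_100` path in the example command above, and `sort_checkpoints_by_step` is a hypothetical helper.

```python
import re
from typing import List

# Folder naming inferred from the example path; adjust if yours differs.
_STEP_DIR = re.compile(r"^pipeline_step_(\d+)$")


def sort_checkpoints_by_step(names: List[str]) -> List[str]:
    """Return checkpoint folder names ordered from fewest to most steps."""
    found = []
    for name in names:
        match = _STEP_DIR.match(name)
        if match:
            found.append((int(match.group(1)), name))
    # Sort numerically so that step 50 comes before step 100.
    return [name for _, name in sorted(found)]
```

Feeding it the entries of the fine-tuning output folder (for example via `os.listdir`) gives an ordered list of candidates to pass to `--personalized_ckpt`, starting with the least-trained one.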

## Acknowledgement

We acknowledge the following works for sharing their implementation code:

[Delta_denoising_score](https://github.com/ethanhe42/dds/tree/main); [Contrastive_denoising_score](https://github.com/HyelinNAM/ContrastiveDenoisingScore); [DreamSound](https://github.com/zelaki/DreamSound); [AudioEditingCode](https://github.com/HilaManor/AudioEditingCode); [AudioLDM2](https://github.com/haoheliu/AudioLDM2).