# neural-style-audio-tf

**Repository Path**: mirrors_DmitryUlyanov/neural-style-audio-tf

## Basic Information

- **Project Name**: neural-style-audio-tf
- **Description**: TensorFlow implementation for audio neural style.
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2022-01-11
- **Last Updated**: 2025-11-04

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# Audio Style Transfer

This is a TensorFlow reimplementation of [Vadim's Lasagne code](https://github.com/vadim-v-lebedev/audio_style_tranfer) for a style transfer algorithm for audio, which uses convolutions with random weights to represent audio features (see the illustrative sketches at the end of this README).

To listen to examples, go to the [blog post](http://dmitryulyanov.github.io/audio-texture-synthesis-and-style-transfer/).

Also check out the [Torch implementation](https://github.com/DmitryUlyanov/neural-style-audio-torch).

So far it is CPU only, but if you are proficient in TensorFlow it should be easy to switch to GPU. That said, it runs fast on CPU.

### Dependencies

- python (tested with 2.7)
- TensorFlow ([installation instructions](https://www.tensorflow.org/get_started/os_setup))
- librosa

```
pip install librosa
```

- numpy and matplotlib

The easiest way to install Python is to use [Anaconda](https://www.continuum.io/downloads).

### How to run

- Open `neural-style-audio-tf.ipynb` in Jupyter.
- If you want to use your own audio files as inputs, first cut them to 10s length with:

```
ffmpeg -i yourfile.mp3 -ss 00:00:00 -t 10 yourfile_10s.mp3
```

- Set `CONTENT_FILENAME` and `STYLE_FILENAME` in the third cell of the Jupyter notebook to your input files.
- Run all cells.

The most frequent problem is domination of either content or style in the output. To fight this, adjust the `ALPHA` parameter. A larger `ALPHA` means more content in the output; `ALPHA=0` means no content at all, which reduces stylization to texture generation (the loss sketch at the end of this README shows this trade-off). The example output `outputs/imperial_usa.wav`, the result of mixing the content of the Imperial March from Star Wars with the style of the U.S. National Anthem, was obtained with the default value `ALPHA=1e-2`.

### References

- Original paper on style transfer: [A Neural Algorithm of Artistic Style](https://arxiv.org/abs/1508.06576)
- [Neural style TensorFlow implementation](https://github.com/anishathalye/neural-style)
- Publications on texture generation with random convolutions:
  - [Extreme Style Machines](https://nucl.ai/blog/extreme-style-machines/)
  - [Texture Synthesis Using Shallow Convolutional Networks with Random Filters](https://arxiv.org/abs/1606.00021)
  - [A Powerful Generative Model Using Random Weights for the Deep Image Representation](https://arxiv.org/pdf/1606.04801)
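### Sketch: loading an input as a spectrogram

For reference, this is roughly what turning an input file into a spectrogram looks like with librosa. It is an illustrative sketch, not the notebook's actual preprocessing; the exact parameters (`n_fft`, log scaling, sample rate) may differ from what `neural-style-audio-tf.ipynb` uses. `"yourfile_10s.mp3"` is the ffmpeg output name from the "How to run" section above.

```python
import numpy as np
import librosa

# Illustrative preprocessing only; the notebook's exact parameters may differ.
y, sr = librosa.load("yourfile_10s.mp3")      # waveform samples and sample rate
S = librosa.stft(y, n_fft=2048)               # complex STFT, shape (1025, n_frames)
spectrogram = np.log1p(np.abs(S))             # log-magnitude spectrogram
print(spectrogram.shape)
```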
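### Sketch: random-filter features and the `ALPHA` trade-off

Here is a minimal NumPy sketch of the two ideas mentioned above: representing audio by a single convolution layer with fixed random weights applied to a spectrogram, and trading content against style with `ALPHA`. All names and sizes (`N_FILTERS`, `make_random_filters`, the toy spectrogram shapes) are illustrative assumptions; the notebook implements this in TensorFlow with its own parameters.

```python
import numpy as np

N_FILTERS = 256       # number of random filters (illustrative; the notebook may differ)
KERNEL_WIDTH = 11     # filter width along the time axis (illustrative)

def make_random_filters(n_freq):
    """Fixed random filters spanning all frequency bins and KERNEL_WIDTH frames."""
    w = np.random.randn(N_FILTERS, n_freq, KERNEL_WIDTH)
    return w * np.sqrt(2.0 / (n_freq * KERNEL_WIDTH))    # He-style scaling

def conv_features(spectrogram, filters):
    """Convolve over time with the given random filters, then apply ReLU."""
    n_freq, n_frames = spectrogram.shape
    n_out = n_frames - KERNEL_WIDTH + 1
    feats = np.empty((N_FILTERS, n_out))
    for t in range(n_out):
        patch = spectrogram[:, t:t + KERNEL_WIDTH]        # (n_freq, KERNEL_WIDTH)
        feats[:, t] = np.tensordot(filters, patch, axes=([1, 2], [0, 1]))
    return np.maximum(feats, 0.0)                         # ReLU

def gram(feats):
    """Gram matrix of filter responses: the statistic used for the style term."""
    return np.dot(feats, feats.T) / feats.shape[1]

def total_loss(x_spec, content_spec, style_spec, filters, alpha):
    """alpha * content_loss + style_loss; a larger alpha keeps more content."""
    fx = conv_features(x_spec, filters)
    content_loss = np.mean((fx - conv_features(content_spec, filters)) ** 2)
    style_loss = np.mean((gram(fx) - gram(conv_features(style_spec, filters))) ** 2)
    return alpha * content_loss + style_loss

# Toy usage with random "spectrograms"; in the notebook the spectrograms come from
# librosa, and the result spectrogram is optimised by gradient descent on this kind
# of loss, then inverted back to a waveform.
np.random.seed(0)
content = np.abs(np.random.randn(257, 430))               # (freq bins, time frames)
style = np.abs(np.random.randn(257, 430))
filters = make_random_filters(content.shape[0])
x = content.copy()                                        # initialise the result from content
print(total_loss(x, content, style, filters, alpha=1e-2))
```

In this kind of loss the content term compares filter responses directly, while the style term compares their Gram matrices; with `ALPHA=0` only the Gram (style) term remains, which is why the output then degenerates to texture generation.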