# CosyVoice

**Repository Path**: Zdevote/cosy-voice

## Basic Information

- **Project Name**: CosyVoice
- **Description**: Voice cloning and speech synthesis based on CosyVoice
- **Primary Language**: JavaScript
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-04-30
- **Last Updated**: 2026-04-30

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# CosyVoice Demo

This is a small, no-dependency demo for Alibaba Cloud Bailian CosyVoice voice cloning and voice design.

It includes:

- Voice cloning form with consent confirmation and audio URL validation.
- Voice design form that returns `voice_id` and preview audio when the API provides it.
- Query, list, and delete helpers for custom voices.
- HTTP speech synthesis that returns a browser-playable audio URL.

## Run

```bash
cp .env.example .env
# edit .env and set DASHSCOPE_API_KEY
npm run dev
```

Open:

```text
http://localhost:5177
```

## Notes

- The demo keeps `DASHSCOPE_API_KEY` on the local server and never sends it to the browser.
- Sample audio should be a clear human voice, normally 10 to 20 seconds long and at most 60 seconds, in WAV, MP3, or M4A format, no larger than 10 MB, and sampled at 16 kHz or higher.
- `cosyvoice-v3.5-plus` and `cosyvoice-v3.5-flash` are limited to the China mainland Beijing region according to the current Alibaba Cloud documentation.
- For a simple demo, CosyVoice speech synthesis can use the non-realtime HTTP API. The optional WebSocket path remains available by sending `transport: "websocket"`, but the UI defaults to HTTP because it is easier to debug for single-click previews.
- Add a product-level authorization flow before any production use. Voice cloning must only be used with explicit permission from the voice owner.
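The consent and audio URL checks mentioned for the cloning form could look roughly like the sketch below. It is not the demo's actual code: the function name and the `consent` and `audioUrl` field names are assumptions made for illustration. Duration, file size, and sample rate cannot be verified from a URL alone, so this sketch only covers consent, URL scheme, and file extension.

```javascript
// Hypothetical helper; `consent` and `audioUrl` are assumed field names.
const ALLOWED_EXTENSIONS = [".wav", ".mp3", ".m4a"];

function validateCloneRequest({ consent, audioUrl } = {}) {
  const errors = [];

  // Cloning must only proceed with explicit permission from the voice owner.
  if (consent !== true) {
    errors.push("Explicit consent from the voice owner is required.");
  }

  let parsed = null;
  try {
    parsed = new URL(String(audioUrl));
  } catch {
    errors.push("audioUrl must be an absolute URL.");
  }

  if (parsed) {
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
      errors.push("audioUrl must use http or https.");
    }
    // Only the path is inspected, so query strings do not affect the check.
    const path = parsed.pathname.toLowerCase();
    if (!ALLOWED_EXTENSIONS.some((ext) => path.endsWith(ext))) {
      errors.push("audioUrl should point to a WAV, MP3, or M4A file.");
    }
  }

  return { ok: errors.length === 0, errors };
}
```

Returning a list of errors rather than throwing on the first one lets a form show every problem at once.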

## API Surface

```text
GET    /api/health
POST   /api/cosyvoice/clone
POST   /api/cosyvoice/design
GET    /api/cosyvoice/:voiceId
GET    /api/cosyvoice?prefix=demo
DELETE /api/cosyvoice/:voiceId
POST   /api/cosyvoice/synthesize-preview
```
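Because the demo has no dependencies, these routes would need to be matched by hand. The following is a minimal sketch of one way to do that, not the demo's actual router: the route names and the `matchRoute` function are hypothetical, and only the methods and paths come from the list above.

```javascript
// Hypothetical route table mirroring the API surface listed above.
// Static paths are listed before the parameterized ones.
const ROUTES = [
  { method: "GET", pattern: /^\/api\/health$/, name: "health" },
  { method: "POST", pattern: /^\/api\/cosyvoice\/clone$/, name: "clone" },
  { method: "POST", pattern: /^\/api\/cosyvoice\/design$/, name: "design" },
  { method: "POST", pattern: /^\/api\/cosyvoice\/synthesize-preview$/, name: "synthesizePreview" },
  { method: "GET", pattern: /^\/api\/cosyvoice$/, name: "list" },
  { method: "GET", pattern: /^\/api\/cosyvoice\/([^/]+)$/, name: "query" },
  { method: "DELETE", pattern: /^\/api\/cosyvoice\/([^/]+)$/, name: "remove" },
];

// Returns { name, voiceId, prefix } for a known route, or null otherwise.
function matchRoute(method, requestUrl) {
  // The base is only needed so that relative request URLs can be parsed.
  const url = new URL(requestUrl, "http://localhost");
  for (const route of ROUTES) {
    if (route.method !== method) continue;
    const match = route.pattern.exec(url.pathname);
    if (!match) continue;
    return {
      name: route.name,
      voiceId: match[1] ? decodeURIComponent(match[1]) : null,
      prefix: url.searchParams.get("prefix"),
    };
  }
  return null;
}
```

For example, `matchRoute("GET", "/api/cosyvoice?prefix=demo")` resolves to the `list` route with `prefix` set to `"demo"`.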

The REST proxy uses the official voice customization endpoint:

```text
POST /api/v1/services/audio/tts/customization
```

The upstream request body uses `model: "voice-enrollment"` and passes the selected CosyVoice synthesis model as `input.target_model`.

Do not send `X-DashScope-Async: enable` to this endpoint. Some accounts and customization APIs reject asynchronous calls with `current user api does not support asynchronous calls`.
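A sketch of how the proxy might assemble the upstream request is shown below. Only the endpoint path, `model: "voice-enrollment"`, `input.target_model`, and the absence of the async header come from this README. The `buildEnrollmentRequest` name, the `baseUrl` parameter, the bearer-token `Authorization` header, and any extra `input` fields are assumptions, so check them against the current Alibaba Cloud documentation.

```javascript
// Path taken from this README; the host is supplied by the caller.
const CUSTOMIZATION_PATH = "/api/v1/services/audio/tts/customization";

// Hypothetical helper that returns a URL and options suitable for fetch().
function buildEnrollmentRequest({ baseUrl, apiKey, targetModel, input = {} }) {
  const headers = {
    // Assumption: the API key is sent as a bearer token.
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
    // Deliberately no "X-DashScope-Async" header: this endpoint may reject it.
  };

  const body = {
    model: "voice-enrollment",
    // Caller-supplied fields are kept; target_model is always set explicitly.
    input: { ...input, target_model: targetModel },
  };

  return {
    url: new URL(CUSTOMIZATION_PATH, baseUrl).toString(),
    options: { method: "POST", headers, body: JSON.stringify(body) },
  };
}
```

Building the request in a pure function keeps the API key handling on the server and makes the header and body rules easy to unit test without calling the upstream service.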

The synthesis proxy uses:

```text
POST /api/v1/services/audio/tts/SpeechSynthesizer
```

If the API returns `[cosyvoice]Engine return error code: 418`, first verify that the selected `model` exactly matches the `target_model` used when the custom `voice_id` was created. Custom voice IDs commonly start with that model name, for example `cosyvoice-v3-plus-...`.
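The prefix convention can be turned into a quick pre-flight check, sketched below. This is a heuristic only, since the README says voice IDs *commonly* start with the model name, not always. The function names and the example voice ID are hypothetical, and the list of known models must be supplied by the caller.

```javascript
// Returns the longest known model name that prefixes the voice ID, or null.
// Preferring the longest match avoids picking a shorter model name that
// happens to be a prefix of a longer one.
function guessTargetModel(voiceId, knownModels) {
  const candidates = knownModels
    .filter((model) => voiceId.startsWith(`${model}-`))
    .sort((a, b) => b.length - a.length);
  return candidates[0] || null;
}

// Classifies the selected model as "match", "mismatch", or "unknown"
// (when the voice ID does not follow the prefix convention).
function checkModelForVoice(voiceId, selectedModel, knownModels) {
  const guessed = guessTargetModel(voiceId, knownModels);
  if (guessed === null) return "unknown";
  return guessed === selectedModel ? "match" : "mismatch";
}

// Example with a made-up voice ID:
const models = ["cosyvoice-v3-plus", "cosyvoice-v3.5-plus", "cosyvoice-v3.5-flash"];
console.log(checkModelForVoice("cosyvoice-v3-plus-demo-123", "cosyvoice-v3.5-plus", models));
// prints "mismatch"
```

A `"mismatch"` result before synthesis is a cheap hint that the request would likely fail with error code 418, while `"unknown"` means the check cannot say either way.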