# LLMSpeech

**Repository Path**: sciencewys/LLMSpeech

## Basic Information

- **Project Name**: LLMSpeech
- **Description**: LLMSPeech 可实现地语音聊天、控制电脑做出简单操作的功能。
- **Primary Language**: Python
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 7
- **Forks**: 4
- **Created**: 2025-06-03
- **Last Updated**: 2025-09-26

## Categories & Tags

**Categories**: llm

**Tags**: 人机对话

## README

## 1. 项目名称 LLMSpeech

通过开源项目：语音转文字模型 SenseVoice，文字转语音模型 CosyVoice2，大语言模型服务 Ollama，键盘鼠标宏 KeymouseGo。LLMSPeech 可实现地语音聊天、控制电脑做出简单操作的功能。


## 2. 环境部署

### 2.1. 前期准备

Windows 11 (10)

安装 [Git](https://git-scm.cn/downloads/win)


下载 [ffmpeg](https://ffmpeg.org/download.html) 并加入环境变量, 选择
Windows builds from gyan.dev ，选择 ffmpeg-git-essentials.7z。
解压后将 bin 加入环境变量 PATH 中

下载 [Ollama](https://ollama.com/download) ，打开 PowerShell 下载一个非思考模型。qwen3 0.6b 版本用于测试连通性，qwen3:4b-instruct用于测试prompt。

```PowerShell
ollama run qwen3:qwen3:0.6b
ollama run qwen3:4b-instruct
```

### 2.2. 安装虚拟环境

 [Miniconda(清华源)](https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/)， 下载 Miniconda3-py313_25.5.1-1-Windows-x86_64.exe 版本，安装时勾选 `Add Miniconda3 to my PATH environment variable`

打开 PowerShell ，如果想要在 PowerShell 使用 conda ，开发环境可执行以下:添加权限，Y 确认，初始化，取消base环境自启动，关闭 PowerShell。若不执行，后续操作需要在 Anaconda PowerShell Prompt 中进行

```PowerShell
Set-ExecutionPolicy RemoteSigned -Scope CurrentUser
conda init
conda config --set auto_activate_base false
```

加入信任链接，安装虚拟环境，配置镜像源。

```PowerShell
conda tos accept --override-channels --channel  https://repo.anaconda.com/pkgs/main
conda tos accept --override-channels --channel  https://repo.anaconda.com/pkgs/r
conda tos accept --override-channels --channel  https://repo.anaconda.com/pkgs/msys2

conda create -n llmspeech python=3.10 -y
conda activate llmspeech

python -m pip install -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple --upgrade pip
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/linux-64/
conda config --set show_channel_urls yes
conda upgrade pip

```

### 2.3. 可跳过该步骤。单独测试各个模块和环境配置

#### 2.3.1. 安装 SenseVoice

SenseVoice 下载，配置环境，运行 python 脚本自动下载模型,等待显示成功。默认下载模型位置为 `.cache\modelscope\hub\iic`，
其中 `speech_fsmn_vad_zh-cn-16k-common-pytorch` 是 AuroModel 类的 `vad_model` 参数, `SenseVoiceSmall` 是 AuroModel 类的 `model` 参数。

```
git clone https://github.com/FunAudioLLM/SenseVoice.git
conda create -n sensevoice python=3.10 -y
conda activate sensevoice
cd SenseVoice
pip install -r requirements.txt
```

如果有英伟达显卡，最好使用 CUDA,这个命令在 CosyVoice 中也有

```
pip install torch==2.3.1+cu121 torchaudio==2.3.1+cu121 -f https://download.pytorch.org/whl/torch_stable.html
```

检测是否安装成功：将 demo_sensevoice.py 复制到 LLMSpeech 目录下，运行


#### 2.3.2. 安装 CosyVoice

CosyVoice,默认下载模型位置在 pretrained_models/CosyVoice2-0.5B

```
git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git
cd CosyVoice
git submodule update --init --recursive
conda create -n cosyvoice -y python=3.10
conda activate cosyvoice
conda install -y -c conda-forge pynini==2.1.5
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com
```

把 CosyVoice 重命名为 CosyVoice2, 然后把里面的 cosyvoice 文件夹复制到 LLMSpeech 根目录

下载 CosyVoice2-0.5B 模型：把 `demo\download_cosyvoice-0.5B.py` 复制到 LLMSpeech 根目录下，运行

检测是否安装成功：demo_cosyvoice.py 也复制到 LLMSpeech 根目录，应该可以直接运行。

至此， CosyVoice 还有一个小问题，就是无法断网运行。将 assets/pys/wetext.py 覆盖复制到 wetext 包，参考路径 `miniconda3\envs\cosyvoice\Lib\site-packages\wetext` 这样 wetext 就不会在每次启动时联网了

#### 2.3.3. 安装 KeymouseGo

```powershell
git clone https://github.com/taojy123/KeymouseGo.git
cd KeymouseGo
pip install -r requirements-windows.txt

```

### 2.4. 配置环境

#### 2.4.1 安装依赖

```powershell
pip install -r requirements.txt
```

4.1.1 如果没有英伟达显卡，就注释掉第一行 `--extra-index-url https://download.pytorch.org/whl/cu121`，此时会会安装 pytorch cpu 版本，速度会慢。

4.1.2 如果网络连接不畅，可以下载 [UsbEAm Hosts Editor](https://www.dogfight360.com/blog/18627/)，自动优化 github, pytorch, ollama 的 host 文件。

4.1.3 此时程序中的 wetext 会在启动时自动联网， LLMSpeech 无法断网运行。将 assets/pys/wetext.py 覆盖复制到 wetext 包，参考路径 `miniconda3\envs\llmspeech\Lib\site-packages\wetext` 这样 wetext 就不会在每次启动时联网了

#### 2.4.2 克隆必要仓库

```powershell
git clone https://github.com/FunAudioLLM/SenseVoice.git
git clone https://github.com/taojy123/KeymouseGo.git
git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git
cd CosyVoice
git submodule update --init --recursive
```
将 CosyVoice 重命名为 CosyVoice2, 然后把里面的 cosyvoice 文件夹复制到 LLMSpeech 根目录

#### 2.4.3 安装 CosyVoice-0.5B 模型

```powershell
python demo/download_cosyvoice-0.5B.py
```

#### 2.4.4 下载其他模型

将 demo 文件夹下的 demo_cosyvoice.py 和 demo_sensevoice.py 复制到 LLMSpeech 根目录运行。
其中运行 demo_sensevoice.py 后会自动下载需要的模型。

### 2.5. 运行

运行 LLMSpeech.py 即可

#### 2.5.1 双击运行

如果是 Windows 10 系统需要去应用商店自行下载 Windows Terminal，Windows 11 自带 Windows Terminal。
双击 LLMSpeech.bat 即可运行。