# avatar_project

**Repository Path**: wanghongshengxintong/avatar_project

## Basic Information

- **Project Name**: avatar_project
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-11-26
- **Last Updated**: 2025-12-02

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# 🎙️ Chatbox - Voice Assistant System

A complete voice-based conversational AI system featuring Automatic Speech Recognition (ASR), Natural Language Understanding (NLU), Dialog Management, Text-to-Speech (TTS), and Speaker Recognition.

**All components run locally - no cloud dependencies!** 🔒

---

## 🚀 Quick Start - Choose Your Platform

This project is organized into **platform-specific folders** for easy installation:

### 📁 For Native Ubuntu/Linux

```bash
cd native-ubuntu/
./install.sh
source venv/bin/activate
ollama serve &
python main.py
```

**➡️ See full instructions**: `native-ubuntu/README.md`

### 📁 For Windows (WSL2)

```bash
cd wsl/
./install-wsl.sh
source venv/bin/activate
ollama serve &
python main.py
```

**➡️ See full instructions**: `wsl/README.md`

> **💡 Note for WSL users**: Docker is NOT needed! Direct installation works better.

---

## 📂 Project Structure

```
chatbox/
├── native-ubuntu/            ← For native Ubuntu/Linux systems
│   ├── *.py                  (Core application)
│   ├── install.sh            (Installation script)
│   ├── docker/               (Optional Docker support)
│   └── README.md             (Platform guide)
│
├── wsl/                      ← For Windows Subsystem for Linux
│   ├── *.py                  (Core application)
│   ├── install-wsl.sh        (WSL-specific installer)
│   └── README.md             (Platform guide)
│
└── docs/                     ← Documentation
    ├── README-full.md        (Complete documentation)
    ├── WSL_GUIDE.md          (Detailed WSL setup)
    ├── QUICK_START.md        (Quick reference)
    ├── PLATFORM_FOLDERS.md   (Folder organization)
    └── ...
```

**Quick guide**: `docs/FOLDER_STRUCTURE.txt`

---

## 🎯 Features

- **🎤 Speech Recognition (ASR)**: Real-time speech-to-text using Vosk (offline)
- **🧠 Natural Language Processing**: Powered by Ollama with the Qwen 2.5 model
- **🗣️ Text-to-Speech (TTS)**: Natural voice synthesis using Coqui TTS
- **👥 Speaker Recognition**: Identify and verify speakers using SpeechBrain
- **💬 Conversational AI**: Context-aware dialog management
- **🐳 Docker Support**: Containerized deployment (native Ubuntu only)
- **🪟 WSL Compatible**: Full support for Windows Subsystem for Linux
- **🔒 Privacy-focused**: All processing happens on your machine
- **🔧 Modular Architecture**: Each component can be used independently

---

## 📋 Prerequisites

### System Requirements

- **OS**: Ubuntu 24.04 LTS or WSL2 with Ubuntu 24.04
- **Python**: 3.11 or 3.12
- **RAM**: 8GB minimum (16GB recommended)
- **Storage**: ~10GB for models and dependencies
- **Audio**: Microphone and speakers/headphones

### Quick Platform Check

- **Native Ubuntu?** → Use `native-ubuntu/`
- **Windows with WSL2?** → Use `wsl/`

Not sure?
See `docs/PLATFORM_FOLDERS.md`

---

## 🏗️ Architecture

```
┌─────────────────────────────────────────────────────────┐
│                     CHATBOX SYSTEM                      │
├─────────────────────────────────────────────────────────┤
│                                                         │
│   🎤 Microphone                                         │
│        ↓                                                │
│   ┌──────────┐     ┌──────────┐     ┌──────────┐        │
│   │   ASR    │ ──→ │  Dialog  │ ──→ │   TTS    │        │
│   │  (Vosk)  │     │ Manager  │     │ (Coqui)  │        │
│   │          │     │ (Ollama) │     │          │        │
│   └──────────┘     └──────────┘     └──────────┘        │
│        ↑                                 ↓              │
│   ┌──────────┐                      🔊 Speakers         │
│   │ Speaker  │                                          │
│   │    ID    │                                          │
│   └──────────┘                                          │
│                                                         │
└─────────────────────────────────────────────────────────┘
```

**Components:**

- **ASR (Vosk)**: Converts speech to text offline
- **Dialog Manager (Ollama + Qwen)**: Understands intent and generates responses
- **TTS (Coqui)**: Converts text responses to natural speech
- **Speaker Recognition (SpeechBrain)**: Identifies who is speaking

---

## 🎮 Usage Examples

### Basic Conversation

```bash
python main.py
# Start speaking - the system will respond!
```

### Test Individual Modules

```bash
python main.py --mode test-asr   # Test microphone
python main.py --mode test-tts   # Test speakers
python main.py --mode test-dm    # Test AI responses
```

### Speaker Recognition

```bash
python main.py --mode enroll     # Enroll a speaker
python main.py --mode identify   # Identify speaker
```

### Voice Commands

- Say "**exit**" or "**quit**" to stop
- Say "**reset**" to clear conversation history

---

## 📚 Documentation

### Getting Started

- **`native-ubuntu/README.md`** - Native Ubuntu quick start
- **`wsl/README.md`** - WSL quick start
- **`docs/QUICK_START.md`** - Quick reference for all platforms

### Comprehensive Guides

- **`docs/README-full.md`** - Complete project documentation
- **`docs/WSL_GUIDE.md`** - Detailed WSL setup and troubleshooting
- **`docs/PLATFORM_FOLDERS.md`** - Folder organization explained
- **`native-ubuntu/docker/DOCKER_GUIDE.md`** - Docker tutorial

### Quick References

- **`docs/FOLDER_STRUCTURE.txt`** - Visual folder structure
- **`docs/PROJECT_OVERVIEW.txt`** - System
overview
- **`docs/WSL_USERS_READ_THIS.txt`** - Simple WSL guide

---

## 🔧 Technology Stack

| Component | Technology | Version | Purpose |
|-----------|-----------|---------|---------|
| ASR | Vosk | 0.3.45 | Offline speech recognition |
| NLU/DM | Ollama + Qwen | 2.5 (7B) | Natural language understanding |
| TTS | Coqui TTS | 0.22.0 | Text-to-speech synthesis |
| Speaker ID | SpeechBrain | 0.5.16 | Speaker recognition |
| Framework | Python | 3.11+ | Core application |
| Container | Docker | Latest | Optional deployment (native only) |

---

## 💡 Key Differences: Native vs WSL

| Aspect | native-ubuntu/ | wsl/ |
|--------|----------------|------|
| Installation | `install.sh` | `install-wsl.sh` |
| Audio Setup | Automatic | PulseAudio bridge |
| Docker | ✅ Included | ❌ Not needed |
| Performance | 100% | ~95% |
| Best For | Production | Development |

---

## ⚠️ Important Notes

### For WSL Users

- ⚠️ Always use `install-wsl.sh` (NOT `install.sh`)
- ⚠️ Audio requires PulseAudio setup (see `wsl/README.md`)
- ⚠️ Keep files in `/home/`, not `/mnt/c/`, for best performance
- 💡 Docker NOT needed - direct installation is better!
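Whichever platform you install on, the runtime data flow is the same three-stage loop shown in the Architecture diagram: ASR → Dialog Manager → TTS, with "exit"/"quit" handled as voice commands. The following is a minimal Python sketch of one conversational turn; the function names (`run_turn`, `transcribe`, `respond`, `speak`) are illustrative stand-ins, not the actual interfaces in `main.py`.

```python
# Hypothetical sketch of one ASR -> Dialog Manager -> TTS turn.
# The callables passed in stand for the real components
# (Vosk ASR, Ollama/Qwen dialog manager, Coqui TTS).

def run_turn(transcribe, respond, speak):
    """Run one conversational turn through the three-stage pipeline."""
    text = transcribe()                      # ASR: speech -> text
    if text.strip().lower() in ("exit", "quit"):
        return None                          # voice command: stop the assistant
    reply = respond(text)                    # Dialog Manager: text -> reply
    speak(reply)                             # TTS: reply -> audio
    return reply

if __name__ == "__main__":
    # Wiring with stubs to show the data flow without audio hardware:
    reply = run_turn(
        transcribe=lambda: "hello there",
        respond=lambda text: f"You said: {text}",
        speak=lambda reply: print(reply),
    )
```

Because each stage is just a callable, any one component can be swapped or tested in isolation, which is what the "Modular Architecture" feature refers to.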
### For Native Ubuntu Users

- ✅ Both direct and Docker installation work well
- ✅ Audio works out of the box
- ✅ Choose Docker for isolation, direct install for performance

---

## 🔍 Troubleshooting

### Common Issues

**Audio not working?**

```bash
# Check audio devices
arecord -l   # Microphones
aplay -l     # Speakers

# Test audio
speaker-test -t wav
```

**Ollama not running?**

```bash
# Start Ollama
ollama serve &

# Check connection
curl http://localhost:11434/api/tags
```

**Model not found?**

```bash
# Pull the model
ollama pull qwen2.5:7b
```

For platform-specific troubleshooting:

- Native Ubuntu → See `native-ubuntu/README.md`
- WSL → See `wsl/README.md` or `docs/WSL_GUIDE.md`

---

## 📊 Installation Time & Storage

| Item | Size | Time |
|------|------|------|
| Python dependencies | ~500 MB | 5 min |
| Vosk model (small) | ~40 MB | 1 min |
| Qwen 2.5 (7B) | ~4.7 GB | 5-15 min |
| TTS & Speaker models | ~150 MB | 2 min |
| **Total** | **~5-10 GB** | **15-25 min** |

---

## 🌟 Use Cases

- Personal voice assistant
- Voice-controlled applications
- Accessibility tool for hands-free interaction
- Language learning practice
- Voice-based note-taking
- Meeting assistant
- Smart home integration
- Research in conversational AI

---

## 🤝 Contributing

This is a complete, ready-to-use system with:

- ✅ Modular architecture for easy customization
- ✅ Platform-specific installation scripts
- ✅ Comprehensive documentation
- ✅ Example usage for each module

Feel free to:

- Add support for more languages
- Integrate additional TTS voices
- Improve speaker recognition accuracy
- Add more dialog management features

---

## 📞 Support

**Need help?**

1. Check the README in your platform folder:
   - `native-ubuntu/README.md`
   - `wsl/README.md`
2. Review comprehensive guides:
   - `docs/README-full.md` - Complete documentation
   - `docs/WSL_GUIDE.md` - WSL troubleshooting
   - `docs/QUICK_START.md` - Quick reference
3. Check logs:
   - `chatbox.log` in your platform folder
4.
Platform-specific resources:
   - [Vosk Documentation](https://alphacephei.com/vosk/)
   - [Ollama Documentation](https://ollama.com/docs)
   - [Coqui TTS](https://github.com/coqui-ai/TTS)
   - [SpeechBrain](https://speechbrain.github.io/)

---

## 📄 License

This project uses open-source components:

- Vosk: Apache 2.0
- Ollama: MIT
- Coqui TTS: MPL 2.0
- SpeechBrain: Apache 2.0

---

## 🎉 Ready to Start?

1. **Choose your platform folder**: `native-ubuntu/` or `wsl/`
2. **Read the README** in that folder
3. **Run the installation script**
4. **Start chatting!**

```bash
# Example for Native Ubuntu
cd native-ubuntu/
./install.sh
source venv/bin/activate
ollama serve &
python main.py
```

**Enjoy your voice assistant!** 🎙️✨
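As a final post-install sanity check, the troubleshooting step `curl http://localhost:11434/api/tags` can also be done from Python. Ollama's local REST API exposes `GET /api/tags`, which returns the locally pulled models as JSON of the form `{"models": [{"name": "..."}, ...]}`; the helper names below (`installed_models`, `check_ollama`) are this sketch's own, not part of any project module.

```python
# Sketch: verify the local Ollama server is up and qwen2.5:7b is pulled,
# using only the standard library. Assumes Ollama's GET /api/tags endpoint.
import json
import urllib.request


def installed_models(raw_json: str) -> list[str]:
    """Parse an /api/tags response body into a list of model names."""
    return [m["name"] for m in json.loads(raw_json).get("models", [])]


def check_ollama(base: str = "http://localhost:11434") -> list[str]:
    """Query the local Ollama server and warn if the Qwen model is missing."""
    with urllib.request.urlopen(f"{base}/api/tags", timeout=5) as resp:
        names = installed_models(resp.read().decode())
    if not any(n.startswith("qwen2.5") for n in names):
        print("Model missing - run: ollama pull qwen2.5:7b")
    return names


if __name__ == "__main__":
    try:
        print("Installed models:", check_ollama())
    except OSError:
        print("Ollama not reachable - run: ollama serve &")
```

If this prints your model list, the ASR/TTS side is all that remains to verify with `python main.py --mode test-asr` and `--mode test-tts`.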