# avatar_project

**Repository Path**: wanghongshengxintong/avatar_project

## Basic Information

- **Project Name**: avatar_project
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-11-26
- **Last Updated**: 2025-12-02

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# 🎙️ Chatbox - Voice Assistant System

A complete voice-based conversational AI system featuring Automatic Speech Recognition (ASR), Natural Language Understanding (NLU), Dialog Management, Text-to-Speech (TTS), and Speaker Recognition.

**All components run locally - no cloud dependencies!** 🔒

---

## 🚀 Quick Start - Choose Your Platform

This project is organized into **platform-specific folders** for easy installation:

### 📁 For Native Ubuntu/Linux

```bash
cd native-ubuntu/
./install.sh
source venv/bin/activate
ollama serve &
python main.py
```

**➡️ See full instructions**: `native-ubuntu/README.md`

### 📁 For Windows (WSL2)

```bash
cd wsl/
./install-wsl.sh
source venv/bin/activate
ollama serve &
python main.py
```

**➡️ See full instructions**: `wsl/README.md`

> **💡 Note for WSL users**: Docker is NOT needed! Direct installation works better.

---

## 📂 Project Structure

```
chatbox/
├── native-ubuntu/            ← For native Ubuntu/Linux systems
│   ├── *.py                  (Core application)
│   ├── install.sh            (Installation script)
│   ├── docker/               (Optional Docker support)
│   └── README.md             (Platform guide)
│
├── wsl/                      ← For Windows Subsystem for Linux
│   ├── *.py                  (Core application)
│   ├── install-wsl.sh        (WSL-specific installer)
│   └── README.md             (Platform guide)
│
└── docs/                     ← Documentation
    ├── README-full.md        (Complete documentation)
    ├── WSL_GUIDE.md          (Detailed WSL setup)
    ├── QUICK_START.md        (Quick reference)
    ├── PLATFORM_FOLDERS.md   (Folder organization)
    └── ...
```

**Quick guide**: `docs/FOLDER_STRUCTURE.txt`

---

## 🎯 Features

- **🎤 Speech Recognition (ASR)**: Real-time speech-to-text using Vosk (offline)
- **🧠 Natural Language Processing**: Powered by Ollama with the Qwen 2.5 model
- **🗣️ Text-to-Speech (TTS)**: Natural voice synthesis using Coqui TTS
- **👥 Speaker Recognition**: Identify and verify speakers using SpeechBrain
- **💬 Conversational AI**: Context-aware dialog management
- **🐳 Docker Support**: Containerized deployment (native Ubuntu only)
- **🪟 WSL Compatible**: Full support for Windows Subsystem for Linux
- **🔒 Privacy-focused**: All processing happens on your machine
- **🔧 Modular Architecture**: Each component can be used independently

---

## 📋 Prerequisites

### System Requirements

- **OS**: Ubuntu 24.04 LTS or WSL2 with Ubuntu 24.04
- **Python**: 3.11 or 3.12
- **RAM**: 8GB minimum (16GB recommended)
- **Storage**: ~10GB for models and dependencies
- **Audio**: Microphone and speakers/headphones

### Quick Platform Check

- **Native Ubuntu?** → Use `native-ubuntu/`
- **Windows with WSL2?** → Use `wsl/`

Not sure?
See `docs/PLATFORM_FOLDERS.md`

---

## 🏗️ Architecture

```
┌─────────────────────────────────────────────────────────┐
│                     CHATBOX SYSTEM                      │
├─────────────────────────────────────────────────────────┤
│                                                         │
│   🎤 Microphone                                         │
│        ↓                                                │
│   ┌──────────┐     ┌──────────┐     ┌──────────┐        │
│   │   ASR    │ ──→ │  Dialog  │ ──→ │   TTS    │        │
│   │  (Vosk)  │     │ Manager  │     │ (Coqui)  │        │
│   │          │     │ (Ollama) │     │          │        │
│   └──────────┘     └──────────┘     └──────────┘        │
│        ↑                                 ↓              │
│   ┌──────────┐                      🔊 Speakers         │
│   │ Speaker  │                                          │
│   │    ID    │                                          │
│   └──────────┘                                          │
│                                                         │
└─────────────────────────────────────────────────────────┘
```

**Components:**

- **ASR (Vosk)**: Converts speech to text offline
- **Dialog Manager (Ollama + Qwen)**: Understands intent and generates responses
- **TTS (Coqui)**: Converts text responses to natural speech
- **Speaker Recognition (SpeechBrain)**: Identifies who is speaking

---

## 🎮 Usage Examples

### Basic Conversation

```bash
python main.py
# Start speaking - the system will respond!
```

### Test Individual Modules

```bash
python main.py --mode test-asr   # Test microphone
python main.py --mode test-tts   # Test speakers
python main.py --mode test-dm    # Test AI responses
```

### Speaker Recognition

```bash
python main.py --mode enroll     # Enroll a speaker
python main.py --mode identify   # Identify speaker
```

### Voice Commands

- Say "**exit**" or "**quit**" to stop
- Say "**reset**" to clear conversation history

---

## 📚 Documentation

### Getting Started

- **`native-ubuntu/README.md`** - Native Ubuntu quick start
- **`wsl/README.md`** - WSL quick start
- **`docs/QUICK_START.md`** - Quick reference for all platforms

### Comprehensive Guides

- **`docs/README-full.md`** - Complete project documentation
- **`docs/WSL_GUIDE.md`** - Detailed WSL setup and troubleshooting
- **`docs/PLATFORM_FOLDERS.md`** - Folder organization explained
- **`native-ubuntu/docker/DOCKER_GUIDE.md`** - Docker tutorial

### Quick References

- **`docs/FOLDER_STRUCTURE.txt`** - Visual folder structure
- **`docs/PROJECT_OVERVIEW.txt`** - System
overview
- **`docs/WSL_USERS_READ_THIS.txt`** - Simple WSL guide

---

## 🔧 Technology Stack

| Component | Technology | Version | Purpose |
|-----------|-----------|---------|---------|
| ASR | Vosk | 0.3.45 | Offline speech recognition |
| NLU/DM | Ollama + Qwen | 2.5 (7B) | Natural language understanding |
| TTS | Coqui TTS | 0.22.0 | Text-to-speech synthesis |
| Speaker ID | SpeechBrain | 0.5.16 | Speaker recognition |
| Framework | Python | 3.11+ | Core application |
| Container | Docker | Latest | Optional deployment (native only) |

---

## 💡 Key Differences: Native vs WSL

| Aspect | native-ubuntu/ | wsl/ |
|--------|----------------|------|
| Installation | `install.sh` | `install-wsl.sh` |
| Audio Setup | Automatic | PulseAudio bridge |
| Docker | ✅ Included | ❌ Not needed |
| Performance | 100% | ~95% |
| Best For | Production | Development |

---

## ⚠️ Important Notes

### For WSL Users

- ⚠️ Always use `install-wsl.sh` (NOT `install.sh`)
- ⚠️ Audio requires PulseAudio setup (see `wsl/README.md`)
- ⚠️ Keep files in `/home/`, not `/mnt/c/`, for best performance
- 💡 Docker NOT needed - direct installation is better!
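Whichever platform you install on, the runtime data flow is the same three-stage loop shown in the Architecture diagram: ASR → Dialog Manager → TTS, with "exit"/"quit" handled as voice commands. The following is a minimal Python sketch of one conversational turn; the function names (`run_turn`, `transcribe`, `respond`, `speak`) are illustrative stand-ins, not the actual interfaces in `main.py`.

```python
# Hypothetical sketch of one ASR -> Dialog Manager -> TTS turn.
# The callables passed in stand for the real components
# (Vosk ASR, Ollama/Qwen dialog manager, Coqui TTS).

def run_turn(transcribe, respond, speak):
    """Run one conversational turn through the three-stage pipeline."""
    text = transcribe()                      # ASR: speech -> text
    if text.strip().lower() in ("exit", "quit"):
        return None                          # voice command: stop the assistant
    reply = respond(text)                    # Dialog Manager: text -> reply
    speak(reply)                             # TTS: reply -> audio
    return reply

if __name__ == "__main__":
    # Wiring with stubs to show the data flow without audio hardware:
    reply = run_turn(
        transcribe=lambda: "hello there",
        respond=lambda text: f"You said: {text}",
        speak=lambda reply: print(reply),
    )
```

Because each stage is just a callable, any one component can be swapped or tested in isolation, which is what the "Modular Architecture" feature refers to.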
### For Native Ubuntu Users

- ✅ Both direct and Docker installation work well
- ✅ Audio works out of the box
- ✅ Choose Docker for isolation, direct install for performance

---

## 🔍 Troubleshooting

### Common Issues

**Audio not working?**

```bash
# Check audio devices
arecord -l   # Microphones
aplay -l     # Speakers

# Test audio
speaker-test -t wav
```

**Ollama not running?**

```bash
# Start Ollama
ollama serve &

# Check connection
curl http://localhost:11434/api/tags
```

**Model not found?**

```bash
# Pull the model
ollama pull qwen2.5:7b
```

For platform-specific troubleshooting:

- Native Ubuntu → See `native-ubuntu/README.md`
- WSL → See `wsl/README.md` or `docs/WSL_GUIDE.md`

---

## 📊 Installation Time & Storage

| Item | Size | Time |
|------|------|------|
| Python dependencies | ~500 MB | 5 min |
| Vosk model (small) | ~40 MB | 1 min |
| Qwen 2.5 (7B) | ~4.7 GB | 5-15 min |
| TTS & Speaker models | ~150 MB | 2 min |
| **Total** | **~5-10 GB** | **15-25 min** |

---

## 🌟 Use Cases

- Personal voice assistant
- Voice-controlled applications
- Accessibility tool for hands-free interaction
- Language learning practice
- Voice-based note-taking
- Meeting assistant
- Smart home integration
- Research in conversational AI

---

## 🤝 Contributing

This is a complete, ready-to-use system with:

- ✅ Modular architecture for easy customization
- ✅ Platform-specific installation scripts
- ✅ Comprehensive documentation
- ✅ Example usage for each module

Feel free to:

- Add support for more languages
- Integrate additional TTS voices
- Improve speaker recognition accuracy
- Add more dialog management features

---

## 📞 Support

**Need help?**

1. Check the README in your platform folder:
   - `native-ubuntu/README.md`
   - `wsl/README.md`
2. Review comprehensive guides:
   - `docs/README-full.md` - Complete documentation
   - `docs/WSL_GUIDE.md` - WSL troubleshooting
   - `docs/QUICK_START.md` - Quick reference
3. Check logs:
   - `chatbox.log` in your platform folder
4.
Platform-specific resources:
   - [Vosk Documentation](https://alphacephei.com/vosk/)
   - [Ollama Documentation](https://ollama.com/docs)
   - [Coqui TTS](https://github.com/coqui-ai/TTS)
   - [SpeechBrain](https://speechbrain.github.io/)

---

## 📄 License

This project uses open-source components:

- Vosk: Apache 2.0
- Ollama: MIT
- Coqui TTS: MPL 2.0
- SpeechBrain: Apache 2.0

---

## 🎉 Ready to Start?

1. **Choose your platform folder**: `native-ubuntu/` or `wsl/`
2. **Read the README** in that folder
3. **Run the installation script**
4. **Start chatting!**

```bash
# Example for Native Ubuntu
cd native-ubuntu/
./install.sh
source venv/bin/activate
ollama serve &
python main.py
```

**Enjoy your voice assistant!** 🎙️✨
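As a final post-install sanity check, the troubleshooting step `curl http://localhost:11434/api/tags` can also be done from Python. Ollama's local REST API exposes `GET /api/tags`, which returns the locally pulled models as JSON of the form `{"models": [{"name": "..."}, ...]}`; the helper names below (`installed_models`, `check_ollama`) are this sketch's own, not part of any project module.

```python
# Sketch: verify the local Ollama server is up and qwen2.5:7b is pulled,
# using only the standard library. Assumes Ollama's GET /api/tags endpoint.
import json
import urllib.request


def installed_models(raw_json: str) -> list[str]:
    """Parse an /api/tags response body into a list of model names."""
    return [m["name"] for m in json.loads(raw_json).get("models", [])]


def check_ollama(base: str = "http://localhost:11434") -> list[str]:
    """Query the local Ollama server and warn if the Qwen model is missing."""
    with urllib.request.urlopen(f"{base}/api/tags", timeout=5) as resp:
        names = installed_models(resp.read().decode())
    if not any(n.startswith("qwen2.5") for n in names):
        print("Model missing - run: ollama pull qwen2.5:7b")
    return names


if __name__ == "__main__":
    try:
        print("Installed models:", check_ollama())
    except OSError:
        print("Ollama not reachable - run: ollama serve &")
```

If this prints your model list, the ASR/TTS side is all that remains to verify with `python main.py --mode test-asr` and `--mode test-tts`.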