# Kotai-Kyutai-livekit

**Repository Path**: mirrors_lepy/Kotai-Kyutai-livekit

## Basic Information

- **Project Name**: Kotai-Kyutai-livekit
- **Description**: Kotai is a fully local, zero-cost voice assistant that combines the power of Kyutai TTS/STT, LiveKit, and local LLMs to create natural conversational experiences.
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-10-02
- **Last Updated**: 2026-04-26

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Kotai - Local Voice Assistant with Kyutai TTS/STT and LiveKit

[![Watch the Demo](https://img.youtube.com/vi/PHFrchtDIoE/0.jpg)](https://youtu.be/PHFrchtDIoE)

## 🔊 Overview

Kotai is a fully local, zero-cost voice assistant that combines the power of Kyutai TTS/STT, LiveKit, and local LLMs to create natural conversational experiences. This project eliminates the need for cloud-based API services by integrating:

- **Kyutai TTS** for high-quality speech synthesis
- **Kyutai STT** for accurate speech-to-text conversion
- **LiveKit** for real-time voice communication
- **Ollama** for running local large language models (Gemma3n)

The result is a voice assistant with natural speech capabilities and intelligent conversation management - all running completely on your local machine.
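To make the division of labour between these components concrete, here is a minimal sketch of a single conversational turn: audio goes to speech-to-text, the transcript goes to the language model, and the reply goes to text-to-speech. The `VoiceTurn` class and its three callables are hypothetical names used only for illustration; they are not taken from `agent.py`, and the toy stages below stand in for the real Kyutai and Ollama services. LiveKit's role of carrying audio to and from the user is left out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceTurn:
    """One conversational turn; every stage is a stand-in, not the real service."""

    transcribe: Callable[[bytes], str]   # stand-in for Kyutai STT
    generate: Callable[[str], str]       # stand-in for the Ollama-hosted LLM
    synthesize: Callable[[str], bytes]   # stand-in for Kyutai TTS

    def run(self, audio_in: bytes) -> bytes:
        # Audio -> text -> reply text -> audio, in that order.
        text = self.transcribe(audio_in)
        reply = self.generate(text)
        return self.synthesize(reply)


# Toy stages that treat "audio" as UTF-8 bytes so the flow can be run anywhere.
turn = VoiceTurn(
    transcribe=lambda audio: audio.decode("utf-8"),
    generate=lambda text: f"You said: {text}",
    synthesize=lambda reply: reply.encode("utf-8"),
)

print(turn.run(b"hello"))
```

Keeping each stage behind a plain callable mirrors the idea that any one service (STT, LLM, or TTS) can be swapped for another local endpoint without touching the others.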
## ✨ Features

- 🎯 **100% Local** - No API costs or cloud dependencies
- 🗣️ **Natural Speech** - High-quality voice synthesis with Kyutai TTS
- 🎙️ **Real-time Conversation** - Fluid interaction through LiveKit
- 🧠 **Local LLM Integration** - Uses Ollama to run Gemma3n locally
- 👂 **Advanced Speech Recognition** - Fast local transcription with Kyutai STT
- 🤖 **Optimized System Prompt** - Designed specifically for smaller LLMs
- 🌍 **Bilingual Support** - English and French language support
- 💬 **Conversation Management** - Handles errors, silence, and emotional intelligence

## 📋 Prerequisites

Before running Kotai, you’ll need:

- Python 3.12
- Kyutai TTS server running on `http://localhost:8000/v1`
  - Repo: https://github.com/dwain-barnes/kyutai-tts-openai-api
- Kyutai STT server running on `http://localhost:8080/v1`
  - Repo: https://github.com/dwain-barnes/kyutai-stt-openai-api
- Ollama installed with the Gemma3n model
- LiveKit server access

## 🚀 Installation

See the YouTube video linked at the top of this README.

Create a `.env.local` file with your configuration:

```env
# Your LiveKit configuration
LIVEKIT_URL=your_livekit_url
LIVEKIT_API_KEY=your_api_key
LIVEKIT_API_SECRET=your_api_secret
```

## 💬 Usage

1. Make sure the Kyutai TTS server is running (default: `http://localhost:8000/v1`)
1. Make sure the Kyutai STT server is running (default: `http://localhost:8080/v1`)
1. Make sure Ollama is running with the Gemma3n model loaded:

   ```bash
   ollama run gemma3n:latest
   ```

1. Run the voice assistant:

   ```bash
   python agent.py
   ```

1. 
   Connect to the LiveKit room and start interacting with Kotai

## 🔧 How It Works

The system consists of several integrated components:

- **Speech-to-Text (STT)**: Uses Kyutai STT for local transcription
- **Language Model**: Connects to a local Ollama instance running Gemma3n
- **Text-to-Speech (TTS)**: Kyutai TTS for natural speech synthesis
- **Voice Pipeline**: Handles the flow between components via LiveKit
- **Optimized Prompt System**: Comprehensive system prompt designed for smaller LLMs

Kotai features an intelligent conversation system that:

- Handles speech transcription errors gracefully
- Manages conversation flow and silence
- Adapts to user energy levels and preferences
- Provides emotional intelligence and support

## 🔄 Customization

You can modify the `agent.py` file to:

- **Change the voice** by editing the `voice` parameter in the TTS setup
- **Modify the personality** by editing the system prompt classes
- **Adjust the LLM model** by changing the Ollama model name in `get_readable_llm_name()`
- **Configure different endpoints** for any of the services
- **Switch languages** by modifying the `LanguageCode` settings

## 📝 Code Explanation

The main workflow in `agent.py`:

```python
# Imports assumed from the LiveKit Agents SDK; SmalltalkInstructions is one of
# the project's own system prompt classes and is not shown in this excerpt.
from livekit.agents import Agent
from livekit.plugins import openai

# 1) Speech-to-Text with Kyutai STT (OpenAI-compatible endpoint)
stt = openai.STT(
    base_url="http://localhost:8080/v1",
    api_key="dummy_key",  # placeholder; the request goes to the local server
    model="whisper-1",
    language="en",
)

# 2) Language Model from Ollama
llm = openai.LLM.with_ollama(model="gemma3n:latest")

# 3) Text-to-Speech using Kyutai TTS
tts = openai.TTS.create_kyutai_client(
    model="tts-1",
    voice="nova",
    speed=1.1,
    base_url="http://localhost:8000/v1",
)

# 4) Create Agent with optimized system prompt
class MyAgent(Agent):
    def __init__(self) -> None:
        prompt_generator = SmalltalkInstructions()
        system_prompt = prompt_generator.make_system_prompt()
        super().__init__(instructions=system_prompt)
```

## 🤖 System Prompt Features

Kotai includes a sophisticated system prompt optimized for smaller LLMs:

- **Personality**: Helpful, curious, genuine, and
slightly playful
- **Conversation Management**: Handles stuck conversations, emotional support, and knowledge gaps
- **Speech Error Handling**: Gracefully manages transcription mistakes
- **Conversation Depth**: Adapts to user preferences for light or deep discussion
- **Bilingual Support**: Seamless English/French switching
- **Topic Boundaries**: Thoughtful handling of sensitive subjects

## 📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

## 🙏 Acknowledgments

- [Kyutai] for the excellent TTS and STT models
- [LiveKit] for the real-time communication platform
- [Ollama]