# omni-agent

**Repository Path**: dong19960127/omni-agent

## Basic Information

- **Project Name**: omni-agent
- **Description**: Omni Agent: an all-purpose local agent 🤖 Zero cloud · Zero API keys · Data stays fully local. Core capabilities: • 🗺️ Planning and execution: complex tasks are decomposed automatically, with step-level reflection and error correction • 📚 RAG knowledge base: one-click indexing of local documents, automatic retrieval during Q&A • 🧠 Hybrid memory: short-term conversation + semantic memory + structured facts • 🔧 Tool ecosystem: file/Shell/Python/search + MCP external extensions • 🛡️ Safety sand
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-04-21
- **Last Updated**: 2026-04-21

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# 🤖 Omni Agent

**A fully local, extensible AI Agent with Planning, RAG, Hybrid Memory, and MCP.**

[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: Apache-2.0](https://img.shields.io/badge/License-Apache%202.0-green.svg)](https://opensource.org/licenses/Apache-2.0)
[![Ollama](https://img.shields.io/badge/backend-Ollama-ff6f00?logo=ollama)](https://ollama.com)
[![smolagents](https://img.shields.io/badge/built%20on-smolagents-yellow)](https://github.com/huggingface/smolagents)

**Zero cloud. Zero API keys. Your data stays on your machine.**

[Quick Start](#quick-start) • [Features](#features) • [Architecture](#architecture) • [Roadmap](#roadmap) • [Contributing](#contributing)

---

## 🎯 What is Omni Agent?

Omni Agent is a production-ready, **fully local** AI agent framework that runs entirely on your own hardware. No OpenAI API keys, no data leaving your machine.

It combines the stability of [HuggingFace smolagents](https://github.com/huggingface/smolagents) with advanced capabilities usually found only in cloud-based agents:

- 🗺️ **Plan-and-Execute** with step-level reflection and auto-retry
- 📚 **RAG Knowledge Base** for document Q&A
- 🧠 **Hybrid Memory** (semantic + structured + conversational)
- 🔌 **MCP Ecosystem**: zero-code integration with external tools
- 🛡️ **Enterprise-grade Safety Sandbox**

> 💡 *Born from merging two real-world agent projects: one built on Qwen3.5 with advanced planning, another on Gemma 4 with deep RAG optimization. Omni Agent takes the best of both.*

---

## ✨ Features

### Core Capabilities

| Feature | Description |
|---------|-------------|
| 🧠 **Plan-and-Execute** | Complex tasks are broken into multi-step plans. Each step is verified; failures trigger auto-retry or replanning. |
| 🔍 **RAG Knowledge Base** | Index local documents (Markdown, Python, JSON, TXT) into ChromaDB for semantic retrieval during conversations. |
| 🧬 **Hybrid Memory** | Three-layer memory: short-term sliding window, long-term semantic (ChromaDB), and structured facts (SQLite). |
| 🔧 **Native Tools** | File I/O, shell commands, Python execution, web search, all with strict safety policies. |
| 🔌 **MCP Bridge** | Dynamically discover and use tools from external [MCP servers](https://modelcontextprotocol.io) (filesystem, browser, databases). |
| ⚡ **Async Execution** | Multiple tool calls in a single LLM response run concurrently to minimize latency. |
| 🛡️ **Safety Sandbox** | Workspace isolation, command whitelist, dangerous pattern blocking, trash recovery, and auto-backup. |
| 📊 **Execution Tracing** | Every plan, reflection, tool call, and LLM interaction is logged for debugging and optimization. |
| 🎨 **Rich CLI** | Terminal UI with Markdown rendering, tables, and real-time status indicators. |

### Multi-Model Support

Omni Agent supports multiple local models via Ollama. Switch with one environment variable:

| Model | Size | VRAM | Best For |
|-------|------|------|----------|
| **Qwen3.5-9B** | ~6GB | 10GB+ | Speed, coding tasks, longer contexts |
| **Gemma 4-26B** | ~17GB | 16GB+ | Quality, reasoning, complex planning |

> Both models run entirely locally. No network required after initial download.

---

## 📸 Demo

### Plan-and-Execute in Action

```
You: Write a Python quicksort, save it to projects/quicksort.py, then run it

┏━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Step ┃ Tool           ┃ Purpose               ┃
┣━━━━━━╋━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━━┫
┃ 1    ┃ write_file     ┃ Create quicksort code ┃
┃ 2    ┃ read_file      ┃ Verify file content   ┃
┃ 3    ┃ python_execute ┃ Run and validate      ┃
┗━━━━━━┻━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━━━━┛

Agent: ✅ Task complete! File saved to projects/quicksort.py and verified.
Output: [1, 2, 3, 5, 8, 13, 21]
```

### RAG Document Q&A

```
You: /index ~/documents/project_spec.md
[RAG] Indexed /home/user/documents/project_spec.md (12 chunks)

You: What are the hardware requirements?
Agent: According to the project specification:
- GPU: RTX 4090 Laptop (16GB VRAM)
- CPU: Intel i9-13980HX
- RAM: 32GB DDR5
```

### MCP Tool Discovery

```
You: /mcp
Loaded MCP tools: filesystem_listDirectory, filesystem_readFile, sqlite_query, browser_navigate

You: List the files in my workspace
Agent: (calls filesystem_listDirectory via MCP)
```

---

## 🚀 Quick Start

### Prerequisites

- Python 3.10+
- [Ollama](https://ollama.com) installed and running
- NVIDIA GPU with 10GB+ VRAM (CPU mode works but is slower)

### 1. Install

```bash
git clone https://github.com/yourname/omni-agent.git
cd omni-agent
pip install -e ".[dev]"
```

### 2. Download a Model

```bash
# Option A: Lightweight (recommended for first try)
ollama pull qwen3.5:9b

# Option B: High quality (requires 16GB VRAM)
ollama pull gemma4:26b
```

### 3. Launch

```bash
omni-agent
# or: python -m omni_agent.cli
```

You should see:

```
╭──────────────────────────────────────────╮
│ 🤖 Omni Agent                            │
│ Model: qwen3.5:9b | Backend: Ollama      │
│ Session: a1b2c3d4                        │
│ Type /help for commands.                 │
╰──────────────────────────────────────────╯

You:
```

### 4. (Optional) Enable KV Cache Quantization

For **16GB VRAM** users, enable KV cache quantization to fit longer contexts:

```bash
# Using the built-in script
./scripts/ollama_optimize.sh start q8_0 8192

# Or manually
export OLLAMA_FLASH_ATTENTION=1   # Ollama only quantizes the KV cache when flash attention is enabled
export OLLAMA_KV_CACHE_TYPE=q8_0
export OLLAMA_NUM_PARALLEL=2
ollama serve
```

| KV Type | VRAM Saved | Speed Impact | Recommendation |
|---------|-----------|--------------|----------------|
| `f16` (default) | - | Baseline | 24GB+ VRAM |
| `q8_0` | ~47% | Minimal | **16GB VRAM ⭐** |
| `q4_0` | ~72% | Moderate | Tight VRAM budgets |

---

## 📖 Usage Guide

### CLI Commands

During a conversation, type:

| Command | Description |
|---------|-------------|
| `/help` | Show all commands |
| `/clear` | Reset conversation context |
| `/memory` | Show memory stats and remembered facts |
| `/tools` | List available native tools |
| `/mcp` | List loaded MCP external tools |
| `/index <path>` | Index a file or directory into RAG |
| `/rag <query>` | Search the knowledge base directly |
| `/rag_stats` | Show RAG statistics |
| `/trace` | Show recent execution traces |
| `/reload` | Reload agent (detect new Ollama instance) |
| `exit` / `quit` | Stop the agent |

### Switching Models

```bash
# Use Gemma 4 (high quality, slower)
export OMNIA_MODEL=gemma4-26b
omni-agent

# Use Qwen 3.5 (faster, lighter)
export OMNIA_MODEL=qwen3.5-9b
omni-agent
```

Or edit `configs/models.yaml` to add your own models.

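
As a rough illustration of the model switch described above, the sketch below resolves an `OMNIA_MODEL` value to the Ollama tag pulled in the Quick Start. Only the variable name and the model names come from this README; the `MODEL_TAGS` table, the `resolve_model` helper, and the choice of default are assumptions made for the example, not the project's actual loader (which presumably reads `configs/models.yaml`).

```python
import os

# Hypothetical lookup table: OMNIA_MODEL values from this README mapped to the
# Ollama tags used in "Download a Model". The real project may store this
# mapping in configs/models.yaml instead.
MODEL_TAGS = {
    "qwen3.5-9b": "qwen3.5:9b",
    "gemma4-26b": "gemma4:26b",
}

# Assumption: the lighter model is the default, as the launch banner suggests.
DEFAULT_MODEL = "qwen3.5-9b"


def resolve_model(env=None):
    """Return the Ollama tag selected by OMNIA_MODEL, or the default tag."""
    env = os.environ if env is None else env
    key = env.get("OMNIA_MODEL", DEFAULT_MODEL).strip().lower()
    if key not in MODEL_TAGS:
        known = ", ".join(sorted(MODEL_TAGS))
        raise ValueError(f"Unknown OMNIA_MODEL {key!r}; expected one of: {known}")
    return MODEL_TAGS[key]


if __name__ == "__main__":
    print(resolve_model())
```

Failing fast on an unknown value keeps a typo in the environment variable from silently falling back to a different model.
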
### Connecting MCP Servers

Create `~/omni-agent-workspace/mcp_config.json`:

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/home/user/docs"]
    },
    "sqlite": {
      "command": "uvx",
      "args": ["mcp-server-sqlite", "--db-path", "/home/user/data.db"]
    },
    "puppeteer": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-puppeteer"]
    }
  }
}
```

Restart Omni Agent. The tools are auto-discovered and prefixed with the server name (e.g., `filesystem_readFile`).

### Workspace Safety Rules

- **Read scope**: Only files inside `~/omni-agent-workspace/`
- **Write backup**: Overwriting a file automatically creates `.backup.`
- **Delete protection**: `rm` moves files to `~/omni-agent-workspace/.trash/` instead of deleting
- **Sensitive paths**: `.ssh/`, `.env`, `token`, `secret` patterns are blocked
- **Shell whitelist**: Only whitelisted safe commands are allowed (40+, including `ls`, `cat`, `git`, `python3`)
- **Dangerous patterns**: `sudo`, `rm -rf /`, fork bombs, and pipe-to-shell are blocked automatically

---

## 🏗️ Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│                           User Interfaces                           │
│                    CLI (Rich) → Web UI (planned)                    │
└──────────────────────────────────┬──────────────────────────────────┘
                                   │
┌──────────────────────────────────▼──────────────────────────────────┐
│                      OmniAgent (Orchestrator)                       │
│ ┌─────────────┐   ┌─────────────┐   ┌─────────────┐  ┌────────────┐ │
│ │   Planner   │   │     RAG     │   │   Memory    │  │   Tools    │ │
│ │ + Reflection│   │ (ChromaDB)  │   │  (Hybrid)   │  │Native + MCP│ │
│ └──────┬──────┘   └──────┬──────┘   └──────┬──────┘  └─────┬──────┘ │
│        └─────────────────┴─────────────────┴───────────────┘        │
│                                   │                                 │
│        ┌──────────────────────────┼──────────────────────────┐      │
│        │                          │                          │      │
│   ┌────▼────┐              ┌──────▼──────┐          ┌────────▼─────┐│
│   │  Plan   │              │ smolagents  │          │   Sandbox    ││
│   │  Loop   │◄────────────►│ ToolCalling │          │  + Security  ││
│   │         │              │    Agent    │          │              ││
│   └────┬────┘              └──────┬──────┘          └──────────────┘│
│        │                          │                                 │
│   ┌────▼──────────────────────────▼────┐                            │
│   │             Ollama LLM             │                            │
│   │     (Qwen3.5-9B / Gemma4-26B)      │                            │
│   └────────────────────────────────────┘                            │
└─────────────────────────────────────────────────────────────────────┘
```

**Design Philosophy:**

- **smolagents** handles the battle-tested ReAct tool-calling loop
- **Custom modules** add differentiation: planning, reflection, RAG, memory, safety
- **MCP protocol** ensures the tool ecosystem can grow without code changes

Read the full architecture doc: [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md)

---

## 🛡️ Security

Omni Agent is designed with a **security-first** mindset for local execution:

| Layer | Protection |
|-------|------------|
| **Filesystem** | Path jailing to workspace + sensitive path regex blocking |
| **Shell** | Whitelist (40+ commands) + dangerous pattern regex |
| **Python** | Subprocess isolation, timeout, no network/exec by default |
| **Recovery** | Trash bin for `rm`, automatic backups for overwrites |
| **Audit** | Every operation logged to `.logs/security.log` |
| **Approval** | Optional HITL (Human-in-the-Loop) for write/shell/python |

---

## 📊 Performance

Tested on RTX 4090 Laptop (16GB VRAM):

| Task | Qwen3.5-9B | Gemma4-26B |
|------|-----------|-----------|
| Simple Q&A | 3-5s | 8-12s |
| File write + verify | 5-10s | 10-20s |
| Web search + summarize | 15-25s | 20-35s |
| Multi-step plan (3 steps) | 20-40s | 45-90s |

KV cache type `q8_0` reduces KV cache VRAM usage by ~47% with minimal speed loss, enabling 8192-token contexts on 16GB cards.

See [`docs/ROADMAP.md`](docs/ROADMAP.md#performance-benchmarks) for detailed benchmark methodology.

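
The ~47% and ~72% savings quoted for `q8_0` and `q4_0` can be reproduced with a short back-of-the-envelope calculation. The sketch below assumes the cache uses the standard GGML block layouts (32 values per block plus a 2-byte fp16 scale); the model shape passed to `kv_cache_bytes` is a made-up example rather than the real Qwen3.5 or Gemma 4 configuration, and actual VRAM use also includes weights and compute buffers.

```python
# Bytes per cached element for each KV cache type.
# f16 stores 2 bytes per value. GGML q8_0 packs 32 int8 values plus a 2-byte
# fp16 scale into 34 bytes; q4_0 packs 32 4-bit values plus the scale into 18.
BYTES_PER_ELEMENT = {
    "f16": 2.0,
    "q8_0": 34 / 32,
    "q4_0": 18 / 32,
}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, kv_type):
    """Estimate KV cache size in bytes; the factor 2 covers keys and values."""
    elements = 2 * n_layers * n_kv_heads * head_dim * context_len
    return elements * BYTES_PER_ELEMENT[kv_type]


def savings_vs_f16(kv_type):
    """Fraction of KV cache memory saved relative to the f16 default."""
    return 1.0 - BYTES_PER_ELEMENT[kv_type] / BYTES_PER_ELEMENT["f16"]


if __name__ == "__main__":
    # Hypothetical shape: 32 layers, 8 KV heads, head_dim 128, 8192-token context.
    for kv_type in BYTES_PER_ELEMENT:
        size_mib = kv_cache_bytes(32, 8, 128, 8192, kv_type) / 2**20
        print(f"{kv_type}: {size_mib:.0f} MiB, saves {savings_vs_f16(kv_type):.1%}")
```

Under these assumptions the ratios come out to 46.9% for `q8_0` and 71.9% for `q4_0`, regardless of model shape, which lines up with the rounded figures in the KV cache table.
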
---

## 🗺️ Roadmap

### Phase 1: Foundation ✅ *(Current)*

- [x] Multi-model Ollama backend
- [x] Plan-and-Execute + Reflection
- [x] Hybrid Memory (short / vector / structured)
- [x] RAG with ChromaDB
- [x] MCP bridge
- [x] Safety sandbox + audit logs
- [x] Execution tracing
- [x] Rich CLI

### Phase 2: Developer Experience *(Next)*

- [ ] Web UI (Gradio)
- [ ] REST API (FastAPI)
- [ ] Streaming responses
- [ ] Plugin system for custom tools
- [ ] Docker image

### Phase 3: Advanced Capabilities

- [ ] Multi-agent collaboration
- [ ] Hierarchical planning (sub-tasks)
- [ ] Self-improvement from traces
- [ ] Browser automation
- [ ] API tool (OpenAPI spec ingestion)

### Phase 4: Production

- [ ] 80%+ test coverage
- [ ] CI/CD (GitHub Actions)
- [ ] Documentation site (MkDocs)
- [ ] Standard agent evals benchmark

See the full roadmap: [`docs/ROADMAP.md`](docs/ROADMAP.md)

---

## 🤝 Contributing

We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.

**Quick start for contributors:**

```bash
# 1. Fork and clone
git clone https://github.com/yourname/omni-agent.git
cd omni-agent

# 2. Install in development mode
pip install -e ".[dev]"

# 3. Run tests
make test

# 4. Lint and format
make lint
make format
```

### Areas We Need Help

- 🧪 **Testing**: Expand pytest coverage (currently ~20%)
- 🎨 **Web UI**: Gradio/Streamlit interface
- 🌍 **Localization**: i18n support (currently Chinese/English mixed)
- 📖 **Documentation**: Tutorials, API reference, video demos
- 🔌 **MCP Servers**: Pre-built integrations for popular tools

---

## 📜 License

Omni Agent is licensed under the **Apache License 2.0**.

```
Copyright 2025 Omni Agent Contributors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0
```

The underlying models (Qwen3.5, Gemma 4) follow their respective licenses.

---

## 🙏 Acknowledgements

- [HuggingFace smolagents](https://github.com/huggingface/smolagents): Agent framework foundation
- [Ollama](https://ollama.com): Local LLM serving
- [Qwen](https://github.com/QwenLM/Qwen) & [Gemma](https://ai.google.dev/gemma): Base models
- [ChromaDB](https://www.trychroma.com/): Vector database
- [BAAI](https://github.com/FlagOpen/FlagEmbedding): bge embedding models
- [Model Context Protocol](https://modelcontextprotocol.io): Open tool standard

---

**⭐ Star this repo if you find it useful!**

**[Report Bug](https://github.com/yourname/omni-agent/issues)** • **[Request Feature](https://github.com/yourname/omni-agent/issues)**