# omni-agent
**Repository Path**: dong19960127/omni-agent
## Basic Information
- **Project Name**: omni-agent
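- **Description**: Omni Agent: an all-in-one local AI agent
🤖 Zero cloud · Zero API keys · Data stays fully local
Core capabilities:
• 🗺️ Plan and execute: complex tasks are decomposed automatically, with step-level reflection and error correction
• 📚 RAG knowledge base: one-step indexing of local documents, with automatic retrieval during Q&A
• 🧠 Hybrid memory: short-term conversation + semantic memory + structured facts
• 🔧 Tool ecosystem: file/Shell/Python/search tools + external extensions via MCP
• 🛡️ Safety sand…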
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-04-21
- **Last Updated**: 2026-04-21
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# 🤖 Omni Agent
**A fully local, extensible AI Agent with Planning, RAG, Hybrid Memory, and MCP.**
**Zero cloud. Zero API keys. Your data stays on your machine.**
[Quick Start](#quick-start) • [Features](#features) • [Architecture](#architecture) • [Roadmap](#roadmap) • [Contributing](#contributing)
---
## 🎯 What is Omni Agent?
Omni Agent is a production-ready, **fully local** AI agent framework that runs entirely on your own hardware. No OpenAI API keys, no data leaving your machine.
It combines the stability of [HuggingFace smolagents](https://github.com/huggingface/smolagents) with advanced capabilities usually found only in cloud-based agents:
- 🗺️ **Plan-and-Execute** with step-level reflection and auto-retry
- 📚 **RAG Knowledge Base** for document Q&A
- 🧠 **Hybrid Memory** (semantic + structured + conversational)
- 🔌 **MCP Ecosystem** — zero-code integration with external tools
- 🛡️ **Enterprise-grade Safety Sandbox**
> 💡 *Born from merging two real-world agent projects — one built on Qwen3.5 with advanced planning, another on Gemma 4 with deep RAG optimization. Omni Agent takes the best of both.*
---
## ✨ Features
### Core Capabilities
| Feature | Description |
|---------|-------------|
| 🧠 **Plan-and-Execute** | Complex tasks are broken into multi-step plans. Each step is verified; failures trigger auto-retry or replanning. |
| 🔍 **RAG Knowledge Base** | Index local documents (Markdown, Python, JSON, TXT) into ChromaDB for semantic retrieval during conversations. |
| 🧬 **Hybrid Memory** | Three-layer memory: short-term sliding window, long-term semantic (ChromaDB), and structured facts (SQLite). |
| 🔧 **Native Tools** | File I/O, shell commands, Python execution, web search — all with strict safety policies. |
| 🔌 **MCP Bridge** | Dynamically discover and use tools from external [MCP servers](https://modelcontextprotocol.io) (filesystem, browser, databases). |
| ⚡ **Async Execution** | Multiple tool calls in a single LLM response run concurrently to minimize latency. |
| 🛡️ **Safety Sandbox** | Workspace isolation, command whitelist, dangerous pattern blocking, trash recovery, and auto-backup. |
| 📊 **Execution Tracing** | Every plan, reflection, tool call, and LLM interaction is logged for debugging and optimization. |
| 🎨 **Rich CLI** | Beautiful terminal UI with Markdown rendering, tables, and real-time status indicators. |
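The concurrent tool execution described in the table can be illustrated with `asyncio.gather`. This is a minimal sketch, not Omni Agent's actual dispatcher: `fake_tool` and `run_tool_calls` are hypothetical names, and the sleep stands in for real tool latency.

```python
import asyncio
import time


async def fake_tool(name: str, delay: float) -> str:
    """Hypothetical stand-in for a tool call; sleeps to simulate I/O latency."""
    await asyncio.sleep(delay)
    return f"{name}: done"


async def run_tool_calls(calls: list[tuple[str, float]]) -> list[str]:
    """Run all tool calls from one LLM response concurrently.

    asyncio.gather returns results in the order the calls were given,
    regardless of which call finishes first.
    """
    return await asyncio.gather(*(fake_tool(name, delay) for name, delay in calls))


def demo() -> tuple[list[str], float]:
    """Run three simulated calls and report the results and elapsed time."""
    calls = [("web_search", 0.2), ("read_file", 0.2), ("python_execute", 0.2)]
    start = time.perf_counter()
    results = asyncio.run(run_tool_calls(calls))
    return results, time.perf_counter() - start
```

Because the three simulated calls overlap, the total wall time is close to the slowest call (about 0.2 s) rather than the sum (0.6 s).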
### Multi-Model Support
Omni Agent supports multiple local models via Ollama. Switch with one environment variable:
| Model | Size | VRAM | Best For |
|-------|------|------|----------|
| **Qwen3.5-9B** | ~6GB | 10GB+ | Speed, coding tasks, longer contexts |
| **Gemma 4-26B** | ~17GB | 16GB+ | Quality, reasoning, complex planning |
> Both models run entirely locally. No network required after initial download.
---
## 📸 Demo
### Plan-and-Execute in Action
```
You: Write a Python quicksort, save it to projects/quicksort.py, then run it
┏━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Step ┃ Tool           ┃ Purpose               ┃
┣━━━━━━╋━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━━━━┫
┃ 1    ┃ write_file     ┃ Create quicksort code ┃
┃ 2    ┃ read_file      ┃ Verify file content   ┃
┃ 3    ┃ python_execute ┃ Run and validate      ┃
┗━━━━━━┻━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━━━━┛
Agent: ✅ Task complete! File saved to projects/quicksort.py and verified.
Output: [1, 2, 3, 5, 8, 13, 21]
```
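The plan shown in the demo can be thought of as a list of steps that are executed and verified one at a time. The sketch below illustrates only the verify-and-retry part (replanning is left out); `Step`, `execute_plan`, and the `verify` callback are hypothetical names, not Omni Agent's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    tool: str
    purpose: str


def execute_plan(
    steps: list[Step],
    tools: dict[str, Callable[[], str]],
    verify: Callable[[Step, str], bool],
    max_retries: int = 2,
) -> list[tuple[int, str, int, bool]]:
    """Run each step in order; retry a step when verification rejects its result.

    Returns a trace of (step number, tool, attempt, verified) tuples.
    """
    trace = []
    for number, step in enumerate(steps, start=1):
        for attempt in range(1, max_retries + 2):  # first try + max_retries retries
            result = tools[step.tool]()
            ok = verify(step, result)
            trace.append((number, step.tool, attempt, ok))
            if ok:
                break
        else:
            # Every attempt failed; a full agent would replan here instead.
            raise RuntimeError(f"step {number} ({step.tool}) failed after {attempt} attempts")
    return trace
```

A step whose tool fails once and then succeeds appears twice in the trace: once with `verified=False` and once with `verified=True`.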
### RAG Document Q&A
```
You: /index ~/documents/project_spec.md
[RAG] Indexed /home/user/documents/project_spec.md (12 chunks)
You: What are the hardware requirements?
Agent: According to the project specification:
- GPU: RTX 4090 Laptop (16GB VRAM)
- CPU: Intel i9-13980HX
- RAM: 32GB DDR5
```
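The index-then-retrieve flow above boils down to two steps: split a document into chunks, then rank chunks against the question. Omni Agent uses ChromaDB with embedding models for this; the sketch below swaps in a plain bag-of-words cosine similarity so it runs with the standard library only. The function names and chunk sizes are assumptions for illustration.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words that overlap by `overlap` words."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def _bag_of_words(text: str) -> Counter:
    """Lowercased token counts; a crude stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = _bag_of_words(query)
    return sorted(chunks, key=lambda c: cosine(q, _bag_of_words(c)), reverse=True)[:k]
```

With real embeddings the ranking captures meaning rather than word overlap, but the shape of the pipeline (chunk, vectorize, rank, pass the top chunks to the model) stays the same.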
### MCP Tool Discovery
```
You: /mcp
Loaded MCP tools: filesystem_listDirectory, filesystem_readFile,
sqlite_query, browser_navigate
You: List the files in my workspace
Agent: (calls filesystem_listDirectory via MCP)
```
---
## 🚀 Quick Start
### Prerequisites
- Python 3.10+
- [Ollama](https://ollama.com) installed and running
- NVIDIA GPU with 10GB+ VRAM (CPU mode works but is slower)
### 1. Install
```bash
git clone https://github.com/yourname/omni-agent.git
cd omni-agent
pip install -e ".[dev]"
```
### 2. Download a Model
```bash
# Option A: Lightweight (recommended for first try)
ollama pull qwen3.5:9b
# Option B: High quality (requires 16GB VRAM)
ollama pull gemma4:26b
```
### 3. Launch
```bash
omni-agent
# or: python -m omni_agent.cli
```
You should see:
```
╭──────────────────────────────────────────╮
│ 🤖 Omni Agent                            │
│ Model: qwen3.5:9b | Backend: Ollama      │
│ Session: a1b2c3d4                        │
│ Type /help for commands.                 │
╰──────────────────────────────────────────╯
You:
```
### 4. (Optional) Enable KV Cache Quantization
For **16GB VRAM** users, enable KV cache quantization to fit longer contexts:
```bash
# Using the built-in script
./scripts/ollama_optimize.sh start q8_0 8192
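# Or manually (Ollama only quantizes the KV cache when flash attention is enabled)
export OLLAMA_FLASH_ATTENTION=1
export OLLAMA_KV_CACHE_TYPE=q8_0
export OLLAMA_NUM_PARALLEL=2
ollama serve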
```
| KV Type | VRAM Saved | Speed Impact | Recommendation |
|---------|-----------|--------------|----------------|
| `f16` (default) | — | Baseline | 24GB+ VRAM |
| `q8_0` | ~47% | Minimal | **16GB VRAM ⭐** |
| `q4_0` | ~72% | Moderate | Tight VRAM budgets |
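The savings in the table are consistent with the storage cost of GGML's block-quantized formats, assuming the KV cache uses the standard `q8_0` and `q4_0` layouts (32 values per block plus one 16-bit scale). The sketch below reproduces the ~47% and ~72% figures and estimates total cache size. The model dimensions in the usage note are hypothetical, not those of Qwen3.5 or Gemma 4.

```python
# Bytes needed to store one block of 32 cache values in each format.
BLOCK_BYTES = {
    "f16": 32 * 2,   # 32 half-precision floats
    "q8_0": 32 + 2,  # 32 int8 values + one f16 scale
    "q4_0": 16 + 2,  # 32 4-bit values packed into 16 bytes + one f16 scale
}


def vram_saved(kv_type: str) -> float:
    """Fraction of KV cache memory saved relative to f16."""
    return 1 - BLOCK_BYTES[kv_type] / BLOCK_BYTES["f16"]


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, kv_type: str) -> float:
    """Approximate KV cache size: keys and values for every layer and position."""
    elements = 2 * n_layers * n_kv_heads * head_dim * context_len  # 2 = K and V
    return elements * BLOCK_BYTES[kv_type] / 32
```

For a hypothetical model with 32 layers, 8 KV heads, and a head dimension of 128, `kv_cache_bytes(32, 8, 128, 8192, "f16")` comes to 1 GiB at an 8192-token context, and `vram_saved("q8_0")` is 0.46875, which rounds to the 47% quoted above.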
---
## 📖 Usage Guide
### CLI Commands
During a conversation, type:
| Command | Description |
|---------|-------------|
| `/help` | Show all commands |
| `/clear` | Reset conversation context |
| `/memory` | Show memory stats and remembered facts |
| `/tools` | List available native tools |
| `/mcp` | List loaded MCP external tools |
| `/index <path>` | Index a file or directory into RAG |
| `/rag <query>` | Search the knowledge base directly |
| `/rag_stats` | Show RAG statistics |
| `/trace` | Show recent execution traces |
| `/reload` | Reload agent (detect new Ollama instance) |
| `exit` / `quit` | Stop the agent |
### Switching Models
```bash
# Use Gemma 4 (high quality, slower)
export OMNIA_MODEL=gemma4-26b
omni-agent
# Use Qwen 3.5 (faster, lighter)
export OMNIA_MODEL=qwen3.5-9b
omni-agent
```
Or edit `configs/models.yaml` to add your own models.
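One plausible way to resolve the `OMNIA_MODEL` variable into an Ollama model tag is a small registry lookup. This is a sketch under assumptions: the registry contents, the default, and the `resolve_model` name are hypothetical, and the real mapping lives in `configs/models.yaml`.

```python
import os
from typing import Mapping, Optional

# Hypothetical registry: config key -> Ollama model tag.
MODEL_REGISTRY = {
    "qwen3.5-9b": "qwen3.5:9b",
    "gemma4-26b": "gemma4:26b",
}
DEFAULT_MODEL = "qwen3.5-9b"  # assumed default, matching the launch banner


def resolve_model(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Ollama tag selected by OMNIA_MODEL, or the default if unset."""
    env = os.environ if env is None else env
    key = env.get("OMNIA_MODEL", DEFAULT_MODEL)
    try:
        return MODEL_REGISTRY[key]
    except KeyError:
        known = ", ".join(sorted(MODEL_REGISTRY))
        raise ValueError(f"unknown model {key!r}; known models: {known}") from None
```

Failing loudly on an unknown key, with the list of known models in the message, avoids silently falling back to a model the user did not ask for.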
### Connecting MCP Servers
Create `~/omni-agent-workspace/mcp_config.json`:
```json
{
"mcpServers": {
"filesystem": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem", "/home/user/docs"]
},
"sqlite": {
"command": "uvx",
"args": ["mcp-server-sqlite", "--db-path", "/home/user/data.db"]
},
"puppeteer": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-puppeteer"]
}
}
}
```
Restart Omni Agent — the tools will be auto-discovered and prefixed with the server name (e.g., `filesystem_readFile`).
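The naming rule can be sketched as follows. The `discovered` mapping is a hypothetical stand-in for what each server reports in response to an MCP `tools/list` request; the sketch reads only the config file and does not start any server.

```python
import json


def prefixed_tool_names(config_text: str, discovered: dict[str, list[str]]) -> list[str]:
    """Combine server names from mcp_config.json with each server's tool names.

    `discovered` maps a server name to the tool names it reported (hypothetical
    input; a real bridge would query each server over MCP).
    """
    servers = json.loads(config_text)["mcpServers"]
    names = []
    for server in servers:  # JSON object order is preserved
        for tool in discovered.get(server, []):
            names.append(f"{server}_{tool}")
    return names
```

Prefixing with the server name keeps tools from different servers distinct even when two servers expose a tool with the same name.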
### Workspace Safety Rules
- **Read scope**: Only files inside `~/omni-agent-workspace/`
- **Write backup**: Overwriting a file automatically creates `.backup.`
- **Delete protection**: `rm` moves files to `~/omni-agent-workspace/.trash/` instead of deleting
- **Sensitive paths**: `.ssh/`, `.env`, `token`, `secret` patterns are blocked
- **Shell whitelist**: Only a whitelist of 40+ safe commands is allowed (`ls`, `cat`, `git`, `python3`, etc.)
- **Dangerous patterns**: `sudo`, `rm -rf /`, fork bombs, and pipe-to-shell commands are blocked automatically
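Two of the rules above, path jailing and the shell whitelist, can be sketched in a few lines. The patterns and the four-command whitelist below are hypothetical, deliberately crude stand-ins for the project's real policy (for instance, the `token` pattern would also block `tokenizer.py`, and fork-bomb detection is omitted).

```python
import re
import shlex
from pathlib import Path

# Hypothetical policy values for illustration only.
SENSITIVE = re.compile(r"\.ssh|\.env|token|secret", re.IGNORECASE)
ALLOWED_COMMANDS = {"ls", "cat", "git", "python3"}
DANGEROUS = re.compile(r"\bsudo\b|\brm\s+-rf\s+/|\|\s*(ba)?sh\b")


def is_path_allowed(path: str, workspace: Path) -> bool:
    """True if path resolves inside workspace and matches no sensitive pattern."""
    root = workspace.resolve()
    resolved = (root / path).resolve()  # collapses ".." and follows symlinks
    if resolved != root and root not in resolved.parents:
        return False  # escaped the workspace (absolute path or ".." traversal)
    return SENSITIVE.search(str(resolved.relative_to(root))) is None


def is_command_allowed(command: str) -> bool:
    """True if the command is whitelisted and contains no dangerous pattern."""
    if DANGEROUS.search(command):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:  # unbalanced quotes
        return False
    return bool(argv) and argv[0] in ALLOWED_COMMANDS
```

Resolving the path before checking it is the important design choice: a check on the raw string would let `projects/../../etc/passwd` through.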
---
## 🏗️ Architecture
```
┌─────────────────────────────────────────────────────────────────────┐
│                           User Interfaces                           │
│                    CLI (Rich) → Web UI (planned)                    │
└──────────────────────────────────┬──────────────────────────────────┘
                                   │
┌──────────────────────────────────▼──────────────────────────────────┐
│                      OmniAgent (Orchestrator)                       │
│ ┌─────────────┐   ┌─────────────┐   ┌─────────────┐  ┌────────────┐ │
│ │   Planner   │   │     RAG     │   │   Memory    │  │   Tools    │ │
│ │ + Reflection│   │ (ChromaDB)  │   │  (Hybrid)   │  │Native + MCP│ │
│ └──────┬──────┘   └──────┬──────┘   └──────┬──────┘  └─────┬──────┘ │
│        └─────────────────┴─────────────────┴───────────────┘        │
│                                  │                                  │
│       ┌──────────────────────────┼──────────────────────────┐       │
│       │                          │                          │       │
│  ┌────▼────┐              ┌──────▼──────┐          ┌────────▼─────┐ │
│  │  Plan   │              │ smolagents  │          │   Sandbox    │ │
│  │  Loop   │◄────────────►│ ToolCalling │          │  + Security  │ │
│  │         │              │    Agent    │          │              │ │
│  └────┬────┘              └──────┬──────┘          └──────────────┘ │
│       │                          │                                  │
│  ┌────▼──────────────────────────▼────┐                             │
│  │             Ollama LLM             │                             │
│  │     (Qwen3.5-9B / Gemma4-26B)      │                             │
│  └────────────────────────────────────┘                             │
└─────────────────────────────────────────────────────────────────────┘
```
**Design Philosophy:**
- **smolagents** handles the battle-tested ReAct tool-calling loop
- **Custom modules** add differentiation: planning, reflection, RAG, memory, safety
- **MCP protocol** ensures the tool ecosystem can grow without code changes
Read the full architecture doc: [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md)
---
## 🛡️ Security
Omni Agent is designed with a **security-first** mindset for local execution:
| Layer | Protection |
|-------|------------|
| **Filesystem** | Path jailing to workspace + sensitive path regex blocking |
| **Shell** | Whitelist (40+ commands) + dangerous pattern regex |
| **Python** | Subprocess isolation, timeout, no network/exec by default |
| **Recovery** | Trash bin for `rm`, automatic backups for overwrites |
| **Audit** | Every operation logged to `.logs/security.log` |
| **Approval** | Optional HITL (Human-in-the-Loop) for write/shell/python |
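The Python layer's subprocess isolation and timeout can be approximated with the standard library. This is a minimal sketch, assuming a hypothetical `run_python` helper. It covers only process isolation and the timeout; it does not block network access or restrict what the child process may execute.

```python
import subprocess
import sys


def run_python(code: str, timeout: float = 5.0) -> tuple[bool, str]:
    """Run code in a separate interpreter; return (success, output or error)."""
    try:
        proc = subprocess.run(
            # -I: isolated mode (ignores PYTHON* env vars and the user site dir)
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,  # the child is killed when the timeout expires
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout}s"
    if proc.returncode != 0:
        return False, proc.stderr.strip()
    return True, proc.stdout.strip()
```

Running the code in a child process means a crash or an infinite loop in generated code cannot take the agent down with it.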
---
## 📊 Performance
Tested on RTX 4090 Laptop (16GB VRAM):
| Task | Qwen3.5-9B | Gemma4-26B |
|------|-----------|-----------|
| Simple Q&A | 3-5s | 8-12s |
| File write + verify | 5-10s | 10-20s |
| Web search + summarize | 15-25s | 20-35s |
| Multi-step plan (3 steps) | 20-40s | 45-90s |
KV Cache `q8_0` reduces VRAM usage by ~47% with minimal speed loss, enabling 8192-token contexts on 16GB cards.
See [`docs/ROADMAP.md`](docs/ROADMAP.md#performance-benchmarks) for detailed benchmark methodology.
---
## 🗺️ Roadmap
### Phase 1: Foundation ✅ *(Current)*
- [x] Multi-model Ollama backend
- [x] Plan-and-Execute + Reflection
- [x] Hybrid Memory (short / vector / structured)
- [x] RAG with ChromaDB
- [x] MCP bridge
- [x] Safety sandbox + audit logs
- [x] Execution tracing
- [x] Rich CLI
### Phase 2: Developer Experience *(Next)*
- [ ] Web UI (Gradio)
- [ ] REST API (FastAPI)
- [ ] Streaming responses
- [ ] Plugin system for custom tools
- [ ] Docker image
### Phase 3: Advanced Capabilities
- [ ] Multi-agent collaboration
- [ ] Hierarchical planning (sub-tasks)
- [ ] Self-improvement from traces
- [ ] Browser automation
- [ ] API tool (OpenAPI spec ingestion)
### Phase 4: Production
- [ ] 80%+ test coverage
- [ ] CI/CD (GitHub Actions)
- [ ] Documentation site (MkDocs)
- [ ] Standard agent evals benchmark
See the full roadmap: [`docs/ROADMAP.md`](docs/ROADMAP.md)
---
## 🤝 Contributing
We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.
**Quick start for contributors:**
```bash
# 1. Fork and clone
git clone https://github.com/yourname/omni-agent.git
cd omni-agent
# 2. Install in development mode
pip install -e ".[dev]"
# 3. Run tests
make test
# 4. Lint and format
make lint
make format
```
### Areas We Need Help
- 🧪 **Testing** — Expand pytest coverage (currently ~20%)
- 🎨 **Web UI** — Gradio/Streamlit interface
- 🌍 **Localization** — i18n support (currently Chinese/English mixed)
- 📖 **Documentation** — Tutorials, API reference, video demos
- 🔌 **MCP Servers** — Pre-built integrations for popular tools
---
## 📜 License
Omni Agent is licensed under the **Apache License 2.0**.
```
Copyright 2025 Omni Agent Contributors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
```
The underlying models (Qwen3.5, Gemma 4) follow their respective licenses.
---
## 🙏 Acknowledgements
- [HuggingFace smolagents](https://github.com/huggingface/smolagents) — Agent framework foundation
- [Ollama](https://ollama.com) — Local LLM serving
- [Qwen](https://github.com/QwenLM/Qwen) & [Gemma](https://ai.google.dev/gemma) — Base models
- [ChromaDB](https://www.trychroma.com/) — Vector database
- [BAAI](https://github.com/FlagOpen/FlagEmbedding) — bge embedding models
- [Model Context Protocol](https://modelcontextprotocol.io) — Open tool standard
---
**⭐ Star this repo if you find it useful!**
**[Report Bug](https://github.com/yourname/omni-agent/issues)** • **[Request Feature](https://github.com/yourname/omni-agent/issues)**