# Kyo
**Repository Path**: wwhdekj/kyo
## Basic Information
- **Project Name**: Kyo
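- **Description**: Kyo is an open-source intelligent browser agent tool for developers. It refactors the OpenManus core architecture, replaces the original browser_use module with the Stagehand collaboration framework, and adds core capabilities such as element confidence evaluation and priority matching selection, with the aim of improving the stability, security, and precision of web automation.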
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 2
- **Created**: 2026-01-22
- **Last Updated**: 2026-01-22
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
**Key Yield Operator** - Intelligent browser agent tool

An open-source intelligent browser agent tool for developers, built on the Stagehand collaboration framework
[English](#english) | [中文](#中文)
---
## 中文
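### Introduction
**Kyo** (/ˈkiːəʊ/, pronounced roughly "K-yo") stands for **Key Yield Operator**, an open-source intelligent browser agent tool for developers.
This project refactors the OpenManus core architecture, replacing the original browser_use module with the Stagehand collaboration framework. It adds core capabilities such as element confidence evaluation and priority matching selection, aiming to improve the stability, security, and precision of web automation.
### Core Features
#### 🎯 Intelligent Element Recognition
- **Element Confidence Evaluation**: Automatically scores each element's confidence from multi-dimensional metrics (position, text, type, attributes)
- **Priority Matching Selection**: Selects the element that best matches the user's intent, avoiding misoperations
- **Automatic Ad Filtering**: Identifies ads and promotional content and avoids clicking them
#### 🛡️ Operation Security
- **Multi-layer Verification**: Verifies elements through multiple methods (XPath, coordinates, CSS selectors)
- **Pre-operation Confirmation**: The AI analyzes whether an element is suitable for clicking or input
- **Automatic Error Recovery**: Automatically tries backup methods when an operation fails
#### ⚡ High Performance
- **Element Caching**: Reduces redundant detection and improves response speed
- **Async Operation Support**: Uses async I/O throughout for better concurrency
- **Smart Waiting Strategy**: Dynamically adjusts wait times based on page load status
#### 🎨 Visual Enhancement
- **Element Highlighting**: Highlights all interactive elements in real time
- **Index Label Display**: Adds an index label to each element for easy identification
- **User Interaction Support**: Supports the Ctrl+E shortcut to pause the current operation and enter a new command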
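### Technical Architecture
```
Kyo
├── Core Layer
│   ├── Agent management
│   ├── State management
│   └── Memory system
├── Browser Layer
│   ├── Stagehand collaboration framework
│   ├── Element detection engine
│   └── Action execution engine
├── Evaluation Layer
│   ├── Element confidence evaluation
│   ├── Match scoring
│   └── Ad detection
└── Interaction Layer
    ├── Highlight rendering
    ├── Keyboard listener
    └── User interrupt handling
```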
### Installation
#### Requirements
- Python 3.12+
- Node.js 18+ (for Stagehand)
- Chrome/Chromium browser
#### Method 1: Using pip
```bash
# Clone the repository
git clone https://gitee.com/xunanit/kyo.git
cd kyo
# Create a virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Install the Playwright browser
playwright install chromium
```
#### Method 2: Using uv (Recommended)
```bash
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# Clone the repository
git clone https://gitee.com/xunanit/kyo.git
cd kyo
# Create a virtual environment and install dependencies
uv venv --python 3.12
source .venv/bin/activate # Windows: .venv\Scripts\activate
uv pip install -r requirements.txt
# Install the Playwright browser
playwright install chromium
```
### Configuration
1. Copy the configuration template:
```bash
cp config/config.example.toml config/config.toml
```
2. Edit `config/config.toml` to configure API keys:
```toml
[llm]
model = "gpt-4o"
base_url = "https://api.openai.com/v1"
api_key = "sk-..." # Replace with your API key
max_tokens = 4096
temperature = 0.0
[llm.vision]
model = "gpt-4o"
base_url = "https://api.openai.com/v1"
api_key = "sk-..." # Replace with your API key
```
### Quick Start
#### Basic Usage
```bash
# Start the interactive program
python main.py
# Example commands:
# Open Xiaohongshu and search for 寻安科技
# Open Bing and search for AI
# Visit https://example.com
```
#### Programming Interface
```python
from app.agent.manus import Manus
from app.schema import AgentState


async def main():
    agent = Manus()
    # Record the user's instruction in the agent's memory
    agent.update_memory("user", "Open Bing and search for AI")
    # Step the agent until it reports that it has finished
    while agent.state != AgentState.FINISHED:
        result = await agent.step()
        print(f"Step {agent.current_step}: {result}")


if __name__ == "__main__":
    import asyncio

    asyncio.run(main())
```
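### Core Capabilities
#### 1. Element Confidence Evaluation
Kyo evaluates element confidence using multi-dimensional metrics:
| Metric | Weight | Description |
|------|--------|------|
| Text Match | 40% | How well the element text matches user intent |
| Position Reasonableness | 25% | Whether the element's position on the page is reasonable |
| Type Match | 20% | Whether the element type is suitable for the current operation |
| Attribute Completeness | 15% | Whether the element has the necessary attributes |
**Score Range**: 0-10 points; a higher score indicates higher confidence
#### 2. Priority Matching Selection
When a user requests to click an element, Kyo will:
1. Detect all interactive elements on the page
2. Calculate a confidence score for each element
3. Select the element with the highest confidence
4. Provide a suggestion if the best element differs significantly from the user-specified index
```python
# Example: automatically select the best element
best_element = await agent._get_best_matching_element(
    elements,
    "Search button",
    page_context,
    min_confidence=6.0,
)
```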
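#### 3. Automatic Ad Filtering
Kyo automatically identifies and avoids the following types of ads:
- Elements containing ad keywords (sponsored, ad, promotion, etc.)
- Elements located in iframes
- Elements whose parent elements carry ad class names
- Elements with promotional attributes
#### 4. Element Caching Mechanism
Kyo uses an intelligent caching mechanism to improve performance:
- **First Detection**: Performs a complete scan of all page elements
- **Subsequent Operations**: Reuse the cached element list
- **Page Changes**: Content changes are detected automatically and the cache is updated
- **Cache Invalidation**: The cache is cleared on page navigation or refresh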
### Shortcuts
| Shortcut | Function |
|--------|------|
| `Ctrl+E` | Pause the current operation and enter a new command |
| `Ctrl+C` | Terminate the program |
### Project Structure
```
kyo/
├── app/
│   ├── agent/                    # Agent core
│   │   ├── base.py               # Base agent class
│   │   ├── manus.py              # Manus agent
│   │   └── browser.py            # Browser agent
│   ├── tool/                     # Tool collection
│   │   ├── browser_use_tool.py   # Browser operation tool
│   │   ├── keyboard_listener.py  # Keyboard listener
│   │   └── ...
│   ├── prompt/                   # Prompt templates
│   ├── llm.py                    # LLM interface
│   └── schema.py                 # Data models
├── config/                       # Configuration files
├── tests/                        # Test files
├── requirements.txt              # Dependencies
└── README.md                     # This document
```
### Development Guide
#### Running Tests
```bash
# Run all tests
pytest tests/
# Run a specific test
pytest tests/test_browser.py -v
```
#### Code Style
```bash
# Format code
black app/
# Check code quality
flake8 app/
# Type checking
mypy app/
```
### Contributing
We welcome contributions of any kind!
1. Fork this repository
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
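### FAQ
#### Q: What's the difference between Kyo and OpenManus?
A: Kyo refactors the OpenManus core architecture. The main differences are:
- Uses the Stagehand collaboration framework instead of browser_use
- Adds an element confidence evaluation mechanism
- Adds priority matching selection
- Optimizes element caching and performance
#### Q: How do I customize element confidence evaluation?
A: Modify the `_get_best_matching_element` method in `app/tool/browser_use_tool.py` to adjust the weights and evaluation logic.
#### Q: Which browsers are supported?
A: Kyo currently supports Chromium-based browsers (Chrome, Edge, Brave, etc.). Firefox support is planned.
### License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
### Acknowledgments
Thanks to the following projects for inspiration and support:
- [OpenManus](https://github.com/mannaandpoem/OpenManus) - Core architecture foundation
- [Stagehand](https://github.com/browserbase/stagehand) - Collaboration framework
- [Playwright](https://playwright.dev/) - Browser automation
- [MetaGPT](https://github.com/geekan/MetaGPT) - Agent framework
### Contact
- Project Home: [https://gitee.com/xunanit/kyo](https://gitee.com/xunanit/kyo)
- Issue Tracker: [Gitee Issues](https://gitee.com/xunanit/kyo/issues)
- Email: g1ker@qq.com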
---
## English
### Introduction
**Kyo** (/ˈkiːəʊ/, pronounced "K-yo") stands for **Key Yield Operator**, an open-source intelligent browser agent tool designed for developers.
This project refactors the OpenManus core architecture, replacing the original browser_use module with the Stagehand collaboration framework. It adds core capabilities such as element confidence evaluation and priority matching selection, aiming to improve the stability, security, and precision of web automation.
### Core Features
#### 🎯 Intelligent Element Recognition
- **Element Confidence Evaluation**: Automatically evaluates element confidence based on multi-dimensional metrics (position, text, type, attributes)
- **Priority Matching Selection**: Intelligently selects the element that best matches user intent, avoiding misoperations
- **Automatic Ad Filtering**: Automatically identifies and avoids clicking ads and promotional content
#### 🛡️ Operation Security
- **Multi-layer Verification**: Verifies elements through multiple methods (XPath, coordinates, CSS selectors)
- **Pre-operation Confirmation**: AI intelligently analyzes whether an element is suitable for clicking or input
- **Automatic Error Recovery**: Automatically tries backup methods when operations fail
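
The fallback behaviour described under Automatic Error Recovery can be pictured roughly as follows. This is a minimal sketch, not Kyo's actual code: `run_with_fallback` and the simulated `click_by_*` callables are hypothetical names, and the strategy order (XPath, CSS selector, coordinates) is an assumption based on the verification methods listed above.

```python
from typing import Callable, Optional, Sequence

# A strategy is a (name, callable) pair; the callable returns True on success.
Strategy = tuple[str, Callable[[], bool]]


def run_with_fallback(strategies: Sequence[Strategy]) -> tuple[Optional[str], list[str]]:
    """Try each strategy in order.

    Returns the name of the first strategy that succeeds (or None) together
    with the error messages collected from the strategies that failed first.
    """
    errors: list[str] = []
    for name, attempt in strategies:
        try:
            if attempt():
                return name, errors
            errors.append(f"{name}: reported failure")
        except Exception as exc:  # a real tool would catch narrower error types
            errors.append(f"{name}: {exc}")
    return None, errors


def click_by_xpath() -> bool:
    # Simulated failure, standing in for a stale XPath lookup.
    raise LookupError("element detached from DOM")


def click_by_css() -> bool:
    # Simulated success.
    return True


winner, errors = run_with_fallback([
    ("xpath", click_by_xpath),
    ("css", click_by_css),
    ("coordinates", lambda: True),
])
print(winner, errors)
```

Collecting the earlier errors instead of discarding them keeps a trail that can be logged or handed back to the agent when every strategy fails.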
#### ⚡ High Performance
- **Element Caching**: Reduces redundant detection and improves response speed
- **Async Operation Support**: Fully utilizes async I/O for better concurrency
- **Smart Waiting Strategy**: Dynamically adjusts wait times based on page load status
#### 🎨 Visual Enhancement
- **Element Highlighting**: Real-time highlighting of all interactive elements
- **Index Label Display**: Adds index labels to each element for easy identification
- **User Interaction Support**: Supports Ctrl+E shortcut to pause operations and input new commands
### Installation
#### Requirements
- Python 3.12+
- Node.js 18+ (for Stagehand)
- Chrome/Chromium browser
#### Method 1: Using pip
```bash
# Clone repository
git clone https://gitee.com/xunanit/kyo.git
cd kyo
# Create virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Install Playwright browsers
playwright install chromium
```
#### Method 2: Using uv (Recommended)
```bash
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# Clone repository
git clone https://gitee.com/xunanit/kyo.git
cd kyo
# Create virtual environment and install dependencies
uv venv --python 3.12
source .venv/bin/activate # Windows: .venv\Scripts\activate
uv pip install -r requirements.txt
# Install Playwright browsers
playwright install chromium
```
### Configuration
1. Copy configuration template:
```bash
cp config/config.example.toml config/config.toml
```
2. Edit `config/config.toml` to configure API keys:
```toml
[llm]
model = "gpt-4o"
base_url = "https://api.openai.com/v1"
api_key = "sk-..." # Replace with your API key
max_tokens = 4096
temperature = 0.0
[llm.vision]
model = "gpt-4o"
base_url = "https://api.openai.com/v1"
api_key = "sk-..." # Replace with your API key
```
### Quick Start
#### Basic Usage
```bash
# Start interactive program
python main.py
# Example commands:
# Open Xiaohongshu and search for 寻安科技
# Open Bing and search for 人工智能
# Visit https://example.com
```
#### Programming Interface
```python
from app.agent.manus import Manus
from app.schema import AgentState


async def main():
    agent = Manus()
    # Update user instruction
    agent.update_memory("user", "Open Bing and search for AI")
    # Execute agent
    while agent.state != AgentState.FINISHED:
        result = await agent.step()
        print(f"Step {agent.current_step}: {result}")


if __name__ == "__main__":
    import asyncio

    asyncio.run(main())
```
### Core Capabilities
#### 1. Element Confidence Evaluation
Kyo evaluates element confidence using multi-dimensional metrics:
| Metric | Weight | Description |
|---------|---------|-------------|
| Text Match | 40% | How well the element text matches user intent |
| Position Reasonableness | 25% | Whether the element's position is reasonable |
| Type Match | 20% | Whether the element type is suitable for the operation |
| Attribute Completeness | 15% | Whether the element has necessary attributes |
**Score Range**: 0-10 points, higher score indicates higher confidence
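
As a rough illustration of how the weights in the table could combine into a 0-10 score, the sketch below takes a weighted sum of per-metric values. It assumes each metric has already been normalized to the range 0 to 1; the metric keys and the `confidence_score` function are hypothetical, and Kyo's actual scoring logic may differ.

```python
# Weights taken from the table above; the dictionary keys are illustrative names.
WEIGHTS = {
    "text_match": 0.40,
    "position": 0.25,
    "type_match": 0.20,
    "attributes": 0.15,
}


def confidence_score(metrics: dict[str, float]) -> float:
    """Combine per-metric values (each clamped to [0, 1]) into a 0-10 score."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = min(max(metrics.get(name, 0.0), 0.0), 1.0)
        total += weight * value
    return round(10.0 * total, 2)


score = confidence_score(
    {"text_match": 1.0, "position": 0.8, "type_match": 1.0, "attributes": 0.5}
)
print(score)  # 0.40 + 0.20 + 0.20 + 0.075 = 0.875, scaled to 8.75
```

Because the weights sum to 1, a perfect match on every metric yields exactly 10, and a missing metric simply contributes nothing.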
#### 2. Priority Matching Selection
When a user requests to click an element, Kyo will:
1. Detect all interactive elements on the page
2. Calculate confidence score for each element
3. Select the element with the highest confidence
4. Provide a suggestion if the best element differs significantly from the user-specified index
```python
# Example: Automatically select best element
best_element = await agent._get_best_matching_element(
    elements,
    "Search button",
    page_context,
    min_confidence=6.0,
)
```
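
The four selection steps can also be sketched independently of the agent. Everything here is hypothetical: the `Candidate` type, the `pick_best` function, and the reading of "significant difference" as a confidence gap of at least `gap` points are assumptions made for illustration, not the behaviour of `_get_best_matching_element`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    index: int         # index label shown next to the element
    label: str         # visible text or accessible name
    confidence: float  # 0-10 confidence score


def pick_best(
    candidates: list[Candidate],
    requested_index: Optional[int] = None,
    min_confidence: float = 6.0,
    gap: float = 2.0,
) -> tuple[Optional[Candidate], Optional[str]]:
    """Return the highest-confidence candidate and an optional suggestion."""
    eligible = [c for c in candidates if c.confidence >= min_confidence]
    if not eligible:
        return None, None
    best = max(eligible, key=lambda c: c.confidence)
    suggestion = None
    if requested_index is not None and requested_index != best.index:
        requested = next((c for c in candidates if c.index == requested_index), None)
        # Suggest the best element only when it clearly beats the requested one.
        if requested is None or best.confidence - requested.confidence >= gap:
            suggestion = (
                f"Element {best.index} ({best.label!r}) looks like a better "
                f"match than element {requested_index}."
            )
    return best, suggestion


candidates = [
    Candidate(0, "Logo", 2.0),
    Candidate(1, "Search", 9.1),
    Candidate(2, "Sign in", 6.5),
]
best, suggestion = pick_best(candidates, requested_index=2)
print(best.index, suggestion)
```

Returning `None` when no candidate clears `min_confidence` lets the caller fall back to asking the user rather than clicking a weak match.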
#### 3. Automatic Ad Filtering
Kyo automatically identifies and avoids the following types of ads:
- Elements containing ad keywords (sponsored, ad, promotion, etc.)
- Elements located in iframes
- Elements with parent elements containing ad class names
- Elements with promotional attributes
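
A simple heuristic covering the four rules above might look like the following. This is a sketch under stated assumptions: `ElementInfo`, `looks_like_ad`, the keyword pattern, and the attribute names in `PROMO_ATTRS` are all hypothetical and are not taken from Kyo's source.

```python
import re
from dataclasses import dataclass, field

# Whole-word keyword match, so "Add" or "Download" do not count as "ad".
AD_KEYWORDS = re.compile(r"\b(sponsored|ads?|advert\w*|promo\w*)\b", re.IGNORECASE)
# Hypothetical attribute names treated as promotional markers.
PROMO_ATTRS = {"data-ad", "data-ad-slot", "data-sponsored"}


@dataclass
class ElementInfo:
    text: str = ""
    class_names: list[str] = field(default_factory=list)
    ancestor_class_names: list[str] = field(default_factory=list)
    in_iframe: bool = False
    attributes: dict[str, str] = field(default_factory=dict)


def looks_like_ad(el: ElementInfo) -> bool:
    """Apply the four rules in order: iframe, keywords, class names, attributes."""
    if el.in_iframe:
        return True
    if AD_KEYWORDS.search(el.text):
        return True
    # Underscores count as word characters, so split them before matching.
    names = el.class_names + el.ancestor_class_names
    if any(AD_KEYWORDS.search(name.replace("_", " ")) for name in names):
        return True
    return any(key in PROMO_ATTRS for key in el.attributes)


print(looks_like_ad(ElementInfo(text="Sponsored · Buy now")))
print(looks_like_ad(ElementInfo(text="Add to cart", class_names=["btn-primary"])))
```

Matching on word boundaries rather than substrings is the main design choice here, since a plain substring test for "ad" would flag ordinary words such as "download" or "header".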
#### 4. Element Caching Mechanism
Kyo uses an intelligent caching mechanism for performance:
- **First Detection**: Complete scan of all page elements
- **Subsequent Operations**: Use cached element list
- **Page Changes**: Automatically detect content changes and update cache
- **Cache Invalidation**: Clear cache on page navigation or refresh
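
The caching behaviour can be approximated by keying the element list on the page URL plus a hash of the page content. The `ElementCache` class below is a hypothetical sketch of that idea, assuming change detection by content fingerprint; it is not Kyo's implementation.

```python
import hashlib
from typing import Callable, Optional


class ElementCache:
    """Cache detected elements until the URL or the page content changes."""

    def __init__(self, detect: Callable[[], list[str]]):
        self._detect = detect  # performs the expensive full-page scan
        self._key: Optional[tuple[str, str]] = None
        self._elements: list[str] = []
        self.scans = 0  # number of full scans, kept for demonstration

    @staticmethod
    def _fingerprint(html: str) -> str:
        return hashlib.sha256(html.encode("utf-8")).hexdigest()

    def get(self, url: str, html: str) -> list[str]:
        key = (url, self._fingerprint(html))
        if key != self._key:
            # First detection, navigation, or content change: rescan.
            self._elements = self._detect()
            self._key = key
            self.scans += 1
        return self._elements

    def invalidate(self) -> None:
        """Drop the cached list, e.g. on an explicit refresh."""
        self._key = None
        self._elements = []


cache = ElementCache(lambda: ["search box", "search button"])
cache.get("https://example.com", "<p>v1</p>")
cache.get("https://example.com", "<p>v1</p>")  # served from cache
cache.get("https://example.com", "<p>v2</p>")  # content changed, rescan
print(cache.scans)
```

Hashing the full page content is the simplest possible change detector; a real implementation would more likely watch DOM mutations or hash only the interactive subtree to keep the check cheap.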
### Shortcuts
| Shortcut | Function |
|----------|-----------|
| `Ctrl+E` | Pause current operation and input new command |
| `Ctrl+C` | Terminate current program |
### Project Structure
```
kyo/
├── app/
│   ├── agent/                    # Agent core
│   │   ├── base.py               # Base agent class
│   │   ├── manus.py              # Manus agent
│   │   └── browser.py            # Browser agent
│   ├── tool/                     # Tool collection
│   │   ├── browser_use_tool.py   # Browser operation tool
│   │   ├── keyboard_listener.py  # Keyboard listener
│   │   └── ...
│   ├── prompt/                   # Prompt templates
│   ├── llm.py                    # LLM interface
│   └── schema.py                 # Data models
├── config/                       # Configuration files
├── tests/                        # Test files
├── requirements.txt              # Dependencies
└── README.md                     # This document
```
### Development Guide
#### Running Tests
```bash
# Run all tests
pytest tests/
# Run specific test
pytest tests/test_browser.py -v
```
#### Code Style
```bash
# Format code
black app/
# Check code quality
flake8 app/
# Type checking
mypy app/
```
### Contributing
We welcome any form of contribution!
1. Fork this repository
2. Create feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to branch (`git push origin feature/AmazingFeature`)
5. Open Pull Request
### FAQ
#### Q: What's the difference between Kyo and OpenManus?
A: Kyo refactors the OpenManus core architecture. The main differences are:
- Uses Stagehand collaboration framework instead of browser_use
- Adds element confidence evaluation mechanism
- Adds priority matching selection feature
- Optimizes element caching and performance
#### Q: How do I customize element confidence evaluation?
A: You can modify the `_get_best_matching_element` method in `app/tool/browser_use_tool.py` to adjust weights and evaluation logic.
#### Q: Which browsers are supported?
A: Kyo currently supports Chromium-based browsers (Chrome, Edge, Brave, etc.). Firefox support is planned.
### License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
### Acknowledgments
Thanks to the following projects for inspiration and support:
- [OpenManus](https://github.com/mannaandpoem/OpenManus) - Core architecture foundation
- [Stagehand](https://github.com/browserbase/stagehand) - Collaboration framework
- [Playwright](https://playwright.dev/) - Browser automation
- [MetaGPT](https://github.com/geekan/MetaGPT) - Agent framework
### Contact
- Project Home: [https://gitee.com/xunanit/kyo](https://gitee.com/xunanit/kyo)
- Issue Tracker: [Gitee Issues](https://gitee.com/xunanit/kyo/issues)
- Email: g1ker@qq.com
---
**Made with ❤️ by the Kyo Team**
[⬆ Back to Top](#kyo)