# record-x

**Repository Path**: yuandiv/record-x

## Basic Information

- **Project Name**: record-x
- **Description**: Web recording and playback tool
- **Primary Language**: Python
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-04-03
- **Last Updated**: 2026-04-10

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

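## 1. Summary

- 🆕 Adds 7 core files that together form a complete record-and-execute system
- 🎯 Implements adaptive element location based on Playwright + Scrapling
- 🔄 Records user actions and replays them automatically, with smart retry and element matching
- 🌐 Provides a web UI that communicates in real time over WebSocket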

## 2. Architecture Diagrams

### System architecture and data flow

```mermaid
graph TB
    subgraph "Frontend index.html"
        UI[User interface]
        WS[WebSocket client]
        LOG[Live log]
        STEPS[Step preview]
    end
    
    subgraph "Backend service server.py"
        API[FastAPI server]
        WS_SVR[WebSocket endpoint]
        SESSION[Session management]
        STORAGE[JSON storage]
    end
    
    subgraph "Recording module recorder.py"
        REC[Recorder]
        REC_SMART[SmartWaiter]
        REC_JS[Injected JS script]
        REC_FP[Element fingerprint extraction]
    end
    
    subgraph "Execution module executor.py"
        EXEC[Executor]
        EXEC_SMART[SmartWaiter]
        EXEC_SIM[SimilarityScorer]
        EXEC_RETRY[SmartRetry]
        EXEC_VERIFY[ActionVerifier]
    end
    
    subgraph "Browser layer Playwright"
        PW_BROWSER[Browser instance]
        PW_PAGE[Page object]
        PW_ELEM[Element operations]
    end
    
    UI -->|WebSocket| WS_SVR
    WS_SVR -->|messages| API
    API -->|manages| SESSION
    SESSION -->|read/write| STORAGE
    
    WS_SVR -->|start_recording| REC
    REC -->|injects| REC_JS
    REC_JS -->|listens on| PW_PAGE
    PW_PAGE -->|events| REC_FP
    REC_FP -->|generates steps| SESSION
    
    WS_SVR -->|execute| EXEC
    EXEC -->|loads steps| SESSION
    EXEC -->|finds element| EXEC_SIM
    EXEC_SIM -->|adaptive match| PW_ELEM
    PW_ELEM -->|performs action| EXEC_VERIFY
    EXEC_VERIFY -->|verification failed| EXEC_RETRY
    EXEC_RETRY -->|retry| EXEC_SIM
    
    REC -.->|live log| WS
    EXEC -.->|execution status| WS
    WS -.->|updates| LOG
    WS -.->|updates| STEPS
    
    style UI fill:#e3f2fd,color:#0d47a1
    style REC fill:#fff3e0,color:#e65100
    style EXEC fill:#f3e5f5,color:#7b1fa2
    style PW_BROWSER fill:#c8e6c9,color:#1b5e20
```

### Recording flow in detail

```mermaid
sequenceDiagram
    participant User as 👤 User
    participant UI as 🖥️ Web UI
    participant Server as 🖧 FastAPI server
    participant Recorder as 📹 Recorder
    participant Browser as 🌐 Playwright browser
    participant Page as 📄 Page
    
    User->>UI: Enter URL and click "Start recording"
    UI->>Server: WebSocket: {action: "start_recording"}
    Server->>Recorder: Create Recorder instance
    Recorder->>Browser: Launch Chromium
    Browser-->>Recorder: Browser ready
    Recorder->>Page: Navigate to target URL
    Recorder->>Page: Inject JS listener script
    Recorder-->>Server: Recording started
    Server-->>UI: Set status to "Recording"
    
    User->>Browser: Click element A
    Browser->>Page: Fire click event
    Page->>Page: generateSelector()
    Page->>Page: extractFingerprint()
    Page-->>Recorder: Selector + fingerprint data
    Recorder->>Server: Record step
    Server-->>UI: Show step in real time
    
    User->>Browser: Type text into input
    Browser->>Page: Fire change event
    Page->>Page: generateSelector()
    Page->>Page: extractFingerprint()
    Page-->>Recorder: Selector + fingerprint + value
    Recorder->>Server: Record step
    Server-->>UI: Show step in real time
    
    User->>UI: Click "Stop recording"
    UI->>Server: WebSocket: {action: "stop_recording"}
    Server->>Recorder: Stop recording
    Recorder->>Server: Return all steps
    Server->>Server: Save to JSON file
    Recorder->>Browser: Close browser
    Server-->>UI: Recording finished, show step list
```

### Execution flow in detail

```mermaid
sequenceDiagram
    participant User as 👤 User
    participant UI as 🖥️ Web UI
    participant Server as 🖧 FastAPI server
    participant Executor as ⚙️ Executor
    participant Browser as 🌐 Playwright browser
    participant Page as 📄 Page
    participant Matcher as 🎯 SimilarityScorer
    
    User->>UI: Click "Start execution"
    UI->>Server: WebSocket: {action: "execute"}
    Server->>Executor: Create Executor instance
    Executor->>Browser: Launch browser
    Executor->>Page: Navigate to target URL
    
    loop For each step
        Executor->>UI: Send "executing_step"
        alt Step type: navigate
            Executor->>Page: goto(url)
            Page-->>Executor: Navigation complete
        else Step type: click
            Executor->>Page: query_selector(selector)
            alt Original selector succeeds
                Page-->>Executor: Element found
            else Original selector fails
                Executor->>Matcher: _match_by_similarity_score()
                Matcher->>Page: Collect candidate elements
                Matcher->>Matcher: Compute similarity scores
                Matcher-->>Executor: Return best match
            end
            Executor->>Page: element.click()
            Executor->>Executor: verify_click()
            Executor-->>UI: Send "step_success"
        else Step type: input
            Executor->>Page: query_selector(selector)
            alt Original selector succeeds
                Page-->>Executor: Element found
            else Original selector fails
                Executor->>Matcher: _match_by_similarity_score()
                Matcher-->>Executor: Return best match
            end
            alt checkbox/radio
                Executor->>Page: check() / uncheck()
            else Regular input
                Executor->>Page: fill(value)
            end
            Executor-->>UI: Send "step_success"
        end
    end
    
    Executor->>Browser: Close browser
    Executor-->>Server: Execution complete
    Server-->>UI: Send "execution_completed"
```

---

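## 3. Detailed Change Analysis

### 📁 Core module files

#### **executor.py** (1491 lines) - Execution engine

**Components**: `Executor` + helper classes

**Main responsibilities**:
- Execute the recorded step sequence
- Perform adaptive element matching
- Provide a smart retry mechanism
- Verify the result of each action

**Key classes and methods**:

| Class | Core method | Description |
|------|----------|----------|
| `SimilarityScorer` | `score_element()` | Computes the similarity score between an element and a fingerprint (0-100) |
| `SimilarityScorer` | `calculate_text_similarity()` | Text similarity (supports containment) |
| `SimilarityScorer` | `calculate_attribute_similarity()` | Attribute similarity (weighted) |
| `SimilarityScorer` | `_calculate_context_similarity()` | Context similarity (ancestor chain matching) |
| `SimilarityScorer` | `_calculate_semantic_similarity()` | Semantic similarity (form context, sibling position) |
| `SmartWaiter` | `wait_for_element_visible()` | Waits for an element to become visible |
| `SmartWaiter` | `wait_for_element_interactable()` | Waits for an element to become interactable |
| `SmartWaiter` | `wait_for_page_stable()` | Waits for the page to settle (networkidle) |
| `SmartWaiter` | `smart_wait_before_action()` | Chooses a wait based on the action type |
| `ActionVerifier` | `verify_click()` | Verifies that a click succeeded |
| `ActionVerifier` | `verify_input()` | Verifies that an input succeeded |
| `SmartRetry` | `execute_with_retry()` | Executes a step with retries (up to 3 attempts) |
| `SnapshotComparator` | `capture_page_snapshot()` | Captures a page snapshot |
| `SnapshotComparator` | `compare_snapshots()` | Compares snapshots taken before and after an action |
| `Executor` | `execute()` | Main method that executes all steps |
| `Executor` | `_execute_step()` | Executes a single step |
| `Executor` | `_find_element_adaptive()` | Finds an element adaptively |
| `Executor` | `_match_by_similarity_score()` | Matches by similarity score |
| `Executor` | `_collect_candidate_elements()` | Collects candidate elements (13 strategies) |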
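
The retry behaviour listed for `SmartRetry.execute_with_retry()` (at most 3 attempts) can be pictured with the sketch below. It is only an illustration: the `step_fn` callback signature, the `base_delay` parameter, and the linear back-off are assumptions, not the project's actual implementation.

```python
import asyncio


async def execute_with_retry(step_fn, max_attempts: int = 3, base_delay: float = 0.1):
    """Hypothetical helper: await step_fn(attempt) until it succeeds or attempts run out."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return await step_fn(attempt)
        except Exception as exc:  # a real executor would likely catch narrower errors
            last_error = exc
            if attempt < max_attempts:
                # Assumed policy: wait a little longer after each failed attempt.
                await asyncio.sleep(base_delay * attempt)
    raise RuntimeError(f"step failed after {max_attempts} attempts") from last_error
```

A caller would wrap one recorded step in an async function and pass it in; the attempt number lets the step switch to a fallback selector on later tries.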

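**Similarity score weights**:

```python
WEIGHTS = {
    "tag": 20.0,           # tag match (most reliable)
    "text": 25.0,          # text match (important)
    "attributes": 25.0,    # attribute match (reliable)
    "position": 5.0,       # position match (sensitive to window size)
    "context": 15.0,       # context match (useful)
    "semantic": 10.0,      # semantic match (may be inaccurate)
}

ADDITIONAL_WEIGHTS = {
    "uniquenessScore": 10.0,      # element uniqueness score
    "interactionFeatures": 8.0,   # interaction feature match
    "visualAnchors": 7.0,         # visual anchor match
    "semanticContainers": 5.0,    # semantic container match
}
```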
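
To make the role of these weights concrete, here is a minimal sketch of how per-dimension similarities could be folded into a single 0-100 score. It assumes each dimension produces a value in [0, 1]; the helper name `combine_scores` and the normalization step are hypothetical and may differ from what `score_element()` really does.

```python
WEIGHTS = {
    "tag": 20.0, "text": 25.0, "attributes": 25.0,
    "position": 5.0, "context": 15.0, "semantic": 10.0,
}


def combine_scores(per_dimension: dict, weights: dict = WEIGHTS) -> float:
    """Hypothetical helper: weighted average of similarities in [0, 1], scaled to 0-100."""
    total_weight = sum(weights.values())
    weighted = sum(
        # Missing dimensions count as 0; values are clamped into [0, 1].
        weights[name] * min(max(per_dimension.get(name, 0.0), 0.0), 1.0)
        for name in weights
    )
    return 100.0 * weighted / total_weight
```

With these weights, an element that matches only on tag and text would score 45 out of 100, which is why a configurable acceptance threshold matters.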

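**Candidate element collection strategies** (13):
1. Look up by `data-testid`
2. Look up by `aria-label`
3. Look up by `name` attribute (form elements)
4. Look up by `placeholder`
5. Look up by text content
6. Collect all elements with the same tag
7. Look up by `role` attribute
8. Collect common interactive elements
9. Look up by XPath
10. Look up by fallback selectors
11. Look up by visual anchors (icons, headings)
12. Look up by semantic container
13. Look up by nearby landmarks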

---

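#### **recorder.py** (1775 lines) - Recording engine

**Components**: `Recorder` + `SmartWaiter`

**Main responsibilities**:
- Launch the browser and listen for user actions
- Generate stable selectors
- Extract element fingerprints (position, visual, and context information)
- Generate fallback selectors

**Key features**:

| Feature | Implementation |
|------|----------|
| **Smart selector generation** | Priority order: data-testid > aria-label > stable ID > name > text > structural path |
| **Dynamic class name filtering** | Detects generated class names from CSS Modules, Styled Components, Tailwind, and similar tools |
| **Checkbox/Radio handling** | Uses the change event instead of click to avoid duplicate records |
| **Label text lookup** | 6 strategies: labels property, for attribute, parent label, adjacent element, aria-labelledby, table context |
| **Element fingerprint** | 40+ fields: tag, attributes, position, visual, context, landmarks, interaction features, and more |
| **Fallback selectors** | Generates five kinds of fallback: text selector, XPath, position selector, nearby landmark, parent container |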

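**Selector generation priority**:
```
Priority 1: data-testid, data-cy, data-test, data-test-id, data-automation-id
Priority 1.5: data-id, data-key, data-name, data-field, data-value
Priority 2: aria-label, aria-labelledby, role
Priority 3: stable ID (not dynamically generated)
Priority 4: form-element specific (name, placeholder, type+value)
Priority 5: text content (fuzzy matching for dynamic numbers)
Priority 6: title attribute
Priority 7: structural path (using stable class names)
```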
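
The priority order above amounts to "use the first stable attribute that is present". The sketch below illustrates that idea in Python, although the real logic runs as injected JavaScript in the page. The function name `pick_selector`, the shortened attribute list, and the output selector format are all assumptions; it also skips the stable-ID check, text matching, and structural paths.

```python
from typing import Optional

# Assumed subset of the priority list, highest priority first.
ATTRIBUTE_PRIORITY = [
    "data-testid", "data-cy", "data-test",
    "aria-label", "id", "name", "placeholder", "title",
]


def pick_selector(attrs: dict, tag: str = "*") -> Optional[str]:
    """Hypothetical helper: return a CSS selector for the highest-priority attribute present."""
    for name in ATTRIBUTE_PRIORITY:
        value = attrs.get(name)
        if not value:
            continue
        if name == "id":
            return f"#{value}"
        return f'{tag}[{name}="{value}"]'
    return None  # caller would fall back to text or a structural path
```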

**Dynamic class name detection patterns**:
```javascript
const UNSTABLE_CLASS_PATTERNS = [
    /^css-[a-z0-9]+$/i,      // CSS Modules
    /^_[a-z0-9]+$/i,          // CSS-in-JS (emotion, etc.)
    /^sc-[a-zA-Z]+$/,         // Styled Components
    /^jsx-[0-9]+$/i,          // Styled JSX
    /^max-w-\[.+\]$/,         // Tailwind arbitrary values
    /^w-\[.+\]$/,             // Tailwind arbitrary values
    // ... more patterns
];
```
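
For readers who want to try the filtering outside the browser, the same patterns can be ported to Python's `re` module. This is a sketch under the assumption that filtering simply drops any class matching one of the listed patterns; the helper name `stable_classes` is hypothetical.

```python
import re

# Python port of the patterns listed above (the original list has more entries).
UNSTABLE_CLASS_PATTERNS = [
    re.compile(r"^css-[a-z0-9]+$", re.IGNORECASE),
    re.compile(r"^_[a-z0-9]+$", re.IGNORECASE),
    re.compile(r"^sc-[a-zA-Z]+$"),
    re.compile(r"^jsx-[0-9]+$", re.IGNORECASE),
    re.compile(r"^max-w-\[.+\]$"),
    re.compile(r"^w-\[.+\]$"),
]


def stable_classes(class_attr: str) -> list:
    """Hypothetical helper: keep only class names that match no unstable pattern."""
    return [
        name for name in class_attr.split()
        if not any(pattern.match(name) for pattern in UNSTABLE_CLASS_PATTERNS)
    ]
```

Only the surviving class names would then be eligible for the structural-path selector at the lowest priority level.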

---

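#### **server.py** (171 lines) - FastAPI server

**Components**: FastAPI application + WebSocket endpoint

**API endpoints**:

| Endpoint | Method | Purpose |
|------|------|------|
| `/` | GET | Serves the index.html page |
| `/ws/{client_id}` | WebSocket | WebSocket communication endpoint |

**WebSocket message types**:

| Message type | Direction | Description |
|----------|------|------|
| `start_recording` | Client → server | Start recording |
| `stop_recording` | Client → server | Stop recording |
| `execute` | Client → server | Execute the recorded steps |
| `recording_started` | Server → client | Recording has started |
| `recording_stopped` | Server → client | Recording has stopped |
| `execution_started` | Server → client | Execution has started |
| `execution_completed` | Server → client | Execution has completed |
| `execution_failed` | Server → client | Execution failed |
| `action_recorded` | Server → client | A new action was recorded |
| `executing_step` | Server → client | A step is being executed |
| `step_success` | Server → client | A step succeeded |
| `step_error` | Server → client | A step failed |
| `adaptive_match` | Server → client | Adaptive matching succeeded |
| `log` | Server → client | Log message |
| `error` | Server → client | Error message |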
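
The client-to-server half of this protocol could be validated with something like the sketch below. It relies on the `{action: "..."}` shape shown in the sequence diagrams; the reply structure, the `type` field on server messages, and the function name `parse_client_message` are assumptions made for illustration, since the actual payload format is not documented here.

```python
import json

# The three client-to-server message types from the table above.
CLIENT_ACTIONS = {"start_recording", "stop_recording", "execute"}


def parse_client_message(raw: str) -> dict:
    """Hypothetical helper: validate a raw WebSocket text frame from the client."""
    try:
        message = json.loads(raw)
    except json.JSONDecodeError:
        return {"type": "error", "message": "invalid JSON"}
    if not isinstance(message, dict):
        return {"type": "error", "message": "message must be a JSON object"}
    action = message.get("action")
    if action not in CLIENT_ACTIONS:
        return {"type": "error", "message": f"unknown action: {action!r}"}
    return {"type": "ok", "action": action, "payload": message}
```

A WebSocket handler could call this on every incoming frame and send the `error` result straight back to the client when validation fails.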

---

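#### **index.html** (685 lines) - Web UI

**Components**: Frontend UI + WebSocket client

**Main responsibilities**:
- Provide a user-friendly control interface
- Show the recording log and steps in real time
- Manage recording and execution state
- Communicate in real time over WebSocket

**UI components**:
- Control panel: URL input, Start recording, Stop recording, and Start execution buttons
- Status indicator: Ready / Recording / Executing
- Live log: color-coded entries (success / error / info / warning)
- Step preview: list of recorded steps

**Styling**:
- Gradient background
- Responsive layout (mobile supported)
- Animations (pulse, slide-in)
- High-contrast color scheme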

---

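### 📦 Configuration and dependency files

#### **requirements.txt** - Python dependencies

| Package | Version | Purpose |
|------|------|------|
| fastapi | 0.104.1 | Web framework |
| uvicorn[standard] | 0.24.0 | ASGI server |
| playwright | 1.40.0 | Browser automation |
| scrapling | 0.4.3 | Smart element location |
| websockets | 12.0 | WebSocket support |
| python-multipart | 0.0.6 | File upload support |
| aiofiles | 23.2.1 | Asynchronous file I/O |

#### **start.bat** - Windows startup script

**What it does**:
- Checks the Python environment
- Installs dependencies
- Installs the Playwright browser
- Creates required directories
- Starts the server

**Execution flow**:
```batch
[1/4] Checking and installing dependencies...
[2/4] Installing Playwright browser...
[3/4] Creating required directories...
[4/4] Starting server...
```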

---

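#### **readme.md** (271 lines) - Project documentation

**Document structure**:
1. Overview of the approach
2. Core mechanism of adaptive location
3. Architecture
4. Verification script (complete and runnable)
5. Integrating into a web application
6. Advantages and applicable scenarios
7. Caveats

**Core concepts**:
- **Element fingerprint**: tag, text, attributes, DOM path, parent tag name, sibling tag names
- **Adaptive matching**: when the original selector stops working, a similar element is found automatically from the fingerprint
- **No AI/ML required**: purely rule-based computation with low overhead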

---

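## 4. Impact and Risk Assessment

### ✅ Strengths

| Strength | Description |
|------|------|
| 🎯 **Record once, keep working** | Scripts adapt automatically even after moderate changes to the site's UI |
| ⚡ **No AI, lightweight** | Rule-based matching with low overhead; no GPU or API calls required |
| 🔗 **Integrates with Playwright** | Uses native Playwright objects without changing existing code structure |
| 🛠️ **Very low maintenance** | Users do not need to program; after a site change they only re-record the affected steps |
| 🎨 **Friendly web UI** | Live log, step preview, status indicator |
| 🔄 **Smart retry** | Retries automatically on failure and supports fallback selectors |
| 📊 **Multi-dimensional similarity scoring** | Combines 10 dimensions to improve matching accuracy |

### ⚠️ Risks and caveats

| Risk | Impact | Mitigation |
|------|------|----------|
| **Full redesign breaks matching** | High | If the site is rebuilt from scratch (all text and tags change), adaptive matching may fail and the user must be told to re-record |
| **Performance overhead** | Medium | Adaptive matching runs only after the original selector fails, so the normal path has no extra cost; frequent failures can slow execution |
| **Fingerprint database isolation** | Low | Each domain should use its own database to avoid mixing fingerprints |
| **Similarity threshold** | Low | The adaptive matching threshold is configurable to avoid false matches |
| **Browser dependency** | Medium | Depends on Playwright Chromium, which requires a separate browser install |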

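### 🧪 Testing recommendations

#### Functional tests
1. **Basic recording flow**:
   - [ ] Record navigation, click, and input actions
   - [ ] Verify that steps are saved correctly to the JSON file
   - [ ] Verify the live log and step preview

2. **Basic execution flow**:
   - [ ] Execute the recorded steps
   - [ ] Verify that actions are performed correctly
   - [ ] Verify that the execution log is displayed correctly

3. **Adaptive matching**:
   - [ ] Change element IDs/class names on the page
   - [ ] Execute the recorded steps
   - [ ] Verify that adaptive matching finds the elements

4. **Smart retry**:
   - [ ] Simulate delayed element loading
   - [ ] Verify the automatic retry mechanism
   - [ ] Verify that fallback selectors are used

5. **Special elements**:
   - [ ] Checkbox/Radio elements
   - [ ] Dynamically generated elements
   - [ ] Elements with dynamic class names (Tailwind, CSS Modules)
   - [ ] Icon elements (FontAwesome)

#### Edge cases
- [ ] Empty URL input
- [ ] Invalid URL
- [ ] Executing without a recording
- [ ] Recording with 0 steps
- [ ] WebSocket disconnect and reconnect
- [ ] Browser launch failure

#### Performance tests
- [ ] Record 100+ steps
- [ ] Execute 100+ steps
- [ ] Adaptive matching performance on complex pages (many elements)
- [ ] Multiple concurrent sessions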

---

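## 5. Summary

1. **Fingerprint database isolation**: each domain should use its own SQLite database to avoid mixing fingerprints.
2. **Similarity threshold**: the adaptive matching threshold is configurable to avoid false matches.
3. **Handling full redesigns**: if the site is rebuilt from scratch (all text and tags change), adaptive matching may fail, and the user must be told to re-record.
4. **Performance**: adaptive matching runs only after the original selector fails, so the normal path has no extra cost; frequent failures can slow execution.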

---

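## 8. Phase 1 Enhancements (v1.1)

### 8.1 OCR text location

Uses OCR to recognize text on the page and locate elements visually:

```python
from locators import OCRBasedLocator

# Locate an element by its text
element = await OCRBasedLocator.locate_by_text(
    page,
    "登录",           # target text ("Log in")
    fuzzy_match=True, # fuzzy matching
    min_confidence=0.5
)
```

**Features**:
- Supports two engines: PaddleOCR (preferred) and Tesseract
- Fuzzy text matching with similarity scoring
- Automatically finds interactive elements near the recognized text
- Suited to cases that are hard to locate through the DOM, such as Canvas rendering and icon buttons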

### 8.2 Relative location

Locates elements relative to landmark elements on the page:

```python
from locators import RelativeLocator

# Locate an element near a landmark ("个人信息" = "Personal information")
element = await RelativeLocator.locate_near_landmark(
    page, "input", "heading", "个人信息", max_distance=300
)

# Locate inside a container ("姓名" = "Name")
element = await RelativeLocator.locate_in_container(
    page, "input", "form", container_text="姓名"
)

# Locate relative to another element
element = await RelativeLocator.locate_relative_to_element(
    page, "button", anchor_element, position="below"
)
```

**Supported landmark types**:
- `heading` - headings (h1-h6)
- `label` - labels
- `button` - buttons
- `section` - sections
- `navigation` - navigation
- `icon` - icons

### 8.3 Multi-strategy location

Tries several location strategies automatically and picks the best result:

```python
from locators import MultiStrategyLocator

element, strategy = await MultiStrategyLocator.locate(
    page,
    fingerprint,
    strategies=['dom_selector', 'relative_landmark', 'ocr_text', 'visual_position']
)
print(f"Strategy used: {strategy}")
```

**Strategy priority**:
1. DOM selector location
2. Relative landmark location
3. OCR text location
4. Visual position location

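### 8.4 Fuzzy matching

Supports 8 fuzzy matching strategies:

```python
from utils import FuzzyMatcher

# Simplified vs. traditional Chinese spelling of "log in"
result = FuzzyMatcher.multi_strategy_match("登录", "登錄")
print(f"Score: {result.score}, method: {result.method}")

# Check whether two strings match fuzzily
is_match = FuzzyMatcher.is_fuzzy_match("submit", "Submit", threshold=0.7)
```

**Supported matching algorithms**:
| Algorithm | Description |
|-----|------|
| exact | Exact match |
| normalized | Match after normalization |
| contains | Containment match |
| levenshtein | Edit-distance similarity |
| jaro_winkler | Jaro-Winkler similarity |
| ngram | N-gram similarity |
| word_overlap | Word-overlap similarity |
| semantic | Semantic similarity (synonyms) |

**Semantic similarity examples**:
- `login` ↔ `signin` → 0.9
- `submit` ↔ `send` → 0.9
- `cancel` ↔ `close` → 0.9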
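
As a reference point for the `levenshtein` entry in the table, the sketch below shows one common way to turn edit distance into a similarity in [0, 1]. The normalization by the longer string's length is an assumption; `FuzzyMatcher` may scale its score differently.

```python
def levenshtein_similarity(a: str, b: str) -> float:
    """Edit-distance similarity: 1.0 for identical strings, 0.0 for nothing in common."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    # Classic dynamic programming, keeping only the previous row of the table.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    distance = previous[-1]
    return 1.0 - distance / max(len(a), len(b))
```

Under this normalization, "登录" and "登錄" differ by one substitution out of two characters, so the similarity is 0.5; short strings are therefore sensitive to single-character changes.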

### 8.5 Action verification

A more complete action verification mechanism:

```python
from utils import EnhancedActionVerifier

# Verify a click
result = await EnhancedActionVerifier.verify_click(
    page, element, fingerprint, before_snapshot
)

# Verify an input
result = await EnhancedActionVerifier.verify_input(
    element, expected_value, element_type
)

# Verify a navigation
result = await EnhancedActionVerifier.verify_navigation(
    page, expected_url, expected_title
)

# Verify element state
result = await EnhancedActionVerifier.verify_element_state(
    element, {"visible": True, "enabled": True}
)

# Verify a form submission
result = await EnhancedActionVerifier.verify_form_submission(
    page,
    form_data,
    success_indicators=["提交成功", "保存成功"],  # "Submitted", "Saved"
    failure_indicators=["错误", "失败"]           # "Error", "Failed"
)
```

### 8.6 Smart wait condition inference

Infers the most suitable wait condition automatically:

```python
from utils import SmartWaitConditionInferrer

# Infer the wait condition
inferred = await SmartWaitConditionInferrer.infer_wait_condition(
    page, action_type="click"
)

# Perform the wait
await SmartWaitConditionInferrer.wait_for_inferred_condition(page, inferred)
```

**Inference rules**:
| Action type | Inferred condition | Timeout |
|---------|---------|---------|
| navigate | networkidle | 15s |
| click | domcontentloaded | 5s |
| input | stable | 0.5s |
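
The rule table can be read as a simple lookup from action type to a wait condition and timeout. The sketch below encodes it that way; the `WaitRule` dataclass, the function name `infer_wait_rule`, and the fallback for unknown action types are assumptions, and the real inferrer may also inspect the page before deciding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WaitRule:
    condition: str    # e.g. a load state to wait for
    timeout_s: float  # maximum wait in seconds


# Direct encoding of the inference rules table.
WAIT_RULES = {
    "navigate": WaitRule("networkidle", 15.0),
    "click": WaitRule("domcontentloaded", 5.0),
    "input": WaitRule("stable", 0.5),
}

# Assumed fallback for action types the table does not cover.
DEFAULT_RULE = WaitRule("domcontentloaded", 5.0)


def infer_wait_rule(action_type: str) -> WaitRule:
    """Hypothetical helper: pick the wait rule for an action type."""
    return WAIT_RULES.get(action_type, DEFAULT_RULE)
```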

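### 8.7 Enhanced snapshot comparison

Detailed page snapshot capture and comparison:

```python
from utils import EnhancedSnapshotComparator

# Capture a detailed snapshot
snapshot = await EnhancedSnapshotComparator.capture_detailed_snapshot(page)

# Compare snapshots
comparison = EnhancedSnapshotComparator.compare_snapshots(before, after)
```

**Snapshot contents**:
- URL and title
- Element count and interactive element count
- Visible text
- Lists of forms, buttons, inputs, and links
- Number of loaded scripts and stylesheets

**Comparison result**:
- URL/title changes
- Element count difference
- Text similarity
- Change summary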
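
A comparison along these lines can be sketched with the standard library alone. The snapshot keys used here (`url`, `title`, `element_count`, `text`) and the use of `difflib.SequenceMatcher` for text similarity are assumptions for illustration; the fields produced by `capture_detailed_snapshot()` may be named differently.

```python
from difflib import SequenceMatcher


def compare_snapshots(before: dict, after: dict) -> dict:
    """Hypothetical helper: summarize what changed between two page snapshots."""
    text_similarity = SequenceMatcher(
        None, before.get("text", ""), after.get("text", "")
    ).ratio()
    return {
        "url_changed": before.get("url") != after.get("url"),
        "title_changed": before.get("title") != after.get("title"),
        "element_count_diff": after.get("element_count", 0) - before.get("element_count", 0),
        "text_similarity": text_similarity,
    }
```

A click verifier could treat any changed URL, changed title, or non-zero element count difference as evidence that the click had an effect.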

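### 8.8 Fallback selectors

Eight kinds of fallback selector are generated automatically during recording:

```javascript
// Generated automatically during recording
fallbackSelectors: [
    "button:text-is('Log in')",            // text content
    "xpath=//form/button",                 // XPath
    "position:100,200",                    // viewport position
    "near:heading:text('User info')",      // nearby landmark
    "form.login > button",                 // parent container
    "form[name='login'] button",           // form context
    "label:text('Username') + input",      // adjacent element
    "section:has-text('Log in') button"    // semantic container
]
```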

---

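## 9. File Structure

```
clickghost/
├── executor.py          # Executor (integrates all enhancements)
├── recorder.py          # Recorder (enhanced selector generation)
├── locators.py          # Locator module (OCR, relative location, multi-strategy)
├── utils.py             # Utilities (fuzzy matching, verification, waiting, snapshots)
├── server.py            # WebSocket server
├── index.html           # Frontend UI
├── requirements.txt     # Dependencies
├── test_phase1.py       # Phase 1 test cases
└── readme.md            # This document
```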

---

## 10. Running the Tests

```bash
# Install dependencies
pip install -r requirements.txt

# Install the OCR engine (optional)
pip install paddleocr paddlepaddle

# Run the phase 1 tests
python test_phase1.py

# Run the locator tests
python test_locators.py
```

---

## 11. Expected Results

| Feature | Estimated accuracy gain |
|-----|:-------------:|
| OCR text location | +8-12% |
| Relative location | +10-15% |
| Fuzzy matching | +5-8% |
| Action verification | +10-15% |
| **Phase 1 cumulative** | **+25-35%** |

---

## 12. Phase 2 Enhancements (v1.2)

### 12.1 Multi-modal fusion location

Combines several location modalities and decides by weighted voting:

```python
from advanced_locators import MultiModalFusion, ModalityType

fusion = MultiModalFusion(page)
result = await fusion.locate(fingerprint)

print(f"Final score: {result.final_score}")
print(f"Winning modality: {result.winning_modality.value}")
print(f"Voting details: {result.voting_details}")
```

**Supported modalities**:
| Modality | Description | Default weight |
|-----|------|:-------:|
| DOM_SELECTOR | DOM selector location | 1.0 |
| ACCESSIBILITY_TREE | Accessibility tree location | 0.9 |
| OCR_TEXT | OCR text location | 0.85 |
| RELATIVE_LANDMARK | Relative landmark location | 0.8 |
| SEMANTIC_CONTEXT | Semantic context location | 0.75 |
| VISUAL_POSITION | Visual position location | 0.7 |

**Weight adjustments by element type**:
- `button`: OCR weight is raised
- `input`: DOM weight is raised
- `a`: text weight is raised
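
One plausible reading of "weighted voting" is that each modality proposes a candidate with some confidence, and the candidate with the highest weighted total wins. The sketch below implements that reading; the vote tuple format, the function name `weighted_vote`, and the way confidence is multiplied by the weight are assumptions, not the documented behaviour of `MultiModalFusion`.

```python
from collections import defaultdict

# Default weights from the modality table above.
MODALITY_WEIGHTS = {
    "DOM_SELECTOR": 1.0,
    "ACCESSIBILITY_TREE": 0.9,
    "OCR_TEXT": 0.85,
    "RELATIVE_LANDMARK": 0.8,
    "SEMANTIC_CONTEXT": 0.75,
    "VISUAL_POSITION": 0.7,
}


def weighted_vote(votes: list) -> tuple:
    """Hypothetical helper. votes: (modality, candidate_id, confidence in [0, 1]) tuples."""
    if not votes:
        raise ValueError("no votes to count")
    totals = defaultdict(float)
    for modality, candidate_id, confidence in votes:
        totals[candidate_id] += MODALITY_WEIGHTS.get(modality, 0.0) * confidence
    winner = max(totals, key=totals.get)
    return winner, totals[winner]
```

In this scheme two weaker modalities that agree can outvote a single stronger one, which is the practical benefit of fusing them.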

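### 12.2 Self-healing selectors

Repairs a selector automatically when it stops working:

```python
from advanced_locators import SelfHealingSelector

healer = SelfHealingSelector(page)

# Analyze why the selector failed
analysis = await healer.analyze_failure(selector, fingerprint)
print(f"Failure type: {analysis.failure_type.value}")
print(f"Suggested fixes: {analysis.suggested_fixes}")

# Run self-healing
result = await healer.heal(selector, fingerprint)
print(f"Healed: {result.success}")
print(f"Strategy used: {result.strategy_used}")
print(f"New selector: {result.new_selector}")
```

**Supported repair strategies**:
| Failure type | Repair strategy |
|---------|---------|
| ELEMENT_NOT_FOUND | Fingerprint matching → multi-modal fusion → relative location → OCR |
| CLASS_CHANGED | Drop class names → stable attributes → relative location |
| ID_CHANGED | name attribute → aria-label → relative location |
| TEXT_CHANGED | Fuzzy text → other attributes → OCR |
| STRUCTURE_CHANGED | Multi-modal fusion → semantic context → accessibility tree |

**Caching**:
- Successfully repaired selectors are cached automatically
- The cached result is reused the next time the same situation occurs
- Custom cache storage is supported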

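### 12.3 Enhanced similarity scoring

Seven-dimension scoring with dynamic weights:

```python
from advanced_locators import EnhancedSimilarityScorer

score, details = await EnhancedSimilarityScorer.score_element(
    element, fingerprint, page
)

print(f"Total score: {score}")
print(f"Per-dimension scores: {details['scores']}")
print(f"Dynamic weights: {details['weights']}")
print(f"Confidence: {details['confidence']}")
```

**Scoring dimensions**:
| Dimension | Weight | Description |
|-----|:----:|------|
| tag | 15% | Tag match |
| text | 20% | Text similarity |
| attributes | 25% | Attribute match |
| position | 10% | Position similarity |
| context | 15% | Context match |
| semantic | 10% | Semantic match |
| visual | 5% | Visual features |

**Dynamic weight adjustment**:
- Low confidence: raise the text weight and lower the position weight
- High text match: raise the text weight further
- Large position offset: lower the position weight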

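### 12.4 Selector health check

Assesses how stable a selector is:

```python
from advanced_locators import SelectorHealthChecker

health = await SelectorHealthChecker.check_health(selector, page)

print(f"Healthy: {health['is_healthy']}")
print(f"Health score: {health['score']}")
print(f"Issues: {health['issues']}")
print(f"Warnings: {health['warnings']}")
print(f"Suggestions: {health['suggestions']}")
```

**Checks performed**:
- Selector length
- Dynamic class name patterns (css-, sc-, jsx-, etc.)
- Number of matching elements
- Element visibility and interactability
- DOM depth

**Health score calculation**:
- Base score: 100 points
- Overlong selector: -20 points
- Dynamic class name: -15 points each
- Not unique: -10 points
- Not visible: -10 points
- Not interactable: -5 points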
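
The deduction rules translate directly into arithmetic. The sketch below applies them to already-computed check results; the function name `health_score`, its parameters, and the clamp at zero are assumptions, since the source does not say what happens when deductions exceed 100.

```python
def health_score(
    too_long: bool,
    dynamic_class_count: int,
    unique: bool,
    visible: bool,
    interactable: bool,
) -> int:
    """Hypothetical helper: apply the deductions listed above to a base score of 100."""
    score = 100
    if too_long:
        score -= 20
    score -= 15 * dynamic_class_count  # -15 per dynamic class name
    if not unique:
        score -= 10
    if not visible:
        score -= 10
    if not interactable:
        score -= 5
    return max(score, 0)  # assumed floor; the source does not specify one
```

For example, an overlong selector with two dynamic class names that matches several elements and is not interactable would score 100 - 20 - 30 - 10 - 5 = 35.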

---

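## 13. File Structure (Updated)

```
clickghost/
├── executor.py          # Executor (integrates multi-modal fusion and self-healing selectors)
├── recorder.py          # Recorder
├── locators.py          # Basic locators (OCR, relative location)
├── advanced_locators.py # Advanced locators (multi-modal fusion, self-healing selectors)
├── utils.py             # Utilities
├── server.py            # WebSocket server
├── index.html           # Frontend UI
├── requirements.txt     # Dependencies
├── test_phase1.py       # Phase 1 test cases
├── test_phase2.py       # Phase 2 test cases
└── readme.md            # This document
```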

---

## 14. Running the Tests (Updated)

```bash
# Install dependencies
pip install -r requirements.txt

# Run the phase 1 tests
python test_phase1.py

# Run the phase 2 tests
python test_phase2.py

# Run the phase 3 tests
python test_phase3.py
```

---

## 15. Phase 3 Enhancements (v1.3)

### 15.1 Visual AI location

Element location based on computer vision:

```python
from ai_locators import VisualAILocator

visual_locator = VisualAILocator(page)

# Capture the element's visual fingerprint
element = await page.query_selector("#submit-btn")
fingerprint = await visual_locator.capture_element_screenshot(element)

# Locate by visual fingerprint
result = await visual_locator.locate_by_visual(fingerprint, threshold=0.7)
print(f"Confidence: {result.confidence}")
print(f"Method: {result.method.value}")
```

**Supported visual methods**:
| Method | Description |
|-----|------|
| TEMPLATE_MATCH | Template matching (OpenCV) |
| FEATURE_MATCH | Feature point matching |
| COLOR_HISTOGRAM | Color histogram matching |
| EDGE_DETECTION | Edge detection matching |

**The visual fingerprint contains**:
- Element screenshot (Base64)
- Width, height, and center position
- Color histogram
- Edge features
- Template hash

### 15.2 Page state machine

Builds a model of page states to understand the context of each action:

```python
from ai_locators import PageStateMachine

state_machine = PageStateMachine(page)

# Define states
state_machine.define_state(
    state_id="login_page",
    name="Login page",
    required_elements=["#login-form"],
    optional_elements=["#forgot-password"]
)

state_machine.define_state(
    state_id="home_page",
    name="Home page",
    required_elements=["#dashboard"]
)

# Define a transition
state_machine.define_transition(
    from_state="login_page",
    to_state="home_page",
    trigger="click",
    element_selector="#login-btn"
)

# Detect the current state
current_state = await state_machine.detect_current_state()

# Verify a state
is_valid = await state_machine.verify_state("login_page")

# Perform a state transition
success = await state_machine.transition_to("home_page")
```

**State machine capabilities**:
- Detect the current state automatically
- Verify that a state is valid
- Manage state transitions
- Record state history
- List possible transitions

### 15.3 Intent recognizer

Recognizes the intent behind a sequence of user actions and produces higher-level operations:

```python
from ai_locators import IntentRecognizer

# Recognize the intent of an action sequence
actions = [
    {"action": "click", "fingerprint": {"name": "username"}},
    {"action": "input", "value": "test@example.com", "fingerprint": {"name": "username"}},
    {"action": "click", "fingerprint": {"name": "password"}},
    {"action": "input", "value": "password123", "fingerprint": {"name": "password"}},
    {"action": "click", "fingerprint": {"tag": "button", "text": "Log in"}},
]

result = IntentRecognizer.recognize(actions)

print(f"Intent: {result.intent.value}")
print(f"Confidence: {result.confidence}")
print(f"Parameters: {result.parameters}")
print(f"Optimized actions: {result.optimized_actions}")
```

**Supported intent types**:
| Intent type | Description | Trigger pattern |
|---------|------|---------|
| FORM_FILL | Form filling | Multiple input actions |
| LOGIN | Login | Username + password + login button |
| SEARCH | Search | Input + search button |
| PAGINATION | Pagination | Repeated clicks on "next page" |
| DELETE | Deletion | Delete button + confirmation |
| SUBMIT | Submission | Fill in + submit button |
| CANCEL | Cancellation | Cancel/close button |

**Intent optimization example**:
```
Original actions: [click username → input → click password → input → click login]
Recognized intent: LOGIN
Optimized action: [login(username="xxx", password="xxx")]
```
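
The LOGIN trigger pattern (username + password + login button) can be approximated with a few rule checks, as in the sketch below. The field-name sets and the function name `looks_like_login` are assumptions chosen for illustration; `IntentRecognizer` presumably uses richer signals and returns a confidence rather than a boolean.

```python
# Assumed field names that suggest a username-like input.
USERNAME_FIELDS = {"username", "user", "email", "account"}


def looks_like_login(actions: list) -> bool:
    """Hypothetical helper: rule-based check for the LOGIN trigger pattern."""
    input_names = {
        action.get("fingerprint", {}).get("name", "").lower()
        for action in actions
        if action.get("action") == "input"
    }
    has_username = bool(input_names & USERNAME_FIELDS)
    has_password = "password" in input_names
    # The sequence should end with a click, presumably on the login button.
    ends_with_click = bool(actions) and actions[-1].get("action") == "click"
    return has_username and has_password and ends_with_click
```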

### 15.4 Parameterized recording

Identifies fields that can be parameterized automatically:

```python
from ai_locators import ParameterizedRecording

# Identify parameterizable fields
actions = [
    {"action": "input", "value": "test@example.com", "fingerprint": {"name": "email"}},
    {"action": "input", "value": "13800138000", "fingerprint": {"name": "phone"}},
]

params = ParameterizedRecording.identify_parameterizable_fields(actions)
# Returns: {"email": {"type": "email", "default_value": "test@example.com"}, ...}

# Apply new parameters
new_params = {"email": "new@test.com", "phone": "18612345678"}
modified = ParameterizedRecording.apply_parameters(actions, new_params)
```

**Automatically recognized types**:
| Type | Regex pattern | Example |
|-----|---------|------|
| email | `[\w.-]+@[\w.-]+\.\w+` | test@example.com |
| phone | `1[3-9]\d{9}` | 13800138000 |
| url | `https?://[\w./-]+` | https://example.com |
| date | `\d{4}-\d{2}-\d{2}` | 2024-01-15 |
| chinese_name | `[\u4e00-\u9fa5]+` | 张三 |
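
The regex table can be exercised directly with Python's `re` module. The sketch below checks a recorded value against each pattern in table order and returns the first type that matches the whole string; the function name `classify_value`, the use of `re.fullmatch`, and the first-match-wins ordering are assumptions about how such a classifier might be wired up.

```python
import re
from typing import Optional

# Patterns copied from the table above, checked in this order.
PARAM_PATTERNS = {
    "email": r"[\w.-]+@[\w.-]+\.\w+",
    "phone": r"1[3-9]\d{9}",
    "url": r"https?://[\w./-]+",
    "date": r"\d{4}-\d{2}-\d{2}",
    "chinese_name": r"[\u4e00-\u9fa5]+",
}


def classify_value(value: str) -> Optional[str]:
    """Hypothetical helper: return the first type whose pattern matches the whole value."""
    for type_name, pattern in PARAM_PATTERNS.items():
        if re.fullmatch(pattern, value):
            return type_name
    return None
```

Note that the `phone` pattern only covers mainland China mobile numbers, so values in other formats would come back as `None` under this sketch.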

---

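## 16. File Structure (Final)

```
clickghost/
├── executor.py          # Executor (integrates features from all phases)
├── recorder.py          # Recorder
├── locators.py          # Basic locators (OCR, relative location)
├── advanced_locators.py # Advanced locators (multi-modal fusion, self-healing selectors)
├── ai_locators.py       # AI locators (visual AI, state machine, intent recognition)
├── utils.py             # Utilities
├── server.py            # WebSocket server
├── index.html           # Frontend UI
├── requirements.txt     # Dependencies
├── test_phase1.py       # Phase 1 test cases
├── test_phase2.py       # Phase 2 test cases
├── test_phase3.py       # Phase 3 test cases
└── readme.md            # This document
```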

---

## 17. Running the Tests (Final)

```bash
# Install dependencies
pip install -r requirements.txt

# Install visual AI dependencies (optional)
pip install opencv-python numpy pillow

# Run the tests for all phases
python test_phase1.py
python test_phase2.py
python test_phase3.py
```

---

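## 18. Expected Results (Final)

| Phase | Feature | Estimated accuracy gain |
|-----|-----|:---------:|
| **Phase 1** | | |
| | OCR text location | +8-12% |
| | Relative location | +10-15% |
| | Fuzzy matching | +5-8% |
| | Action verification | +10-15% |
| | **Phase 1 cumulative** | **+25-35%** |
| **Phase 2** | | |
| | Multi-modal fusion location | +15-20% |
| | Self-healing selectors | +12-18% |
| | Similarity scoring improvements | +10-15% |
| | **Phase 2 cumulative** | **+45-60%** |
| **Phase 3** | | |
| | Visual AI location | +15-25% |
| | Page state machine | +10-15% |
| | Intent recognition | +12-18% |
| | Parameterized recording | +5-10% |
| | **Phase 3 cumulative** | **+60-80%** |
| **Total** | | **+130-175%** |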

---

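## 19. Architecture Overview

```
┌─────────────────────────────────────────────────────────────┐
│             ClickGhost Record-and-Replay System             │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
│  │  Recorder   │  │  Executor   │  │   Server    │          │
│  │  recorder   │  │  executor   │  │   server    │          │
│  └──────┬──────┘  └──────┬──────┘  └─────────────┘          │
│         │                │                                  │
│         ▼                ▼                                  │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                     Locator layer                     │  │
│  ├───────────────────────────────────────────────────────┤  │
│  │  Phase 1: OCR + relative locating + fuzzy matching    │  │
│  │  Phase 2: multi-modal fusion + self-healing + scoring │  │
│  │  Phase 3: visual AI + state machine + intent          │  │
│  └───────────────────────────────────────────────────────┘  │
│                          │                                  │
│                          ▼                                  │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                     Utility layer                     │  │
│  ├───────────────────────────────────────────────────────┤  │
│  │  Fuzzy match | Verification | Smart wait | Snapshots  │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

This is a **brand-new web record-and-replay system** with the following characteristics:

🎯 **Core value**: adaptive element location built on Playwright + Scrapling, addressing the problem of recorded scripts breaking when the UI changes

🏗️ **Architecture**: a clear modular design split into four parts: recorder, executor, server, and frontend UI

🔧 **Technical highlights**:
- 13 candidate element collection strategies
- 10-dimension similarity scoring
- Smart selector generation (7 priority levels)
- Automatic detection and filtering of dynamic class names
- Dedicated Checkbox/Radio handling
- 6 label text lookup strategies

📦 **Complete delivery**: includes all required code, documentation, dependency configuration, and the startup script

The system suits long-running, repetitive work such as daily reviews and data entry, and can significantly reduce maintenance cost.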