# DataWeaver

**Repository Path**: aidenhgl/data-weaver

## Basic Information

- **Project Name**: DataWeaver
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-12-17
- **Last Updated**: 2025-12-18

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

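# Mock Data Engine

A Python mock data engine for multimedia content. It generates four types of data (audio, text, image, and video) through an extensible strategy pattern.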
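
The strategy pattern can be pictured as a registry that maps a media type and a strategy name to a generator function. The sketch below is a minimal illustration under that assumption; `register`, `dispatch`, and the `repeat` strategy are hypothetical names, not part of the documented SDK.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry keyed by (media type, strategy name).
# This is an illustration, not DataWeaver's actual internals.
StrategyFn = Callable[[dict], str]
_REGISTRY: Dict[Tuple[str, str], StrategyFn] = {}


def register(media_type: str, name: str) -> Callable[[StrategyFn], StrategyFn]:
    """Decorator that registers a strategy function under (media_type, name)."""
    def decorator(fn: StrategyFn) -> StrategyFn:
        _REGISTRY[(media_type, name)] = fn
        return fn
    return decorator


@register("text", "repeat")
def repeat_text(params: dict) -> str:
    # Illustrative strategy: repeat a word a given number of times.
    return " ".join([params.get("word", "mock")] * params.get("count", 3))


def dispatch(request: dict) -> str:
    """Look up the strategy named in the request and run it with its params."""
    key = (request["type"], request["strategy"])
    if key not in _REGISTRY:
        raise KeyError(f"no strategy registered for {key}")
    return _REGISTRY[key](request.get("params", {}))


print(dispatch({"type": "text", "strategy": "repeat", "params": {"word": "hi", "count": 2}}))
# prints: hi hi
```

With this layout, adding a new strategy means adding one decorated function, and the dispatch code stays unchanged.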


## Project Structure

```
├── src/                   # Core source directory
│   ├── core/              # Core module
│   ├── text/              # Text module
│   ├── image/             # Image module
│   ├── audio/             # Audio module
│   └── video/             # Video module
├── mock_engine.py         # SDK
├── sdk_demo.py            # SDK demo script
├── README.md              # This document
└── requirements.txt       # Dependency list
```

## Quick Start

### Install Dependencies

```bash
pip install -r requirements.txt
```

### Basic Usage

```python
from mock_engine import MockEngine

# Create the engine
engine = MockEngine("./output")

# Generate text
resp = engine.generate({
    "type": "text",
    "strategy": "random",
    "params": {"text_type": "news", "length": "medium"},
    "group": "test_suite"
})
print(f"Text content: {resp.text[:100]}...")

# Generate an image
resp = engine.generate({
    "type": "image",
    "strategy": "placeholder",
    "params": {"width": 800, "height": 600, "text": "Demo"}
})
print(f"Image file: {resp.file.name}")
print(f"File size: {resp.file.size} bytes")

# Read the file with a with statement (recommended)
with resp.file as f:
    image_data = f.read()
    print(f"Data read: {len(image_data)} bytes")

# Generate audio
resp = engine.generate({
    "type": "audio",
    "strategy": "tts",
    "params": {"content_type": "greeting", "voice_type": "female"}
})
print(f"Audio duration: {resp.metadata['duration']} seconds")

# Generate video
resp = engine.generate({
    "type": "video",
    "strategy": "text_animation",
    "params": {"text": "Hello World", "duration": 5}
})
print(f"Video file: {resp.file.name}")
```

## 📚 SDK API Documentation

### Core Class: MockEngine

#### 1. Synchronous API

**Single generation**
```python
def generate(self, request: Dict | MockRequest, safe: bool = False) -> MockResponse
```

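Parameters:
- `request`: the generation request, either a dict or a `MockRequest` object
  - `type`: media type ("text", "image", "audio", "video")
  - `strategy`: strategy name (e.g. "random", "placeholder")
  - `params`: strategy parameters (dict)
  - `group`: group name (optional)
  - `tags`: list of tags (optional)
- `safe`: enables safe mode; when `True`, no exception is raised and a failed response is returned instead

Returns:
- A `MockResponse` object with attributes such as `id`, `type`, `metadata`, and `status`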
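
The `safe` flag described above can be sketched as a wrapper that turns exceptions into a failed result. This is a minimal illustration, not the SDK's implementation: `Result` is a hypothetical stand-in for `MockResponse`, and the `"failed"` status string is an assumption (only `"success"` appears in this README).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    # Hypothetical stand-in for MockResponse; only the fields needed here.
    status: str
    value: Optional[str] = None
    error: Optional[str] = None


def run(task: Callable[[], str], safe: bool = False) -> Result:
    """Run task; with safe=True, convert any exception into a failed Result."""
    try:
        return Result(status="success", value=task())
    except Exception as exc:
        if not safe:
            raise
        # "failed" is an assumed status value for illustration.
        return Result(status="failed", error=str(exc))


def broken() -> str:
    raise ValueError("unknown strategy")


resp = run(broken, safe=True)
print(resp.status, resp.error)
```

With `safe=False` the same call would re-raise the `ValueError`, so callers choose between exception handling and checking `status`.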

**Batch generation**
```python
def generate_batch(self, requests: List[Dict | MockRequest], continue_on_error: bool = False) -> List[MockResponse]
```

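Behavior:
- Fails fast by default: any failure aborts the batch immediately and cleans up the files already generated
- Set `continue_on_error=True` to keep going after a failure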
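
The fail-fast and cleanup behavior can be sketched as follows. This is an illustration under stated assumptions, not the engine's code: `run_batch`, `write_ok`, and `write_bad` are hypothetical, and a failed job is recorded as `None` here, whereas the SDK returns response objects.

```python
import tempfile
from pathlib import Path
from typing import Callable, List, Optional


def run_batch(
    jobs: List[Callable[[Path], Path]],
    out_dir: Path,
    continue_on_error: bool = False,
) -> List[Optional[Path]]:
    """Run each job; on failure either record None or abort and delete outputs."""
    results: List[Optional[Path]] = []
    for job in jobs:
        try:
            results.append(job(out_dir))
        except Exception:
            if continue_on_error:
                results.append(None)
                continue
            # Fail fast: remove files produced so far, then re-raise.
            for path in results:
                if path is not None:
                    path.unlink()
            raise
    return results


def write_ok(out_dir: Path) -> Path:
    path = out_dir / "ok.txt"
    path.write_text("data")
    return path


def write_bad(out_dir: Path) -> Path:
    raise RuntimeError("generation failed")


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    try:
        run_batch([write_ok, write_bad], out)
    except RuntimeError:
        # ok.txt was deleted during cleanup, so the directory is empty.
        print(sorted(p.name for p in out.iterdir()))
```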

#### 2. Asynchronous API

```python
# Asynchronous single generation
async def generate_async(self, request: Dict | MockRequest, safe: bool = False) -> MockResponse

# Asynchronous batch generation
async def generate_batch_async(
    self,
    requests: List[Dict | MockRequest],
    continue_on_error: bool = False,
    max_concurrency: int = 5
) -> List[MockResponse]
```

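Characteristics:
- Built on a thread pool, so it suits I/O-bound workloads
- Supports a concurrency limit (`max_concurrency`)
- About 3-5x faster than the synchronous API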
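
A thread-pool-backed async API with a concurrency cap is commonly built from `loop.run_in_executor` and `asyncio.Semaphore`. The sketch below shows that general technique; `slow_generate` and `generate_all` are hypothetical names, and the engine's real implementation may differ.

```python
import asyncio
import time
from typing import List


def slow_generate(index: int) -> str:
    # Stand-in for a blocking generation call.
    time.sleep(0.05)
    return f"item-{index}"


async def generate_all(count: int, max_concurrency: int = 5) -> List[str]:
    """Run blocking calls in the default thread pool, at most max_concurrency at once."""
    semaphore = asyncio.Semaphore(max_concurrency)
    loop = asyncio.get_running_loop()

    async def run_one(index: int) -> str:
        async with semaphore:
            # None selects the event loop's default ThreadPoolExecutor.
            return await loop.run_in_executor(None, slow_generate, index)

    # gather returns results in the order the awaitables were passed in.
    return await asyncio.gather(*(run_one(i) for i in range(count)))


results = asyncio.run(generate_all(10, max_concurrency=5))
print(results[:3])
```

The semaphore bounds how many blocking calls are in flight, while `gather` keeps the results aligned with the request order.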

#### 3. Query and Management

```python
# Get the generation history
def get_history(self, type: str = None, group: str = None, tag: str = None, status: str = None) -> List[MockResponse]

# Get statistics
def get_statistics(self) -> Dict[str, Any]

# Clear the history
def clear_history(self, type: str = None)

# Clean up all files
def cleanup(self)

# Context manager support
with MockEngine("./output") as engine:
    resp = engine.generate({...})
```
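
The filter arguments of `get_history` can be pictured as optional predicates over stored records. The sketch below assumes that filters left as `None` are ignored and that the remaining ones are combined with AND; the README does not state this, and `Record` and `filter_history` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    # Hypothetical history entry; mirrors the request fields listed above.
    type: str
    status: str
    group: Optional[str] = None
    tags: List[str] = field(default_factory=list)


def filter_history(
    history: List[Record],
    type: Optional[str] = None,
    group: Optional[str] = None,
    tag: Optional[str] = None,
    status: Optional[str] = None,
) -> List[Record]:
    """Keep records that match every filter that is not None (assumed AND semantics)."""
    return [
        r for r in history
        if (type is None or r.type == type)
        and (group is None or r.group == group)
        and (tag is None or tag in r.tags)
        and (status is None or r.status == status)
    ]


history = [
    Record("text", "success", group="demo", tags=["smoke"]),
    Record("image", "failed", group="demo"),
    Record("text", "success", group="other"),
]
print(len(filter_history(history, type="text", group="demo")))
# prints: 1
```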

### Response Types

#### TextResponse (text response)

```python
@dataclass
class TextResponse:
    id: str
    type: str
    content: str          # text content
    metadata: Dict
    status: str
    error: Optional[str]

    @property
    def text(self) -> str: ...
```

#### FileResponse (file response)

```python
@dataclass
class FileResponse:
    id: str
    type: str
    metadata: Dict
    status: str
    error: Optional[str]

    @property
    def file(self) -> MockFile: ...          # file object
    @property
    def file_path(self) -> Path | None: ...  # file path

    # Supports the with statement
    def __enter__(self): ...
    def __exit__(self, exc_type, exc_val, exc_tb): ...
```

### File Object: MockFile

```python
class MockFile:
    path: Path          # file path
    name: str           # file name
    size: int           # file size in bytes

    def read(self, size: int = -1) -> bytes:
        """Read the file content."""

    def text(self, encoding: str = "utf-8") -> str:
        """Read the content as text."""

    def open(self, mode: str = "rb"):
        """Open the file."""

    # Supports the with statement
    def __enter__(self): ...
    def __exit__(self, exc_type, exc_val, exc_tb): ...
```

#### MockFile Usage Examples

```python
# Option 1: read directly
data = resp.file.read()
size = resp.file.size
name = resp.file.name

# Option 2: use a with statement (recommended; resources are managed automatically)
with resp.file as f:
    data = f.read()
    assert len(data) > 0

# Option 3: use with on the response itself (more concise)
with resp as f:
    data = f.read()
    process(data)

# Option 4: read as text (for text files)
text = resp.file.text(encoding="utf-8")
```
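
A file wrapper can support both direct reads and the `with` statement in several ways. The sketch below shows one of them and is not the project's `MockFile`: `FileHandle` is a hypothetical class that keeps a handle open only inside the `with` block.

```python
import tempfile
from pathlib import Path
from typing import IO, Optional


class FileHandle:
    """Hypothetical MockFile-like wrapper; not the project's actual class."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self._fh: Optional[IO[bytes]] = None

    @property
    def name(self) -> str:
        return self.path.name

    @property
    def size(self) -> int:
        return self.path.stat().st_size

    def read(self, size: int = -1) -> bytes:
        # Inside a with block, read from the open handle;
        # otherwise open and close the file for this single call.
        if self._fh is not None:
            return self._fh.read(size)
        with self.path.open("rb") as fh:
            return fh.read(size)

    def __enter__(self) -> "FileHandle":
        self._fh = self.path.open("rb")
        return self

    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
        if self._fh is not None:
            self._fh.close()
            self._fh = None


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "sample.bin"
    target.write_bytes(b"abcdef")
    handle = FileHandle(target)
    with handle as f:
        first = f.read(3)   # partial read from the open handle
        rest = f.read()     # continues where the previous read stopped
    print(handle.name, handle.size, first, rest)
```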

## Complete Examples

### Example 1: Basic Generation

```python
from mock_engine import MockEngine

engine = MockEngine("./test_output")

# Generate each type of content
requests = [
    {"type": "text", "strategy": "random", "params": {"text_type": "news"}, "group": "demo"},
    {"type": "image", "strategy": "placeholder", "params": {"width": 800, "height": 600}, "group": "demo"},
    {"type": "audio", "strategy": "tts", "params": {"content_type": "greeting"}, "group": "demo"},
    {"type": "video", "strategy": "text_animation", "params": {"text": "Hello", "duration": 3}, "group": "demo"},
]

results = engine.generate_batch(requests)

# Verify
print(f"Generated {len(results)} items")
for resp in results:
    print(f"- {resp.type}: {resp.status}")
```

### Example 2: Asynchronous Batch Generation

```python
import asyncio
import time  # needed for the timing calls below
from mock_engine import MockEngine

async def generate_content():
    engine = MockEngine("./async_output")

    # Prepare 100 requests
    requests = [{"type": "text", "strategy": "random", "params": {"length": "short"}}] * 100

    # Asynchronous batch generation (concurrency limited to 10)
    start = time.time()
    results = await engine.generate_batch_async(requests, max_concurrency=10)
    duration = time.time() - start

    print(f"Generating 100 texts took {duration:.2f} seconds")
    print(f"Success rate: {sum(1 for r in results if r.status == 'success')}/100")

asyncio.run(generate_content())
```

### Example 3: File Operations

```python
from mock_engine import MockEngine
from pathlib import Path
import shutil

engine = MockEngine("./test_output")

# Generate a test image
resp = engine.generate({
    "type": "image",
    "strategy": "placeholder",
    "params": {"width": 400, "height": 300, "text": "Test"}
})

# Option 1: read with a with statement
with resp.file as f:
    image_data = f.read()
    assert len(image_data) == resp.file.size

# Option 2: read directly
image_data = resp.file.read()
assert len(image_data) > 0

# Option 3: save to a chosen location
dest = Path("saved_images/test.png")
dest.parent.mkdir(parents=True, exist_ok=True)
shutil.copy(resp.file.path, dest)

print(f"Image saved to: {dest}")
print(f"File size: {dest.stat().st_size} bytes")
```

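## Requirements

### Core Dependencies
- Python 3.8+
- Pillow (image processing)
- Pydantic (type validation)

### Optional Dependencies
- gTTS (text-to-speech)
- scipy (audio processing)
- numpy (numerical computing)

## Performance Characteristics

- **Fast startup**: no AI service configuration required; usable immediately
- **Lightweight**: minimal dependencies; core features need only Pillow and Pydantic
- **Extensible**: modular design that makes new features easy to add
- **Efficient**: batch generation with asynchronous concurrency
- **Test-friendly**: optimized for testing and development scenarios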

## Contributing

Issues and pull requests are welcome!

### Development Environment Setup

```bash
# Clone the repository
git clone <repository-url>
cd data_mock_engine

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or
venv\Scripts\activate     # Windows

# Install dependencies
pip install -r requirements.txt

# Run the demo
python sdk_demo.py
```

## 📄 License

MIT License