# DataWeaver

**Repository Path**: aidenhgl/data-weaver

## Basic Information

- **Project Name**: DataWeaver
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-12-17
- **Last Updated**: 2025-12-18

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Mock Data Engine

A full-featured Python mock data engine for multimedia content. It generates four types of data (audio, text, image, and video) through an extensible strategy pattern.

## Project Structure

```
├── src/                    # Core source directory
│   ├── core/               # Core module
│   ├── text/               # Text module
│   ├── image/              # Image module
│   ├── audio/              # Audio module
│   └── video/              # Video module
├── mock_engine.py          # SDK
├── sdk_demo.py             # SDK demo script
├── README.md               # This document
└── requirements.txt        # Dependency list
```

## Quick Start

### Install Dependencies

```bash
pip install -r requirements.txt
```

### Basic Usage

```python
from mock_engine import MockEngine

# Create the engine
engine = MockEngine("./output")

# Generate text
resp = engine.generate({
    "type": "text",
    "strategy": "random",
    "params": {"text_type": "news", "length": "medium"},
    "group": "test_suite"
})
print(f"Text content: {resp.text[:100]}...")

# Generate an image
resp = engine.generate({
    "type": "image",
    "strategy": "placeholder",
    "params": {"width": 800, "height": 600, "text": "Demo"}
})
print(f"Image file: {resp.file.name}")
print(f"File size: {resp.file.size} bytes")

# Read the file using a with statement (recommended)
with resp.file as f:
    image_data = f.read()
    print(f"Data read: {len(image_data)} bytes")

# Generate audio
resp = engine.generate({
    "type": "audio",
    "strategy": "tts",
    "params": {"content_type": "greeting", "voice_type": "female"}
})
print(f"Audio duration: {resp.metadata['duration']} seconds")

# Generate video
resp = engine.generate({
    "type": "video",
    "strategy": "text_animation",
    "params": {"text": "Hello World", "duration": 5}
})
print(f"Video file: {resp.file.name}")
```

## 📚 SDK API Documentation

### Core Class: MockEngine

#### 1. Synchronous API

**Single generation**

```python
def generate(self, request: Dict | MockRequest, safe: bool = False) -> MockResponse: ...
```

Parameters:

- `request`: the generation request, either a dict or a `MockRequest` object
  - `type`: media type ("text", "image", "audio", "video")
  - `strategy`: strategy name (e.g. "random", "placeholder")
  - `params`: strategy parameters (dict)
  - `group`: group name (optional)
  - `tags`: list of tags (optional)
- `safe`: safe mode (when True, no exception is raised and a failed response is returned instead)

Returns:

- A `MockResponse` object with attributes including id, type, metadata, and status

**Batch generation**

```python
def generate_batch(self, requests: List[Dict | MockRequest], continue_on_error: bool = False) -> List[MockResponse]: ...
```

Behavior:

- Fails fast by default: any single failure aborts the batch immediately and cleans up the files already generated
- Set `continue_on_error=True` to keep processing the remaining requests after a failure

#### 2. Asynchronous API

```python
# Asynchronous single generation
async def generate_async(self, request: Dict | MockRequest, safe: bool = False) -> MockResponse: ...

# Asynchronous batch generation
async def generate_batch_async(
    self,
    requests: List[Dict | MockRequest],
    continue_on_error: bool = False,
    max_concurrency: int = 5
) -> List[MockResponse]: ...
```

Behavior:

- Implemented on top of a thread pool, so it suits I/O-bound workloads
- Supports limiting the number of concurrent tasks (`max_concurrency`)
- About 3-5x faster than the synchronous API

#### 3. Query and Management

```python
# Get the generation history
def get_history(self, type: str = None, group: str = None, tag: str = None, status: str = None) -> List[MockResponse]: ...

# Get statistics
def get_statistics(self) -> Dict[str, Any]: ...

# Clear the history
def clear_history(self, type: str = None): ...

# Clean up all files
def cleanup(self): ...

# Context manager support
with MockEngine("./output") as engine:
    resp = engine.generate({...})
```

### Response Types

#### TextResponse (text response)

```python
@dataclass
class TextResponse:
    id: str
    type: str
    content: str  # Text content
    metadata: Dict
    status: str
    error: Optional[str]

    @property
    def text(self) -> str: ...
```

#### FileResponse (file response)

```python
@dataclass
class FileResponse:
    id: str
    type: str
    metadata: Dict
    status: str
    error: Optional[str]

    @property
    def file(self) -> MockFile: ...  # File object

    @property
    def file_path(self) -> Path | None: ...  # File path

    # Supports the with statement
    def __enter__(self): ...
    def __exit__(self, exc_type, exc_val, exc_tb): ...
```

### File Object: MockFile

```python
class MockFile:
    path: Path  # File path
    name: str   # File name
    size: int   # File size in bytes

    def read(self, size: int = -1) -> bytes:
        """Read the file content."""

    def text(self, encoding: str = "utf-8") -> str:
        """Read the content as text."""

    def open(self, mode: str = "rb"):
        """Open the file."""

    # Supports the with statement
    def __enter__(self): ...
    def __exit__(self, exc_type, exc_val, exc_tb): ...
```

#### MockFile Usage Examples

```python
# Option 1: read directly
data = resp.file.read()
size = resp.file.size
name = resp.file.name

# Option 2: use a with statement (recommended; resources are managed automatically)
with resp.file as f:
    data = f.read()
    assert len(data) > 0

# Option 3: use with on the response itself (more concise)
with resp as f:
    data = f.read()
    process(data)  # process() stands for your own handling code

# Option 4: read as text (if the file is a text file)
text = resp.file.text(encoding="utf-8")
```

## Complete Examples

### Example 1: Basic Generation

```python
from mock_engine import MockEngine

engine = MockEngine("./test_output")

# Generate each kind of content
requests = [
    {"type": "text", "strategy": "random", "params": {"text_type": "news"}, "group": "demo"},
    {"type": "image", "strategy": "placeholder", "params": {"width": 800, "height": 600}, "group": "demo"},
    {"type": "audio", "strategy": "tts", "params": {"content_type": "greeting"}, "group": "demo"},
    {"type": "video", "strategy": "text_animation", "params": {"text": "Hello", "duration": 3}, "group": "demo"},
]

results = engine.generate_batch(requests)

# Verify
print(f"Generated {len(results)} items")
for resp in results:
    print(f"- {resp.type}: {resp.status}")
```

### Example 2: Asynchronous Batch Generation

```python
import asyncio
import time

from mock_engine import MockEngine

async def generate_content():
    engine = MockEngine("./async_output")

    # Prepare 100 requests
    requests = [{"type": "text", "strategy": "random", "params": {"length": "short"}}] * 100

    # Asynchronous batch generation (concurrency limited to 10)
    start = time.time()
    results = await engine.generate_batch_async(requests, max_concurrency=10)
    duration = time.time() - start

    print(f"Generating 100 texts took {duration:.2f} seconds")
    print(f"Success rate: {sum(1 for r in results if r.status == 'success')}/100")

asyncio.run(generate_content())
```

### Example 3: File Operations

```python
from mock_engine import MockEngine
from pathlib import Path
import shutil

engine = MockEngine("./test_output")

# Generate a test image
resp = engine.generate({
    "type": "image",
    "strategy": "placeholder",
    "params": {"width": 400, "height": 300, "text": "Test"}
})

# Option 1: read using a with statement
with resp.file as f:
    image_data = f.read()
    assert len(image_data) == resp.file.size

# Option 2: read directly
image_data = resp.file.read()
assert len(image_data) > 0

# Option 3: save to a chosen location
dest = Path("saved_images/test.png")
dest.parent.mkdir(parents=True, exist_ok=True)
shutil.copy(resp.file.path, dest)

print(f"Image saved to: {dest}")
print(f"File size: {dest.stat().st_size} bytes")
```

## Requirements

### Base Dependencies

- Python 3.8+
- Pillow (image processing)
- Pydantic (type validation)

### Optional Dependencies

- gTTS (text-to-speech)
- scipy (audio processing)
- numpy (numerical computing)

## Performance Characteristics

- **Fast startup**: no AI service configuration is needed; it works out of the box
- **Lightweight**: minimal dependencies; the core features need only Pillow and Pydantic
- **Extensible**: modular design that makes new features easy to add
- **Efficient**: batch generation with support for asynchronous concurrency
- **Test-friendly**: optimized specifically for testing and development scenarios

## Contributing

Issues and pull requests are welcome!

### Development Environment Setup

```bash
# Clone the repository
git clone <repository-url>
cd data_mock_engine

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or
venv\Scripts\activate     # Windows

# Install dependencies
pip install -r requirements.txt

# Run the demo
python sdk_demo.py
```

## 📄 License

MIT License
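
## Appendix: Sketch of Concurrency-Limited Batching

The asynchronous API section states that batch generation is built on a thread pool and capped by `max_concurrency`. The sketch below shows one common way to get that behavior with `asyncio`. It is not taken from this project's source, and the actual implementation may differ. The names `run_bounded` and `fake_generate` are hypothetical and exist only for illustration.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, List


async def run_bounded(
    func: Callable[[Any], Any],
    items: Iterable[Any],
    max_concurrency: int = 5,
) -> List[Any]:
    """Run blocking func(item) calls in a thread pool, at most max_concurrency at once."""
    semaphore = asyncio.Semaphore(max_concurrency)
    loop = asyncio.get_running_loop()

    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:

        async def worker(item: Any) -> Any:
            # The semaphore keeps the cap explicit even if the pool were shared.
            async with semaphore:
                return await loop.run_in_executor(pool, func, item)

        # gather() returns results in the same order as the inputs.
        return await asyncio.gather(*(worker(item) for item in items))


def fake_generate(request: dict) -> dict:
    """Hypothetical stand-in for a blocking generation call (not the real engine)."""
    time.sleep(0.01)  # simulate I/O latency
    return {"type": request["type"], "status": "success"}


if __name__ == "__main__":
    requests = [{"type": "text"} for _ in range(20)]
    results = asyncio.run(run_bounded(fake_generate, requests, max_concurrency=10))
    ok = sum(1 for r in results if r["status"] == "success")
    print(f"{ok} of {len(results)} succeeded")
```

Because the blocking work runs in threads, this pattern helps mainly when each call spends its time waiting on I/O, which matches the README's note that the asynchronous API suits I/O-bound workloads.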