# DataForge

**Repository Path**: rust_us/data-forge

## Basic Information

- **Project Name**: DataForge
- **Description**: A high-performance random number generation engine based on Rust.
  - 200+ built-in data rules (names, addresses, dates, currencies, etc.)
  - Support for regular expressions, pattern syntax, and custom generators
  - Multi-language data support (Chinese, English, Japanese)
- **Primary Language**: Rust
- **License**: MIT
- **Default Branch**: master
- **Homepage**: https://forge.codealy.top
- **GVP Project**: No

## Statistics

- **Stars**: 2
- **Forks**: 2
- **Created**: 2025-03-17
- **Last Updated**: 2026-01-28

## Categories & Tags

**Categories**: Uncategorized

**Tags**: Rust, datagenerator, rust-library

## README

# DataForge

[![crates.io](https://img.shields.io/badge/version-0.1.0-yellow)](https://crates.io/crates/dataforge)
[//]: # ([![Documentation](https://img.shields.io/docsrs/dataforge)](https://docs.rs/dataforge))
[![](https://img.shields.io/circleci/project/github/badges/shields/master)](build_status)
[![license](https://img.shields.io/badge/license-MIT%2FApache--2.0-blue)](https://opensource.org/licenses/MIT)
[![Website](https://img.shields.io/badge/官网-whosly-lightgrey?style=social&logo=world&logoColor=blue)](https://baidu.com)

**High-performance Data Forge Workshop** - Random data generation and database population solution for Rust developers

## 📋 Prerequisites

```
Nightly Rust compiler
$ rustc --version
rustc 1.85.1 (4eb161250 2025-03-15)
```

## ✨ Features

- **High-performance Data Generation**
  - Rust-based high-performance random number generation engine
  - Multi-threaded parallel generation (powered by rayon)
  - Memory pool optimization
- **Database Support**
  - Support for MySQL, PostgreSQL, and SQLite databases
  - Automatic schema inference and matching
  - Bulk insert optimization
- **Rich Data Generators**
  - Name generators (Chinese, English, Japanese)
  - Address generators (supports Chinese regional data)
  - Network data generators (email, URL, IP, etc.)
  - Date and time generators
  - Number generators (phone numbers, ID cards, etc.)
- **Flexible Generation Methods**
  - Support for regular expression pattern generation
  - Convenient macro interface
  - Support for custom generator extensions
  - Multi-language data support

## 🚀 Quick Start

### Installation

```toml
[dependencies]
dataforge = "0.1.0"

# Or, to enable optional features, use this line instead of the one above
# (a dependency key may appear only once per table):
# dataforge = { version = "0.1.0", features = ["database"] }
```

### Basic Usage

```rust
use dataforge::generators::*;
use dataforge::forge;
use serde_json::json;

// Generate test user data
let user = forge!({
    "id" => uuid_v4(),
    "name" => name::zh_cn_fullname(),
    "age" => number::adult_age(),
    "email" => internet::email(),
    "phone" => number::phone_number_cn(),
    "address" => json!({
        "province": address::zh_province(),
        "city": "北京市",
        "street": address::zh_address()
    }),
    "created_at" => datetime::iso8601()
});

println!("{}", serde_json::to_string_pretty(&user).unwrap());
```

### Using Macros to Generate Data

```rust
use dataforge::{pattern, rand_num, datetime};

// Generate using patterns
let phone = pattern!("1[3-9]\\d{9}");

// Generate random numbers
let age = rand_num!(18, 65);

// Generate date and time
let timestamp = datetime!("timestamp");
let iso_date = datetime!("iso");
```

### Core Engine Usage

```rust
use dataforge::core::{CoreEngine, GenConfig, GenerationStrategy};

let config = GenConfig {
    batch_size: 1000,
    strategy: GenerationStrategy::Random,
    null_probability: 0.05,
    ..Default::default()
};

let engine = CoreEngine::new(config);
// `?` requires an enclosing function that returns a `Result`
let data = engine.generate_batch(100)?;

// Get performance metrics
let metrics = engine.metrics();
println!(
    "Generated: {}, Errors: {}",
    metrics.generated_count(),
    metrics.error_count()
);
```

### Database Population

```rust
use dataforge::db::DatabaseForge;
use dataforge::generators::*;

// Create database filler
let forge = DatabaseForge::new("mysql://user:pass@localhost/db");

// Configure table and fill data
let result = forge
    .table("users", 1000, |t| {
        t.field("id", || uuid_v4())
            .field("name", || name::zh_cn_fullname())
            .field("email", || internet::email())
    })
    .fill_sync()?;

println!("Filled {} records", result);
```

### Custom Generators

```rust
use dataforge::{DataForge, Language};
use serde_json::Value;

// Create data generator
let mut forge = DataForge::new(Language::ZhCN);

// Register custom generator
forge.register("product_id", || {
    // The integer type parameter was lost in the source text; `u32` is assumed here
    serde_json::json!(format!("PROD-{:06}", rand::random::<u32>() % 1000000))
});

// Use custom generator
let product_id = forge.generate("product_id");
```

## Generator Types

### Name Generators

- `name::zh_cn_fullname()` - Chinese full name
- `name::en_us_fullname()` - English full name
- `name::ja_jp_fullname()` - Japanese full name

### Address Generators

- `address::zh_province()` - Chinese province
- `address::zh_address()` - Chinese address
- `address::us_state()` - US state name
- `address::us_city()` - US city

### Network Data Generators

- `internet::email()` - Email address
- `internet::url()` - Website URL
- `internet::ip_address()` - IP address
- `internet::mac_address()` - MAC address
- `internet::user_agent()` - User agent string

### Number Generators

- `number::phone_number_cn()` - Chinese mobile number
- `number::id_card_cn()` - Chinese ID card number
- `number::credit_card_number()` - Bank card number
- `number::adult_age()` - Adult age
- `number::currency(min, max)` - Currency amount

### Date and Time Generators

- `datetime::iso8601()` - ISO8601 format date
- `datetime::timestamp()` - Timestamp
- `datetime::birthday()` - Birthday date
- `datetime::work_time()` - Work time

## Advanced Features

### Parallel Generation

```rust
use dataforge::core::{CoreEngine, GenConfig, GenerationStrategy};

let config = GenConfig {
    batch_size: 1000,
    strategy: GenerationStrategy::Random,
    parallelism: 4,
    ..Default::default()
};

let engine = CoreEngine::new(config);
let results = engine.generate_batch(10000)?;
```

### Memory Optimization

```rust
use dataforge::memory::{MemoryPool, MemoryPoolConfig};

let config = MemoryPoolConfig::default();
let mut pool = MemoryPool::new(config);

let buffer = pool.allocate(1024)?;
```

### Rule Engine

```rust
use dataforge::rules::{RuleEngine, Rule, RuleType};

let mut engine = RuleEngine::new();
engine.add_rule(Rule {
    name: "adult_user".to_string(),
    rule_type: RuleType::Condition,
    condition: "age >= 18".to_string(),
    action: "generate_adult_data".to_string(),
});
```

## Configuration File Support

DataForge supports TOML and YAML configuration files:

```toml
# dataforge.toml
[generation]
batch_size = 1000
strategy = "Random"
null_probability = 0.05

[database]
url = "mysql://user:pass@localhost/db"
batch_size = 5000
```

## Performance Features

- **Multi-threaded Parallelism**: Efficient parallel processing based on rayon
- **Memory Pool**: Reduces memory allocation overhead
- **Batch Operations**: Optimizes database insert performance
- **Lazy Loading**: Loads data files on demand
- **Zero Copy**: Reduces unnecessary memory copying

## Project Structure

```
dataforge/
├── src/
│   ├── core.rs              # Core engine
│   ├── generators/          # Data generators
│   ├── regions/             # Regional data
│   ├── filling/             # Database filling
│   ├── multithreading/      # Multi-threaded processing
│   ├── memory/              # Memory management
│   ├── customization/       # User customization
│   ├── generation/          # Data generation
│   ├── db/                  # Database related
│   │   └── schema.rs        # Schema parsing
│   ├── config.rs            # Configuration management
│   ├── rules/               # Rule engine
│   └── macros.rs            # Macro definitions
├── data/                    # External data files
├── tests/                   # Test files
└── doc/                     # Documentation
```

## 📚 Ecosystem

- dataforge-faker: Ruby Faker-compatible syntax
- dataforge-sqlx: Async database support via sqlx
- dataforge-cli: Command-line data generation tool

## License

This project is dual-licensed under either MIT or Apache-2.0.

## Contributing

Issues and pull requests are welcome!
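
## Example: Custom Generator Registry (Sketch)

The Custom Generators section registers closures under a name and looks them up later, and the macro section generates phone numbers from the pattern `1[3-9]\d{9}`. The sketch below illustrates how such a name-to-closure registry could work in plain Rust, using only the standard library. It is not DataForge's implementation: `XorShift64`, `Registry`, `phone_cn`, and `product_id` are hypothetical names introduced for illustration, and the small xorshift generator stands in for a real random source such as the `rand` crate.

```rust
use std::collections::HashMap;

/// Minimal xorshift64 pseudo-random generator (hypothetical helper, used
/// only to keep this sketch dependency-free; not part of DataForge).
pub struct XorShift64 {
    state: u64,
}

impl XorShift64 {
    /// Creates a generator from a seed. A zero seed is replaced because
    /// xorshift would otherwise produce zero forever.
    pub fn new(seed: u64) -> Self {
        let state = if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed };
        Self { state }
    }

    /// Advances the state and returns the next 64-bit value.
    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }

    /// Returns a value in `low..=high` (slightly biased; fine for a sketch).
    pub fn range(&mut self, low: u64, high: u64) -> u64 {
        low + self.next_u64() % (high - low + 1)
    }
}

/// A generator is any closure that turns randomness into a string.
type Generator = Box<dyn Fn(&mut XorShift64) -> String>;

/// Hypothetical registry mapping generator names to closures.
pub struct Registry {
    generators: HashMap<String, Generator>,
}

impl Registry {
    pub fn new() -> Self {
        Self { generators: HashMap::new() }
    }

    /// Stores a generator under `name`, replacing any previous entry.
    pub fn register<F>(&mut self, name: &str, generator: F)
    where
        F: Fn(&mut XorShift64) -> String + 'static,
    {
        self.generators.insert(name.to_string(), Box::new(generator));
    }

    /// Runs the named generator, or returns `None` if it is not registered.
    pub fn generate(&self, name: &str, rng: &mut XorShift64) -> Option<String> {
        self.generators.get(name).map(|g| g(rng))
    }
}

/// Hand-written equivalent of the pattern `1[3-9]\d{9}`.
pub fn phone_cn(rng: &mut XorShift64) -> String {
    let mut s = String::with_capacity(11);
    s.push('1');
    s.push(char::from(b'0' + rng.range(3, 9) as u8));
    for _ in 0..9 {
        s.push(char::from(b'0' + rng.range(0, 9) as u8));
    }
    s
}

/// Produces identifiers shaped like `PROD-012345`.
pub fn product_id(rng: &mut XorShift64) -> String {
    format!("PROD-{:06}", rng.range(0, 999_999))
}
```

To use the sketch, create a `Registry`, call `register("phone_cn", phone_cn)` and `register("product_id", product_id)`, then call `generate` with a seeded `XorShift64`. Returning `Option<String>` makes an unknown generator name an explicit case for the caller rather than a panic.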