# spark-data-generator

**Repository Path**: mirrors_minio/spark-data-generator

## Basic Information

- **Project Name**: spark-data-generator
- **Description**: Generates dummy Parquet, CSV, and JSON files for testing and validating MinIO compatibility
- **Primary Language**: Unknown
- **License**: CC-BY-4.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-10-22
- **Last Updated**: 2025-11-29

## Categories & Tags

- **Categories**: Uncategorized
- **Tags**: None

## README

# spark-data-generator

Generates dummy Parquet, CSV, and JSON files for testing and validating MinIO S3 API compatibility.

## TODO

- Support configurable columns
- Support CSV and JSON output

## Spark Example

The following shows how to generate Parquet files with 1 billion rows. It assumes that `spark-defaults.conf` is configured to talk to your MinIO deployment.

```
~ sbt package
~ spark-submit --class "ParquetGenerator" --master spark://masternode:7077 \
    --packages org.apache.hadoop:hadoop-aws:3.1.2 --driver-memory 100G \
    --executor-memory 200G --total-executor-cores 256 \
    target/scala-2.12/parquet-data-generator_2.12-1.0.jar 1000000000 1000 500 s3a://benchmarks/1b-500/
```
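The example above relies on `spark-defaults.conf` pointing Spark's S3A connector at MinIO. A minimal sketch of such a configuration follows; the endpoint, credentials, and SSL setting are placeholders for your own deployment, not values taken from this repository:

```
spark.hadoop.fs.s3a.endpoint                http://minio.example.com:9000
spark.hadoop.fs.s3a.access.key              <your-access-key>
spark.hadoop.fs.s3a.secret.key              <your-secret-key>
spark.hadoop.fs.s3a.path.style.access       true
spark.hadoop.fs.s3a.impl                    org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.connection.ssl.enabled  false
```

`path.style.access` is typically required for MinIO, since it serves buckets under paths rather than virtual-host-style subdomains.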
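For orientation, here is a minimal Scala sketch of what a Spark-based Parquet generator of this shape might look like. The object name, column names, and the interpretation of the three numeric arguments (total rows, partition count, per-row payload size) are assumptions for illustration, not the actual `ParquetGenerator` implementation:

```scala
import org.apache.spark.sql.SparkSession
import scala.util.Random

// Hypothetical sketch only; argument semantics are assumed, not taken
// from the real ParquetGenerator source.
object ParquetGeneratorSketch {
  def main(args: Array[String]): Unit = {
    val rows        = args(0).toLong // e.g. 1000000000
    val partitions  = args(1).toInt  // e.g. 1000
    val payloadSize = args(2).toInt  // e.g. 500 characters of random data per row
    val outputPath  = args(3)        // e.g. s3a://benchmarks/1b-500/

    val spark = SparkSession.builder().appName("ParquetGeneratorSketch").getOrCreate()
    import spark.implicits._

    // Distribute row ids across the requested partitions, attach a random
    // payload to each, and write the result as Parquet.
    spark.range(0, rows, 1, partitions)
      .map { id => (id, Random.alphanumeric.take(payloadSize).mkString) }
      .toDF("id", "payload")
      .write.mode("overwrite").parquet(outputPath)

    spark.stop()
  }
}
```

Such a job is submitted with `spark-submit` exactly as in the example above; the S3A output path is handled by the `hadoop-aws` package named in `--packages`.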