# BenchmarkTool

**Repository Path**: mirrors_StarRocks/BenchmarkTool

## Basic Information

- **Project Name**: BenchmarkTool
- **Description**: Benchmark tool to test StarRocks using several benchmarks.
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2023-06-25
- **Last Updated**: 2025-12-04

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Benchmark tool

Benchmark tool to test StarRocks using several benchmarks.

## Tool description

### Requirements

* **python3**
* python libraries: **pymysql**
  > Use command `pip3 install pymysql` to install.
  >
  > Use command `yum install python-pip` to install pip3 if the machine does not have **pip3**.
* **mysqlslap**: This benchmark tool uses mysqlslap to test StarRocks performance
  > Use command `yum install mysql` to install mysqlslap.

### Project directories

* `bin`: directory for scripts
* `conf`: directory for conf files
* `result`: directory to store query results
* `sql`: directory for all SQL files, with one sub-directory per benchmark
  * `tpch`: tpch benchmark SQL files including `create`, `load` and `query`
  * `ssb`: ssb benchmark SQL files including `create`, `load` and `query`
* `src`: directory for the tool's source code
* `thirdparty`: directory to store third-party modules, such as dbgen for tpch and ssb

### Scripts

All the scripts under the `bin` directory:

* `gen_data`: tools to generate data for benchmarks like tpch, ssb, ...
  * **gen-tpch.sh**: script to generate tpch data
  * **gen-ssb.sh**: script to generate ssb data
* **create_db_table.sh**: script to create tables
* **stream_load.sh**: script to load data into StarRocks using `stream load`
* **broker_load.sh**: script to load data into StarRocks using `broker load` (not finished yet)
* **flat_insert.sh**: script to load data into StarRocks using `insert into` (not finished yet)
* **benchmark.sh**: script to test the performance or check the result correctness

### Test steps

1. Make sure the `Requirements` are satisfied.
2. Compile the dbgen tool that you want under the `thirdparty` directory.
   * tpch's `dbgen` binary is provided directly; a `Makefile` will be added later.
3. Make sure a StarRocks cluster is ready, and that you know the configuration that will be used in the `conf/starrocks.conf` file.
4. Choose the benchmark you want and follow the specified steps below.

## SSB (Star Schema benchmark)

> not finished yet

## TPC-H benchmark

1. Configure the StarRocks cluster info in file `conf/starrocks.conf`

   You should check and modify the IP, port, and database info if needed. You can change other parameters if you know them well.

2. Create tables

   ```bash
   # create tables for 100GB data
   ./bin/create_db_table.sh ddl_100
   ```

   You can specify another directory name (under the `sql/tpch` directory) that contains `create table` SQL files.

   There are some subtle differences between the same table's SQL files under different directories, such as a different bucket size or a different column order. These differences are for performance only.

   You can directly use the `create table` SQL files under `ddl_100` for smaller data, such as 1GB.

3. Generate data

   ```bash
   # generate 100GB data under the `data_100` directory
   ./bin/gen_data/gen-tpch.sh 100 data_100

   # generate 1TB data under the `data_1T` directory
   ./bin/gen_data/gen-tpch.sh 1000 data_1T
   ```

   You can change `100` to `1` to generate 1GB of data quickly for testing.
   Such as: `./bin/gen_data/gen-tpch.sh 1 data_1G`

   You can use either an absolute or a relative directory path to store the generated data. Such as: `./bin/gen_data/gen-tpch.sh 1 data/data_1G-2`

   > This *gen-tpch.sh* script just wraps the tpch-dbgen tool for convenience.
   >
   > You can run command `make` under the `thirdparty/tpch-dbgen` directory to build the `dbgen` binary, where the dbgen source version is 3.0.0 downloaded from [tpc.org](http://tpc.org/tpc_documents_current_versions/current_specifications5.asp).
   >
   > You can also download the latest version of the **tpch-dbgen** tool from [tpc.org](http://www.tpc.org) directly, or find more information on other web pages, like [Data generation tool](https://docs.deistercloud.com/content/Databases.30/TPCH%20Benchmark.90/Data%20generation%20tool.30.xml), etc.

4. Load data using stream load

   ```bash
   # load 100GB data into StarRocks
   ./bin/stream_load.sh data_100
   ```

   `data_100` is the path of the directory containing the data you generated. You can specify either an absolute path or a relative path.

5. Test the performance

   ```bash
   ./bin/benchmark.sh -v -p -d tpch
   ```

   See more information with `./bin/benchmark.sh -h`

6. Check the result

   ```bash
   ./bin/benchmark.sh -v -c -d tpch
   ```

   Currently, you can check the result in the logs. (The expected result hasn't been put in the `result` directory yet)

## Project directories in detail

This section is for developers and testers. You can add more benchmarks, including the **data gen** tool, **SQL query** files, etc.

### SQL directory

All SQL files are under the `sql` directory. There are several sub-directories for different benchmarks, one directory per benchmark, such as `ssb`, `tpch`, `tpcds`, etc.

Under each benchmark directory (taking the `tpch` directory as an example), there are several kinds of directories:

* `ddl*`: There is usually a `***_create.sql` file to create all the tables. Different directories are for different data sizes with some different `create table` properties.
  See detailed info in [tpch-README](sql/tpch/README.md)
* `query`: There may be several sub-directories for different query purposes. Taking the `ssb` benchmark as an example, there are `ssb`, `ssb-flat`, and `ssb-low_cardinality` sub-directories, where `ssb-flat` is for queries on the flattened table `lineorder_flat`, and `ssb-low_cardinality` is for queries in the **low cardinality** situation.
* `insert`: We can insert data into a flattened **wide table** from other tables. Currently this is mainly for the `ssb` benchmark.

### Thirdparty directory

Tools to generate data for different benchmarks. A simple copy of each is included.

> Add links here (TODO)
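
For convenience, the steps from the TPC-H benchmark section can be chained into one wrapper script. The following is only a minimal sketch and is not part of this repository: the helper names `tpch_steps` and `run_tpch` and the `DRY_RUN` variable are hypothetical, and the sketch assumes it is run from the repository root with `conf/starrocks.conf` already configured. By default it only prints the commands it would run.

```bash
#!/bin/sh
# Sketch: chain the TPC-H steps from this README in order.
# DRY_RUN defaults to 1, so the commands are only printed unless DRY_RUN=0 is set.
set -eu

# Hypothetical helper: emit one command per line for the given
# scale factor (in GB), data directory, and DDL directory.
tpch_steps() {
    scale="$1"
    data_dir="$2"
    ddl_dir="$3"
    echo "./bin/create_db_table.sh ${ddl_dir}"
    echo "./bin/gen_data/gen-tpch.sh ${scale} ${data_dir}"
    echo "./bin/stream_load.sh ${data_dir}"
    echo "./bin/benchmark.sh -v -p -d tpch"
    echo "./bin/benchmark.sh -v -c -d tpch"
}

# Hypothetical helper: print each step, and execute it only when DRY_RUN=0.
run_tpch() {
    tpch_steps "$1" "$2" "$3" | while IFS= read -r step; do
        echo "+ ${step}"
        if [ "${DRY_RUN:-1}" = "0" ]; then
            # Detach stdin so a step cannot consume the remaining step list.
            sh -c "${step}" </dev/null
        fi
    done
}

# Example: preview a quick 1GB run that reuses the ddl_100 table definitions.
run_tpch 1 data_1G ddl_100
```

To actually execute the steps, save the sketch under a name of your choice (for example `run_tpch.sh`, a hypothetical file name) and run it with `DRY_RUN=0 sh run_tpch.sh`. Because the script uses `set -e`, it stops at the first step that fails.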