# nvidia-model-benchmark

**Repository Path**: mirrors_datastax/nvidia-model-benchmark

## Basic Information

- **Project Name**: nvidia-model-benchmark
- **Description**: Performance benchmark for NVIDIA embedding models
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-04-11
- **Last Updated**: 2025-12-20

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# nvidia-model-benchmark

Performance benchmark for NVIDIA embedding models.

## Overview

This tool benchmarks NVIDIA embedding models by sending requests with various configurations and measuring performance metrics such as latency and throughput.

## Usage

### Running with a Configuration File

The benchmark suite requires a configuration file that specifies the benchmark parameters:

```bash
./run.sh my_bench.conf
```

where `my_bench.conf` is the configuration file for the run.

#### Configuration File Format

The configuration file uses a simple key-value format:

```
# Required parameters
URL=http://localhost:8000
MODEL=nvidia/nv-embed-base

# Optional parameters (comment out to use defaults)
# Space-separated values for arrays
MODE=query passage
BATCH_SIZE=1 16 32 64
CONCURRENCY=1 4 8 16
```

##### Required Parameters

- `URL`: The URL of the embedding service
- `MODEL`: The model name to use for embeddings (the `nvidia/` prefix is optional and is added automatically if omitted)

##### Optional Parameters

- `MODE`: Space-separated list of modes to test (default: `query passage`)
- `BATCH_SIZE`: Space-separated list of batch sizes to test (default: `1 16 32 64`)
- `CONCURRENCY`: Space-separated list of concurrency levels to test (default: `1 4 8 16`)

An example configuration file is provided in `example.conf`.

## Results

Results are saved to `result.csv` in the current directory, with the following columns:

- Model: The model name
- Tokens: Number of tokens per chunk
- Batch size: Size of each batch
- Concurrency: Number of concurrent operations
- Min (ms): Minimum latency in milliseconds
- Median (ms): Median latency in milliseconds
- P90 (ms): 90th percentile latency in milliseconds
- P99 (ms): 99th percentile latency in milliseconds
- Max (ms): Maximum latency in milliseconds
- Throughput: Requests per second
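
To illustrate how the latency and throughput columns relate to the raw measurements, here is a minimal sketch (not the tool's actual implementation) that computes the same statistics from a list of per-request latencies using Python's standard library:

```python
import statistics

def summarize(latencies_ms, wall_time_s):
    """Compute the statistics reported in result.csv from per-request
    latencies (milliseconds) and the total wall-clock time of the run."""
    ordered = sorted(latencies_ms)
    # quantiles(n=100) returns the 1st..99th percentile cut points.
    pct = statistics.quantiles(ordered, n=100)
    return {
        "Min (ms)": ordered[0],
        "Median (ms)": statistics.median(ordered),
        "P90 (ms)": pct[89],
        "P99 (ms)": pct[98],
        "Max (ms)": ordered[-1],
        "Throughput": len(ordered) / wall_time_s,  # requests per second
    }

# Example: 200 latency samples collected over a 10-second run.
print(summarize([25.0 + i * 0.1 for i in range(200)], wall_time_s=10.0))
```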
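
Similarly, the latency of a single batched request could be measured roughly as follows. This is a sketch only: it assumes the embedding service exposes an OpenAI-compatible `/v1/embeddings` endpoint and accepts an `input_type` field for the query/passage mode; the endpoint path, payload fields, and function names here are illustrative and may differ from what `run.sh` actually sends.

```python
import time
import requests  # third-party: pip install requests

URL = "http://localhost:8000"    # same value as URL in the .conf file
MODEL = "nvidia/nv-embed-base"   # same value as MODEL in the .conf file

def timed_embedding_request(texts, mode="query"):
    """Send one batch of texts to the embedding service and return the
    observed latency in milliseconds (endpoint and payload are assumptions)."""
    payload = {"model": MODEL, "input": texts, "input_type": mode}
    start = time.perf_counter()
    resp = requests.post(f"{URL}/v1/embeddings", json=payload, timeout=60)
    resp.raise_for_status()
    return (time.perf_counter() - start) * 1000.0

# One batch of 16 identical chunks, in "query" mode.
latency_ms = timed_embedding_request(["benchmark sentence"] * 16, mode="query")
print(f"Batch latency: {latency_ms:.1f} ms")
```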