# vast-data-vua

**Repository Path**: zeldaw/vast-data-vua

## Basic Information

- **Project Name**: vast-data-vua
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-01-14
- **Last Updated**: 2026-01-14

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

*VAST DATA Logo*

---

# VUA

VUA is a library that lets LLM inference engines store KV caches on external storage.

**NOTE: The functionality of the VUA vLLM connector is now consolidated into the generic [GDS Backend of LMCache](https://docs.lmcache.ai/kv_cache/gds.html).**

## Developer Quick Start

## VUA as plugin to vLLM 0.8.5 and above

With vLLM 0.8.5 and above, the following is supported:

- Prefix search (chunk size is based on the `--max-num-batched-tokens` parameter).
- Tensor parallelism
- KV cache load via GDS

Use the vLLM KV Connector plugin by importing it before running vLLM:

```
from vua.vllm.kv_connector_v1 import VUAStorageConnector_V1
```

**Or** use the wrapper that comes with this package:

```
bin/vua-vllm serve
```

In the parameters, you can pass the KV connector configuration, e.g.:

```
--kv-transfer-config '{"kv_connector":"VUAStorageConnector_V1","kv_role":"kv_both","kv_connector_extra_config": {"shared_storage_path": "/mnt/shared-storage"}}'
```

See the [guide](doc/kv-connector-v1.md) on how to measure performance using this connector.

## Standalone VUA Core

There is also standalone library code that LLM engines can use.

### Developer Quick Start

1. **Understanding VUA core**

   VUA splits tokens into groups defined by a fixed split factor (see `VUAConfig.split_factor`). Token prefixes are hashed and converted to directory names where cache data is stored. The `put()` method splits key-value caches and stores fragmented tensor data into these directories, while `get_closest()` fetches the closest matching cache fragment based on supplied token prefixes.

2. **Exploring the Codebase**

   - **`src/vua/core.py`**: Contains the primary VUA implementation with `put()` and `get_closest()`, plus token-to-path conversion logic.
   - **`src/vua/serdes.py`**: Provides tensor serialization/deserialization functions (`tensor_to_bytes` and `bytes_to_tensor`) for efficiently handling PyTorch tensors. Compatible with safetensors.
   - **`tests/test_vua.py`**: Offers tests to validate token processing, cache storage and retrieval, and tensor serialization.

3. **Running the Tests**

   Run the tests by executing:

   ```bash
   uv run python -m unittest discover -s tests
   ```

4. **Using VUA in Your Project**

   - Create a VUA instance by providing a configuration (e.g. `VUAConfig`) and a directory path for cache storage.
   - Use `put()` to store computed key-value caches and `get_closest()` to retrieve cached data based on token queries.
   - **Batched Operations:** VUA supports batched put and get operations. If you provide a 2D tensor of tokens to `put()`, it processes each sequence in parallel. Similarly, calling `get_closest()` with a list of token tensors returns a list of corresponding `ClosestKV` results.

   **Example:**

   ```python
   import os
   import torch
   from vua.core import VUA, VUAConfig

   # Set up cache storage directory
   cache_dir = "./vua_cache"
   os.makedirs(cache_dir, exist_ok=True)

   # Create a VUA instance
   vua = VUA(VUAConfig, cache_dir)

   # Generate sample tokens, ensuring the length is divisible by split_factor
   tokens = torch.randint(0, 0xFFFF, (1, 512), dtype=torch.uint16)
   trimmed_tokens = VUAConfig.trim_to_split_factor(tokens)

   # Create a sample key-value cache (for demonstration purposes).
   # This is a list with one layer and two tensor pairs (keys and values).
   kvcache = [[torch.randn(1, 2, trimmed_tokens.size(1), 64),
               torch.randn(1, 2, trimmed_tokens.size(1), 64)]]

   # Store the cache using put()
   vua.put(trimmed_tokens, kvcache)

   # Retrieve the cache using get_closest()
   result = vua.get_closest(trimmed_tokens, device="cpu")
   print("Retrieved tokens:", result.tokens)
   print("Retrieved data:", result.data)
   ```

5. **Examples directory**

   The examples directory contains two main use cases:

   - Usage with the `transformers` library:

     ```
     uv run python ./example/on-transformers.py
     ```

   - Usage with the experimental vLLM V0 connector for offline kvcache storage using VUA core:

     ```
     uv run ./example/serve-vllm.sh
     ```

6.
 **Debugging and Logging**

   VUA uses Python's `logging` module for detailed debug output. Configure custom log handlers during development to monitor directory navigation and cache operations.

# License

VUA is released under the Apache License Version 2.0. See the LICENSE file for more details.
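As an illustration of step 1 of the Developer Quick Start above, the prefix-to-path idea can be sketched in plain Python. Everything here is an assumption for illustration only: the split factor value, the choice of SHA-256, and the 16-character digest are not VUA's actual scheme, which lives in `src/vua/core.py`.

```python
import hashlib

# Illustrative stand-in for VUAConfig.split_factor (assumed value).
SPLIT_FACTOR = 4

def trim_to_split_factor(tokens):
    """Drop trailing tokens so the length divides evenly into groups."""
    return tokens[: len(tokens) - (len(tokens) % SPLIT_FACTOR)]

def prefix_paths(tokens):
    """Map each successive token-group prefix to a directory name by hashing it."""
    paths = []
    for end in range(SPLIT_FACTOR, len(tokens) + 1, SPLIT_FACTOR):
        prefix = tokens[:end]
        # Hypothetical hash and digest length -- for illustration only.
        paths.append(hashlib.sha256(repr(prefix).encode()).hexdigest()[:16])
    return paths

tokens = trim_to_split_factor([101, 7592, 2088, 999, 2023, 2003])
print(tokens)                # the two trailing tokens are dropped
print(prefix_paths(tokens))  # one directory name per token group
```

Because each directory name is a deterministic function of a token *prefix*, a lookup like `get_closest()` can probe progressively longer prefixes and return the longest one for which a cache directory exists.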