# aistore

**Repository Path**: mirrors_NVIDIA/aistore

## Basic Information

- **Project Name**: aistore
- **Description**: AIStore: scalable storage for AI applications
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-08-18
- **Last Updated**: 2026-05-23

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

**AIStore: High-Performance, Scalable Storage for AI Workloads**

![License](https://img.shields.io/badge/license-MIT-blue.svg)
![Version](https://img.shields.io/badge/version-v4.5-green.svg)
![Go Report Card](https://goreportcard.com/badge/github.com/NVIDIA/aistore)

AIStore (AIS) is a lightweight distributed storage stack tailored for AI applications. It's an elastic cluster that can grow and shrink at runtime and can be ad-hoc deployed, with or without Kubernetes, anywhere from a single Linux machine to a bare-metal cluster of any size. Built from scratch, AIS provides linear scale-out, consistent performance, and a flexible deployment model.

AIS is a reliable storage cluster that can natively operate on both in-cluster and remote data, without treating either as a cache. AIS consistently shows [balanced I/O distribution and linear scalability](https://aistore.nvidia.com/blog/2025/07/26/smooth-max-line-speed) across an arbitrary number of clustered nodes. The system supports fast data access, reliability, and rich customization for data transformation workloads.

## Features

* ✅ **Multi-Cloud Access:** Seamlessly access and manage content across multiple [cloud backends](/docs/overview.md#at-a-glance) (including AWS S3, GCS, Azure, and OCI), with fast-tier performance, configurable redundancy, and namespace-aware bucket identity (same-name buckets can coexist across accounts, endpoints, and providers).
* ✅ **Deploy Anywhere:** AIS runs on any Linux machine, virtual or physical.
Deployment options range from a [minimal container-based deployment](https://github.com/NVIDIA/aistore/blob/main/deploy/prod/docker/compose/README.md) and [Google Colab](https://aistore.nvidia.com/blog/2024/09/18/google-colab-aistore) to petascale [Kubernetes clusters](https://github.com/NVIDIA/ais-k8s). There are [no built-in limitations](https://github.com/NVIDIA/aistore/blob/main/docs/overview.md#no-limitations-principle) on deployment size or functionality.
* ✅ **High Availability:** Redundant control and data planes. Self-healing, end-to-end protection, n-way mirroring, and erasure coding. Arbitrary number of lightweight access points (AIS proxies).
* ✅ **HTTP-based API:** A feature-rich, native API (with user-friendly SDKs for Go and Python), and compliant [Amazon S3 API](/docs/s3compat.md) for running unmodified S3 clients.
* ✅ **Monitoring:** Comprehensive observability with integrated Prometheus metrics, Grafana dashboards, detailed logs with configurable verbosity, and CLI-based performance tracking for complete cluster visibility and troubleshooting. See [AIStore Observability](/docs/monitoring-overview.md) for details.
* ✅ **Chunked Objects:** High-performance chunked object representation, with independently retrievable chunks, metadata v2, and checksum-protected manifests. Supports rechunking, parallel reads, and seamless integration with [Get-Batch](/docs/get_batch.md), [blob-downloader](/docs/blob_downloader.md), and multipart uploads to supported cloud backends.
* ✅ **JWT Authentication and Authorization:** [Validates request JWTs](/docs/auth_validation.md) to provide cluster- and bucket-level access control using static keys or dynamic OIDC issuer JWKS lookup.
* ✅ **Secure Redirects:** Configurable cryptographic signing of redirect URLs using HMAC-SHA256 with a versioned cluster key (distributed via metasync, stored in memory only).
* ✅ **Load-Aware Throttling:** Dynamic request throttling based on a multi-dimensional load vector (CPU, memory, disk, file descriptors, goroutines) to protect AIS clusters under stress.
* ✅ **Unified Namespace:** Attach AIS clusters together to provide unified access to datasets across independent clusters, allowing users to reference shared buckets with cluster-specific identifiers.
* ✅ **Turn-key Cache:** In addition to robust data protection features, AIS offers a per-bucket configurable LRU-based cache with eviction thresholds and storage capacity watermarks.
* ✅ **ETL Offload:** Execute I/O intensive data transformations [close to the data](/docs/etl.md), either inline (on-the-fly as part of each read request) or offline (batch processing, with the destination bucket populated with transformed results).
* ✅ **Get-Batch:** Retrieve multiple objects and/or [archived files](/docs/archive.md) with a single call. Designed for ML/AI pipelines, [Get-Batch](/docs/get_batch.md) fetches an entire training batch in one operation, assembling a TAR (or other supported [serialization formats](/docs/archive.md)) that contains all requested items in the exact user-specified order ([paper](https://arxiv.org/abs/2602.22434)).
* ✅ **Data Consistency:** Guaranteed [consistency](/docs/terminology.md#read-after-write-consistency) across all gateways, with [write-through](/docs/terminology.md#write-through) semantics in presence of [remote backends](/docs/terminology.md#backend-provider).
* ✅ **Serialization & Sharding:** Native, first-class support for TAR, TGZ, TAR.LZ4, and ZIP [archives](/docs/archive.md) for efficient storage and processing of small-file datasets. Features include seamless integration with existing unmodified workflows across all APIs and subsystems.
* ✅ **Kubernetes:** For production, AIS runs natively on Kubernetes.
The dedicated [ais-k8s](https://github.com/NVIDIA/ais-k8s) repository includes the AIS K8s Operator, Ansible playbooks, Helm charts, and deployment guidance.
* ✅ **Batch Jobs:** More than 30 cluster-wide [batch operations](/docs/batch.md) that you can start, monitor, and otherwise control. The list currently includes:

```console
$ ais show job --help

NAME:
   archive           blob-download   cleanup               copy-bucket   copy-objects
   delete-objects    download        dsort                 ec-bucket     ec-get
   ec-put            ec-resp         elect-primary         etl-bucket    etl-inline
   etl-objects       evict-objects   evict-remote-bucket   get-batch     list
   lru-eviction      mirror          prefetch-objects      promote-files put-copies
   rebalance         rechunk         rename-bucket         resilver      summary
   warm-up-metadata
```

> The feature set continues to grow and also includes: [native bucket inventory (NBI)](/docs/nbi.md); [blob-downloader](/docs/blob_downloader.md); [AuthN - authentication and authorization server](/docs/authn.md); runtime management of [TLS certificates](/docs/cli/x509.md); full support for [adding/removing nodes at runtime](/docs/lifecycle_node.md); adaptive [rate limiting](/docs/rate_limit.md); and more.

> For the original **white paper** and design philosophy, please see [AIStore Overview](/docs/overview.md), which also includes a high-level block diagram, terminology, APIs, CLI, and more.

> For our 2024 KubeCon presentation, please see [AIStore: Enhancing petascale Deep Learning across Cloud backends](https://www.youtube.com/watch?v=N-d9cbROndg).

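
The Secure Redirects feature listed above signs redirect URLs with HMAC-SHA256 under a versioned key. The general technique can be sketched in a few lines of standard-library Python. This is an illustration only, not the AIS implementation or wire format: the key ring `CLUSTER_KEYS`, the query-parameter names `kver` and `sig`, the example URL, and both helper functions are hypothetical names introduced for this sketch.

```python
import hashlib
import hmac
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical in-memory key ring: key version -> secret bytes.
# How AIS actually formats and distributes its key is not modeled here.
CLUSTER_KEYS = {1: b"previous-cluster-key", 2: b"current-cluster-key"}

# Hypothetical query-parameter names, chosen for illustration only.
VERSION_PARAM = "kver"
SIGNATURE_PARAM = "sig"


def _digest(key: bytes, message: str) -> str:
    """Hex-encoded HMAC-SHA256 of the message under the given key."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()


def sign_redirect(url: str, version: int) -> str:
    """Append the key version and a signature covering the URL plus version."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((VERSION_PARAM, str(version)))
    unsigned = urlunsplit(parts._replace(query=urlencode(query)))
    query.append((SIGNATURE_PARAM, _digest(CLUSTER_KEYS[version], unsigned)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def verify_redirect(url: str) -> bool:
    """Recompute the signature with the key version named in the URL."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    # The signature must be the last parameter, as written by sign_redirect.
    if not query or query[-1][0] != SIGNATURE_PARAM:
        return False
    signature = query.pop()[1]
    versions = [value for name, value in query if name == VERSION_PARAM]
    if len(versions) != 1 or not versions[0].isdecimal():
        return False
    key = CLUSTER_KEYS.get(int(versions[0]))
    if key is None:  # unknown or retired key version
        return False
    unsigned = urlunsplit(parts._replace(query=urlencode(query)))
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(signature, _digest(key, unsigned))


signed = sign_redirect("http://target-1:8081/v1/objects/bck/obj?offset=0", 2)
print(verify_redirect(signed))
print(verify_redirect(signed.replace("offset=0", "offset=1")))
```

Carrying the key version inside the signed URL lets a receiver keep accepting redirects signed with the previous key while a new key is being rolled out, which is one plausible reason to version the key in the first place.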
## CLI

AIS includes an integrated, scriptable [CLI](/docs/cli.md) for managing clusters, buckets, and objects, running and monitoring batch jobs, viewing and downloading logs, generating performance reports, and more:

```console
$ ais

advanced         cluster    etl     ls            prefetch         search          tls
alias            config     evict   ml            put              show            wait
archive          cp         get     mpu           remote-cluster   space-cleanup
auth             create     help    nbi           rmb              start
blob-download    download   job     object        rmo              stop
bucket           dsort      log     performance   scrub            storage
```

## Developer Tools

AIS runs natively on Kubernetes and uses an open format, so you are free to copy or move your data out of AIS at any time using familiar Linux tools such as `tar(1)`, `scp(1)`, and `rsync(1)`.

For developers and data scientists, there's also:

* [Go API](https://github.com/NVIDIA/aistore/tree/main/api) used in [CLI](/docs/cli.md) and [benchmarking tools](/docs/aisloader.md)
* [Python SDK](https://github.com/NVIDIA/aistore/tree/main/python/aistore/sdk) + [Reference Guide](https://docs.nvidia.com/aistore/python/aistore/sdk)
* [PyTorch integration](https://github.com/NVIDIA/aistore/tree/main/python/aistore/pytorch) and usage examples
* [Boto3 support](https://docs.nvidia.com/aistore/python/aistore/botocore_patch)

## Quick Start

1. Read the [Getting Started Guide](/docs/getting_started.md) for a 5-minute local install, or
2. Run a [minimal container-based](https://github.com/NVIDIA/aistore/tree/main/deploy/prod/docker/compose) AIS cluster consisting of a single gateway and a single storage node, or
3. Clone the repo and run `make kill cli aisloader deploy` followed by `ais show cluster`

---------------------

## Deployment options

AIS deployment options, as well as intended (development vs. production vs. first-time) usages, are all [summarized here](https://github.com/NVIDIA/aistore/blob/main/deploy/README.md).

Prerequisites essentially boil down to having Linux with a disk.
Deployment options range from a [minimal container-based deployment](https://github.com/NVIDIA/aistore/tree/main/deploy/prod/docker/compose) to petascale bare-metal clusters of any size, and from a single VM to multiple racks of high-end servers. Practical use cases require, of course, further consideration.

Some of the most popular deployment options include:

| Option | Use Case |
|--------|----------|
| [Local playground](https://github.com/NVIDIA/aistore/blob/main/docs/getting_started.md#local-playground) | AIS developers or first-time users, Linux or Mac OS. Run `make kill cli aisloader deploy <<< $'N\nM'`, where `N` is a number of [targets](/docs/terminology.md#target), `M` is a number of [gateways](/docs/terminology.md#proxy) |
| [Minimal container-based deployment](https://github.com/NVIDIA/aistore/tree/main/deploy/prod/docker/compose) | Quick testing and evaluation; single-node setup |
| [GCP/GKE automated install](https://github.com/NVIDIA/aistore/blob/main/docs/getting_started.md#kubernetes-playground) | Developers, first-time users, AI researchers |
| [Large-scale production deployment](https://github.com/NVIDIA/ais-k8s) | Requires Kubernetes; provided via [ais-k8s](https://github.com/NVIDIA/ais-k8s) |

> For performance tuning, see [performance](/docs/performance.md) and [AIS K8s Playbooks](https://github.com/NVIDIA/ais-k8s/tree/main/playbooks/host-config).

## Existing Datasets

AIS supports multiple ingestion modes:

* ✅ **On Demand:** Transparent cloud access during workloads.
* ✅ **PUT:** Locally accessible files and directories.
* ✅ **Promote:** Import local target directories and/or NFS/SMB shares mounted on AIS targets.
* ✅ **Copy:** Full buckets, virtual subdirectories (recursively or non-recursively), lists or ranges (via Bash expansion).
* ✅ **Download:** HTTP(S)-accessible datasets and objects.
* ✅ **Prefetch:** Remote buckets or selected objects (from remote buckets), including subdirectories, lists, and/or ranges.
* ✅ **Archive:** [Group and store](https://aistore.nvidia.com/blog/2024/08/16/ishard) related small files from an original dataset.

## Install from Release Binaries

You can install the CLI and benchmarking tools using:

```console
./scripts/install_from_binaries.sh --help
```

The script installs [aisloader](/docs/aisloader.md) and [CLI](/docs/cli.md) from the latest or previous GitHub [release](https://github.com/NVIDIA/aistore/releases) and enables CLI auto-completions.

## PyTorch integration

PyTorch integration is a growing set of datasets (both iterable and map-style), samplers, and dataloaders:

* [Taxonomy of abstractions and API reference](https://docs.nvidia.com/aistore/python/aistore/pytorch)
* [AIS plugin for PyTorch: usage examples](https://github.com/NVIDIA/aistore/tree/main/python/aistore/pytorch/README.md)
* [Jupyter notebook examples](https://github.com/NVIDIA/aistore/tree/main/python/examples/pytorch/)

## AIStore Badge

Let others know your project is powered by high-performance AI storage:

[![aistore](https://img.shields.io/badge/powered%20by-AIStore-76B900?style=flat&labelColor=000000)](https://github.com/NVIDIA/aistore)

```markdown
[![aistore](https://img.shields.io/badge/powered%20by-AIStore-76B900?style=flat&labelColor=000000)](https://github.com/NVIDIA/aistore)
```

## More Docs & Guides

* [Overview and Design](/docs/overview.md)
* [Terminology and Core Abstractions](/docs/terminology.md)
* [Networking Model](/docs/networking.md)
* [Getting Started](/docs/getting_started.md)
* [AIS Buckets: Design and Operations](/docs/bucket.md)
* [Observability](/docs/monitoring-overview.md)
* [Technical Blog](https://aistore.nvidia.com/blog)
* [S3 Compatibility](/docs/s3compat.md)
* [Batch Jobs](/docs/batch.md)
* [Performance](/docs/performance.md) and [CLI: performance](/docs/cli/performance.md)
* [CLI Reference](/docs/cli.md)
* [Production Deployment: Kubernetes Operator, Ansible Playbooks, Helm Charts, Monitoring](https://github.com/NVIDIA/ais-k8s)

### How to find information

* See [Extended Index](/docs/docs.md)
* Use CLI `search` command, e.g.: `ais search copy`
* Clone the repository and run `git grep`, e.g.: `git grep -n out-of-band -- "*.md"`

## License

MIT

## Author

Alex Aizman (NVIDIA)