# async-file-io

**Repository Path**: mirrors_databricks/async-file-io

## Basic Information

- **Project Name**: async-file-io
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-10-25
- **Last Updated**: 2025-11-22

## README

# async-file-io

An implementation of Apache Iceberg's FileIO that downloads files asynchronously.

An asynchronous download starts as soon as a new `InputFile` is created from the `FileIO` instance. Calling `newStream` on the returned `InputFile` blocks until that download completes.

The underlying `ResolvingFileIO` is used for `newOutputFile` and `deleteFile`.
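The start-on-create, block-on-read behavior described above can be illustrated with a minimal plain-Java sketch. It does not use this project's classes or the Iceberg API: `PrefetchedFile`, `PrefetchDemo`, and `slowFetch` are hypothetical names, and the real implementation may differ (for example, by writing to the configured cache location rather than holding bytes in memory).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical stand-in for an InputFile whose content is fetched in the background.
final class PrefetchedFile {
  private final CompletableFuture<byte[]> download;

  // The download is submitted to the pool as soon as the object is constructed.
  PrefetchedFile(Supplier<byte[]> fetch, ExecutorService pool) {
    this.download = CompletableFuture.supplyAsync(fetch, pool);
  }

  // Blocks until the background download has finished, then serves the bytes.
  InputStream newStream() {
    return new ByteArrayInputStream(download.join());
  }
}

final class PrefetchDemo {
  // Simulates a slow remote read; the payload is made up for this example.
  static byte[] slowFetch() {
    try {
      Thread.sleep(100);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return "hello".getBytes(StandardCharsets.UTF_8);
  }

  // Opens the stream (waiting for the download if needed) and reads it fully.
  static String readAll(PrefetchedFile file) {
    try (InputStream in = file.newStream()) {
      return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      PrefetchedFile file = new PrefetchedFile(PrefetchDemo::slowFetch, pool);
      // Other work can proceed here while the download runs in the background.
      System.out.println(readAll(file));
    } finally {
      pool.shutdown();
    }
  }
}
```

The benefit of this pattern comes from the gap between creating the file handle and opening its stream: any work done in that gap overlaps with the download.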

## Building

To build the project, run the Gradle wrapper's `build` task:

```
./gradlew build
```

## Configuration

To configure this `FileIO`, set the `io-impl` property on a catalog.

Here is an example of Spark configuration for a catalog named `prod`:

```
spark.sql.catalog.prod=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.prod.type=rest
spark.sql.catalog.prod.uri=https://api.tabular.io/ws
spark.sql.catalog.prod.credential=...
spark.sql.catalog.prod.warehouse=prod
spark.sql.catalog.prod.io-impl=io.tabular.AsyncFileIO
spark.sql.catalog.prod.async.cache-location=file:/tmp
```

The `async.cache-location` property sets where downloaded data is cached. It can be either a local path (e.g. `file:/tmp`) or `memory:/`, which caches data in an in-memory `FileIO`.
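Spark hands catalog-scoped settings to the catalog with the `spark.sql.catalog.<name>.` prefix removed, so the keys relevant here are `io-impl` and `async.cache-location`. The plain-Java sketch below mimics that prefix stripping to make the mapping explicit. `CatalogOptions` and `forCatalog` are hypothetical helpers, not part of Spark, Iceberg, or this project, and the example assumes the in-memory cache variant.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CatalogOptions {
  // Returns the entries under "spark.sql.catalog.<name>." with that prefix removed.
  static Map<String, String> forCatalog(Map<String, String> sparkConf, String name) {
    String prefix = "spark.sql.catalog." + name + ".";
    Map<String, String> options = new LinkedHashMap<>();
    for (Map.Entry<String, String> entry : sparkConf.entrySet()) {
      if (entry.getKey().startsWith(prefix)) {
        options.put(entry.getKey().substring(prefix.length()), entry.getValue());
      }
    }
    return options;
  }

  public static void main(String[] args) {
    Map<String, String> sparkConf = new LinkedHashMap<>();
    sparkConf.put("spark.sql.catalog.prod.io-impl", "io.tabular.AsyncFileIO");
    // Assumed variant of the example above: cache in memory instead of file:/tmp.
    sparkConf.put("spark.sql.catalog.prod.async.cache-location", "memory:/");
    // Settings outside the catalog's prefix are ignored by the helper.
    sparkConf.put("spark.app.name", "demo");

    forCatalog(sparkConf, "prod").forEach((key, value) -> System.out.println(key + "=" + value));
  }
}
```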

To configure the number of background threads, set the Java system property `iceberg.worker.num-threads`.
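A Java system property can be passed on the JVM command line (for example `-Diceberg.worker.num-threads=8`) or set programmatically. The sketch below shows the programmatic route and how such a property can be read back with a fallback. `WorkerThreads` and the fallback value are assumptions for illustration; the default that Iceberg actually applies is not stated here, and a programmatic setting most likely has to happen before the library first reads the property.

```java
final class WorkerThreads {
  // Property name taken from the text above.
  static final String PROP = "iceberg.worker.num-threads";

  // Reads the property, returning the fallback when it is unset or not a valid integer.
  static int configured(int fallback) {
    return Integer.getInteger(PROP, fallback);
  }

  public static void main(String[] args) {
    // Equivalent to starting the JVM with -Diceberg.worker.num-threads=8.
    System.setProperty(PROP, "8");
    System.out.println("worker threads: " + configured(2));
  }
}
```

When running under Spark, JVM flags like this are typically supplied through `spark.driver.extraJavaOptions` and `spark.executor.extraJavaOptions`.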