# async-file-io

An implementation of Apache Iceberg's `FileIO` that downloads files asynchronously.

An asynchronous download starts as soon as a new `InputFile` is created from the `FileIO` instance. Calling `newStream` on the returned `InputFile` blocks until the download completes. The underlying `ResolvingFileIO` is used for `newOutputFile` and `deleteFile`.

## Building

To build, run the Gradle build:

```
./gradlew build
```

## Configuration

To configure this `FileIO`, set the `io-impl` property on a catalog. Here is an example Spark configuration for a catalog named `prod`:

```
spark.sql.catalog.prod=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.prod.type=rest
spark.sql.catalog.prod.uri=https://api.tabular.io/ws
spark.sql.catalog.prod.credential=...
spark.sql.catalog.prod.warehouse=prod
spark.sql.catalog.prod.io-impl=io.tabular.AsyncFileIO
spark.sql.catalog.prod.async.cache-location=file:/tmp
```

The `async.cache-location` property sets where downloaded data is stored locally. The cache location can be either a local path (e.g. `file:/tmp`) or `memory:/` to cache data in an in-memory `FileIO`.

To configure the number of background threads, set the Java system property `iceberg.worker.num-threads`.
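
## Example: download on create, block on read

The behavior described at the top of this README (start the download when the `InputFile` is created, block in `newStream` until it finishes) can be illustrated with a minimal, self-contained sketch. This is not the code in this repository and does not use Iceberg's interfaces: `AsyncDownloadSketch`, `PrefetchedFile`, the `fetcher` function, and the example location are all hypothetical names chosen for illustration, and an in-memory byte array stands in for the configured cache location.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

public class AsyncDownloadSketch {

  /** Hypothetical stand-in for an input file; NOT Iceberg's InputFile interface. */
  static final class PrefetchedFile {
    private final String location;
    private final CompletableFuture<byte[]> download;

    PrefetchedFile(String location, Function<String, byte[]> fetcher, ExecutorService pool) {
      this.location = location;
      // The download is submitted here, at creation time, and runs on a pool thread.
      this.download = CompletableFuture.supplyAsync(() -> fetcher.apply(location), pool);
    }

    String location() {
      return location;
    }

    /** Blocks until the background download completes, then reads the local copy. */
    InputStream newStream() {
      byte[] bytes = download.join(); // waits here if the download is still running
      return new ByteArrayInputStream(bytes);
    }
  }

  public static void main(String[] args) throws IOException {
    // Assumption: a small fixed pool plays the role of the background worker threads.
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      // Hypothetical fetcher that simulates a slow remote read.
      Function<String, byte[]> slowFetcher = location -> {
        try {
          Thread.sleep(100);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException(e);
        }
        return ("contents of " + location).getBytes(StandardCharsets.UTF_8);
      };

      // Creating the file starts the download; the caller can do other work meanwhile.
      PrefetchedFile file = new PrefetchedFile("s3://bucket/data.parquet", slowFetcher, pool);

      // newStream() returns only after the download has finished.
      try (InputStream in = file.newStream()) {
        System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
      }
    } finally {
      pool.shutdown();
    }
  }
}
```

The point of the pattern is that the remote read overlaps with whatever the caller does between creating the file and opening the stream, so that by the time `newStream` is called the data is often already local.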