# S3-FileSystem **Repository Path**: mirrors_adobe/S3-FileSystem ## Basic Information - **Project Name**: S3-FileSystem - **Description**: An implementation of the Hadoop FileSystem contract backed by AWS S3 and DynamoDB - **Primary Language**: Unknown - **License**: Apache-2.0 - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2021-11-06 - **Last Updated**: 2026-01-24 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README S3-FileSystem ====== ### Introduction `S3-FileSystem` is an implementation of the Hadoop file system contract backed by AWS S3. For a details on configuration see our usage [guide](./docs/Usage.md). ### Goals `S3-FileSystem` was created to enable a more efficient usage of AWS S3. This means: - provide strong read after write consistency (in the meantime AWS has also rolled out native s3 [strong consystency](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html#ConsistencyModel)). - provide file `rename` as an atomic O(1) operation. Natively, files cannot be renamed in S3(other file system implementations on top of S3 implement file rename as a copy + delete). - avoid S3 partition [hotspot problem](./docs/S3PartitionHotSpotting.md) regardless of client defined file paths. ### Non-Goals `S3-FileSystem` does not aim be a drop in replacement for HDFS nor to fully implement the FileSystem specification. There are differences between `HDFS` and `S3-FileSystem`, most notably: - `S3-FileSystem` does not support atomic rename of directories. - `S3-FileSystem` does not support POSIX like permissions. For a full list of differences between `S3-FileSystem` and the Hadoop API specification see our contract [definition](./src/integrationTest/resources/contract/s3k.xml) and our API compatibility [analysis](./docs/HadoopFsApiCompatibility.md). For the full Hadoop API specification please see these [docs](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/filesystem.html). For the implicit assumptions(including atomicity and concurrency) of the API please see these [docs](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/introduction.html). ### Similar projects A few projects that tackle the same issues: - [S3 Guard](https://hadoop.apache.org/docs/r3.0.3/hadoop-aws/tools/hadoop-aws/s3guard.html) tackles S3 consistency issues: - Since S3 rolled out native strong consistency, the open source community has decided to deprecate S3 Guard. - [S3A committers](https://hadoop.apache.org/docs/r3.1.1/hadoop-aws/tools/hadoop-aws/committers.html) tackles both consistency and S3's rename problems - The S3A committers do not attempt to solve these issues at the `FileSystem` level, but at the `OutputCommitter` level. Thus, they are primarily targeted at improving Spark/MR job performance and correctness when running on S3. ### Contributing Contributions are welcomed! Read the [Contributing Guide](./docs/Contributing.md) for more information. ### Licensing This project is licensed under the Apache V2 License. See [LICENSE](LICENSE) for more information.