# datafuse
**Repository Path**: endpoint_rust/datafuse
## Basic Information
- **Project Name**: datafuse
- **Description**: No description available
- **Primary Language**: Rust
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-09-09
- **Last Updated**: 2021-09-09
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
Datafuse
The Open Source Cloud Warehouse for Everyone
https://datafuse.rs
- [What is Datafuse?](#what-is-datafuse)
- [Design Overview](#design-overview)
  - [Meta Service Layer](#meta-service-layer)
  - [Compute Layer](#compute-layer)
  - [Storage Layer](#storage-layer)
- [Getting Started](#getting-started)
- [Roadmap](#roadmap)
## What is Datafuse?
Datafuse is an open source **elastic** and **reliable** cloud warehouse. It offers blazing fast queries and combines the elasticity, simplicity, and low cost of the cloud, built to make the Data Cloud easy.
Datafuse design principles:
1. **Elastic** In Datafuse, storage and compute resources can dynamically scale up and down on demand.
2. **Secure** All data files and network traffic in Datafuse are encrypted end-to-end, and Role Based Access Control is provided at the SQL level.
3. **User-friendly** Datafuse is an ANSI SQL compliant cloud warehouse that is easy for data scientists and engineers to use.
4. **Cost-efficient** Datafuse processes queries with high performance, and the user only pays for what is actually used.
## Design Overview

Datafuse consists of three components: the `meta service` layer and the decoupled `compute` and `storage` layers.
### Meta Service Layer
The meta service layer serves multiple tenants.
In the current implementation, the meta service has the following components:
* **Metadata** - Manages all metadata of databases, tables, clusters, transactions, etc.
* **Administration** - Stores user info, user management and access control information, usage statistics, etc.
* **Security** - Performs authorization and authentication to protect the privacy of users' data.
### Compute Layer
The compute layer consists of clusters that run computing workloads. Each cluster has many nodes, and each node has the following components:
* **Planner** - Builds execution plan from the user's SQL statement.
* **Optimizer** - Applies optimization rules such as predicate pushdown and pruning of unused columns.
* **Processors** - Vector-based query execution pipeline, which is built from the planner's instructions.
* **Cache** - Caches data and indexes based on their version.
Many clusters can attach to the same database, so they can serve queries from different users in parallel.
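
To make the optimizer's "pruning of unused columns" rule concrete, here is a minimal sketch on a toy logical plan. The `Plan` enum and the `prune_columns`, `scan_columns`, and `example_plan` functions are hypothetical names invented for illustration; they are not Datafuse's planner or optimizer API.

```rust
use std::collections::BTreeSet;

/// A toy logical plan (hypothetical, not Datafuse's real plan type).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Read the listed columns from a table.
    Scan { table: String, columns: Vec<String> },
    /// Keep rows where `column > min` (a deliberately simple predicate).
    Filter { column: String, min: i64, input: Box<Plan> },
    /// Keep only the listed columns.
    Project { columns: Vec<String>, input: Box<Plan> },
}

/// Walk the plan top-down, tracking which columns the operators above
/// need, and narrow each scan to that set. `None` means "everything".
fn prune_columns(plan: Plan, required: Option<BTreeSet<String>>) -> Plan {
    match plan {
        Plan::Project { columns, input } => {
            // Below a projection, only the projected columns are needed.
            let needed: BTreeSet<String> = columns.iter().cloned().collect();
            Plan::Project { columns, input: Box::new(prune_columns(*input, Some(needed))) }
        }
        Plan::Filter { column, min, input } => {
            // The filter column must still be read below the filter.
            let needed = required.map(|mut r| {
                r.insert(column.clone());
                r
            });
            Plan::Filter { column, min, input: Box::new(prune_columns(*input, needed)) }
        }
        Plan::Scan { table, columns } => {
            let columns = match required {
                Some(r) => columns.into_iter().filter(|c| r.contains(c)).collect(),
                None => columns,
            };
            Plan::Scan { table, columns }
        }
    }
}

/// Return the columns read by the scan at the bottom of the plan.
fn scan_columns(plan: &Plan) -> &[String] {
    match plan {
        Plan::Scan { columns, .. } => columns.as_slice(),
        Plan::Filter { input, .. } | Plan::Project { input, .. } => scan_columns(input),
    }
}

/// Roughly `SELECT name FROM users WHERE age > 30` over a four-column table.
fn example_plan() -> Plan {
    Plan::Project {
        columns: vec!["name".to_string()],
        input: Box::new(Plan::Filter {
            column: "age".to_string(),
            min: 30,
            input: Box::new(Plan::Scan {
                table: "users".to_string(),
                columns: ["id", "name", "age", "email"].iter().map(|c| c.to_string()).collect(),
            }),
        }),
    }
}

fn main() {
    let pruned = prune_columns(example_plan(), None);
    // The scan now reads only the columns the query touches.
    println!("{:?}", scan_columns(&pruned)); // prints ["name", "age"]
}
```

In this sketch the scan no longer reads `id` or `email`, which is the kind of saving a columnar format makes worthwhile.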
### Storage Layer
Datafuse stores data in an efficient, columnar format as Parquet files.
For efficient pruning, Datafuse also creates indexes for each Parquet file to speed up the queries.
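
The README does not say what kind of index is built for each Parquet file. As one possible illustration, the sketch below assumes a per-file min/max index on a single integer column and uses it to skip files that cannot match a range predicate. `FileIndex` and `prune_files` are hypothetical names, not part of Datafuse.

```rust
/// Hypothetical per-file index: the min and max of one column in a Parquet file.
#[derive(Debug, Clone, PartialEq)]
struct FileIndex {
    path: String,
    min: i64,
    max: i64,
}

/// Return the paths of files whose [min, max] range overlaps the
/// inclusive query range [lo, hi]; all other files can be skipped.
fn prune_files(indexes: &[FileIndex], lo: i64, hi: i64) -> Vec<&str> {
    indexes
        .iter()
        .filter(|f| f.max >= lo && f.min <= hi)
        .map(|f| f.path.as_str())
        .collect()
}

fn main() {
    let indexes = vec![
        FileIndex { path: "a.parquet".to_string(), min: 0, max: 99 },
        FileIndex { path: "b.parquet".to_string(), min: 100, max: 199 },
        FileIndex { path: "c.parquet".to_string(), min: 200, max: 299 },
    ];
    // A predicate like `WHERE x BETWEEN 150 AND 250` only needs two files.
    let hits = prune_files(&indexes, 150, 250);
    println!("{:?}", hits); // prints ["b.parquet", "c.parquet"]
}
```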
## Getting Started
* [Quick Start](https://datafuse.rs/overview/building-and-running/)
* [Whitepapers](https://datafuse.rs/overview/architecture/)
* [Performance](https://datafuse.rs/overview/performance/)
* [CLI Design](https://datafuse.rs/rfcs/cli/0001-cli-design/)
* [Contributing](https://datafuse.rs/development/contributing/)
* [Datafuse Weekly](https://datafuselabs.github.io/weekly/)
## Roadmap
Datafuse is currently in **Alpha** and is not ready to be used in production. See [Roadmap 2021](https://github.com/datafuselabs/datafuse/issues/746) for planned work.
## License
Datafuse is licensed under [Apache 2.0](LICENSE).