# datafuse
**Repository Path**: golden-eagle/datafuse
## Basic Information
- **Project Name**: datafuse
- **Description**: clone from https://github.com/datafuselabs/datafuse.git
- **Primary Language**: Rust
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 1
- **Forks**: 0
- **Created**: 2021-06-19
- **Last Updated**: 2021-07-29
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
Datafuse
Modern Real-Time Data Processing & Analytics DBMS with Cloud-Native Architecture
Datafuse is a Real-Time Data Processing & Analytics DBMS with Cloud-Native Architecture written
in Rust, inspired by [ClickHouse](https://github.com/ClickHouse/ClickHouse) and powered by [arrow-rs](https://github.com/apache/arrow-rs), built to make it easy to power the Data Cloud.
## Principles
* **Fearless**
- No data races, No unsafe, Minimize unhandled errors
* **High Performance**
- Everything is Parallelism
* **High Scalability**
- Everything is Distributed
* **High Reliability**
- Datafuse primary design goal is reliability
## Architecture

## Performance
* **Memory SIMD-Vector processing performance only**
* Dataset: 100,000,000,000 (100 Billion)
* Hardware: AMD Ryzen 7 PRO 4750U, 8 CPU Cores, 16 Threads
* Rust: rustc 1.53.0-nightly (673d0db5e 2021-03-23)
* Build with Link-time Optimization and Using CPU Specific Instructions
* ClickHouse server version 21.4.6 revision 54447
| Query | FuseQuery (v0.4.1) | ClickHouse (v21.4.6) |
| ------------------------------------------------------------ | --------------------------------------------------- | ------------------------------------------------------------ |
| SELECT avg(number) FROM numbers_mt(100000000000) | 3.87 s.
(25.83 billion rows/s., 206.79 GB/s.) | **×1.6 slow, (6.04 s.)**
(16.57 billion rows/s., 132.52 GB/s.) |
| SELECT sum(number) FROM numbers_mt(100000000000) | 4.86 s.
(20.57 billion rows/s., 164.70 GB/s.) | **×1.2 slow, (5.90 s.)**
(16.95 billion rows/s., 135.62 GB/s.) |
| SELECT min(number) FROM numbers_mt(100000000000) | 5.61 s.
(17.82 billion rows/s., 142.65 GB/s.) | **×2.3 slow, (13.05 s.)**
(7.66 billion rows/s., 61.26 GB/s.) |
| SELECT max(number) FROM numbers_mt(100000000000) | 5.61 s.
(17.82 billion rows/s., 142.67 GB/s.) | **×2.5 slow, (14.07 s.)**
(7.11 billion rows/s., 56.86 GB/s.) |
| SELECT count(number) FROM numbers_mt(100000000000) | 3.12 s.
(32.03 billion rows/s., 256.48 GB/s.) | **×1.2 slow, (3.71 s.)**
(26.93 billion rows/s., 215.43 GB/s.) |
| SELECT sum(number+number+number) FROM numbers_mt(100000000000) | 17.85 s.
(5.60 billion rows/s., 44.85 GB/s.) | **×16.9 slow, (233.71 s.)**
(427.87 million rows/s., 3.42 GB/s.) |
| SELECT sum(number) / count(number) FROM numbers_mt(100000000000) | 4.02 s.
(24.86 billion rows/s., 199.10 GB/s.) | **×2.4 slow, (9.70 s.)**
(10.31 billion rows/s., 82.52 GB/s.) |
| SELECT sum(number) / count(number), max(number), min(number) FROM numbers_mt(100000000000) | 9.60 s.
(10.41 billion rows/s., 83.38 GB/s.) | **×3.4 slow, (32.87 s.)**
(3.04 billion rows/s., 24.34 GB/s.) |
| SELECT number FROM numbers_mt(10000000000) ORDER BY number DESC LIMIT 1000 | 5.34 s.
(1.87 billion rows/s., 14.99 GB/s.) | **×2.6 slow, (13.95 s.)**
(716.62 million rows/s., 5.73 GB/s.) |
| SELECT max(number),sum(number) FROM numbers_mt(1000000000) GROUP BY number % 3, number % 4, number % 5 | 9.03 s.
(110.71 million rows/s., 886.50 MB/s.) | **×3.5 fast, (2.60 s.)**
(385.28 million rows/s., 3.08 GB/s.) |
Note:
* ClickHouse system.numbers_mt is 16-way parallelism processing, [gist](https://gist.github.com/BohuTANG/bba7ec2c23da8017eced7118b59fc7d5)
* FuseQuery system.numbers_mt is 16-way parallelism processing, [gist](https://gist.github.com/BohuTANG/8c37f5390e129cfc9d648ff930d9ef03)
## Status
#### General
- [x] SQL Parser
- [x] Query Planner
- [x] Query Optimizer
- [x] Predicate Push Down
- [x] Limit Push Down
- [x] Projection Push Down
- [x] Type coercion
- [x] Parallel Query Execution
- [x] Distributed Query Execution
- [x] Shuffle Hash GroupBy
- [x] Merge-Sort OrderBy
- [ ] Joins (WIP)
#### SQL Support
- [x] Projection
- [x] Filter (WHERE)
- [x] Limit
- [x] Aggregate Functions
- [x] Scalar Functions
- [x] UDF Functions
- [x] SubQueries
- [x] Sorting
- [ ] Joins (WIP)
- [ ] Window (TODO)
## Getting Started
* [Quick Start](https://datafuse.rs/overview/architecture/)
* [Architecture](https://datafuse.rs/overview/architecture/)
* [Performance](https://datafuse.rs/overview/performance/)
## Roadmap
Datafuse is currently in **Alpha** and is not ready to be used in production, [Roadmap 2021](https://github.com/datafuselabs/datafuse/issues/746)
## Contributing
* [Contribution Guide](https://datafuse.rs/development/contributing/)
* [Coding Guidelines](https://datafuse.rs/development/coding-guidelines/)
## License
Datafuse is licensed under [Apache 2.0](LICENSE).