A vectorized Spark SQL execution engine that replaces the original Spark SQL execution engine and provides higher performance.
The Community repo stores all the information about the openEuler Community, including governance, SIGs (project teams), communications, etc.
This repository contains common information and tools for bigdata.
Packaging and testing of the Apache Hadoop ecosystem
Bigtop-manager provides a modern, low-threshold web application to simplify the deployment and management of components for Bigtop, similar to Apache Ambari and Cloudera Manager.
Apache Iceberg is a new table format for storing large, slow-moving tabular data.
Delta Lake is an open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs for Scala, Java, Rust, Ruby, and Python.
Apache Druid is a real-time database to power modern analytics applications.
A unified analytics engine for large-scale data processing
Alluxio (formerly known as Tachyon) is a virtual distributed storage system.
The Apache Hive (TM) data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL.
A software platform for processing vast amounts of data
Information summary and discussion platform for CloudNative SIG
A high-performance service for building distributed applications
Apache Calcite is a dynamic data management framework.
Apache Ambari is a tool for provisioning, managing, and monitoring Apache Hadoop clusters.
A free and open source distributed realtime computation system
A Distributed Storage System for Structured Data