# spark-oracle

**Repository Path**: mirrors_oracle/spark-oracle

## Basic Information

- **Project Name**: spark-oracle
- **Description**: On the fly, translation of Spark programs to run natively on your Oracle DB. Your Spark programs require no changes.
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2022-01-15
- **Last Updated**: 2025-09-20

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Spark_On_Oracle

- Currently, data lakes comprising Oracle Data Warehouse and Apache Spark have these characteristics:
  - They have **separate data catalogs**, even if they access the same data in an object store.
  - Applications built entirely on Spark have to **compensate for gaps in data management.**
  - Applications that federate across Spark and Oracle usually suffer from **inefficient data movement.**
  - Operating Spark clusters is expensive because they lack administration tooling and have gaps in data management.
    **Therefore, price-performance advantages of Spark are overstated.**

![current deployments](https://github.com/oracle/spark-oracle/wiki/uploads/currentDeploymentDrawbacks.png)

This project fixes those issues:

- It provides a single catalog: Oracle Data Dictionary.
- Oracle is responsible for data management, including:
  - Consistency
  - Isolation
  - Security
  - Storage layout
  - Data lifecycle
  - Data in an object store managed by Oracle as external tables
- It provides support for a full Spark programming model.
- **Spark on Oracle** has these characteristics:
  - Full pushdown on SQL workloads: Query, DML on all tables, DDL for external tables.
  - Push SQL operations of other workloads.
  - Surface Oracle capabilities like machine learning and streaming in the Spark programming model.
  - Co-processor on Oracle instances to run certain kinds of Scala code.
    Co-processors are isolated and limited and therefore are easy to manage.
  - Enable simpler, smaller Spark clusters.

![spark on oracle](https://github.com/oracle/spark-oracle/wiki/uploads/spark-on-oracle.png)

**Feature summary:**

- Catalog integration. (See [this page](https://github.com/oracle/spark-oracle/wiki/Oracle-Catalog).)
- Significant support for SQL pushdown, to the extent that more than 95 (of 99) [TPCDS queries](https://github.com/oracle/spark-oracle/wiki/TPCDS-Queries) are completely pushed to the Oracle instance. (See the [Operator](https://github.com/oracle/spark-oracle/wiki/Operator-Translation) and [Expression](https://github.com/oracle/spark-oracle/wiki/Expression-Translation) translation pages.)
- Deployable as a Spark extension jar for Spark 3 environments.
- [Language integration beyond SQL](https://github.com/oracle/spark-oracle/wiki/Language-Integration) and [DML](https://github.com/oracle/spark-oracle/wiki/Write-Path-Flow) support.

See the [Project Wiki](https://github.com/oracle/spark-oracle/wiki/home) for complete documentation.

## Installation

Spark on Oracle can be deployed on any Spark 3.1 or later environment. See the [Quick Start Guide](https://github.com/oracle/spark-oracle/wiki/Quick-Start-Guide).

## Documentation

See the [wiki](https://github.com/oracle/spark-oracle/wiki/home).

## Examples

The [demo script](https://github.com/oracle/spark-oracle/wiki/Demo) walks you through the features of the library.

## Help

Please file GitHub issues.

## Contributing

This project welcomes contributions from the community. Before submitting a pull request, please [review our contribution guide](./CONTRIBUTING.md).

## Security

Please consult the [security guide](./SECURITY.md) for our responsible security vulnerability disclosure process.

## License

Copyright (c) 2021, 2023 Oracle and/or its affiliates.

Released under the Universal Permissive License v1.0 as shown at .