# dagster

**Repository Path**: mirrors_databricks/dagster

## Basic Information

- **Project Name**: dagster
- **Description**: A data orchestrator for machine learning, analytics, and ETL.
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-10-19
- **Last Updated**: 2025-10-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README
# Dagster

Dagster is a data orchestrator for machine learning, analytics, and ETL.

Dagster lets you define pipelines in terms of the data flow between reusable, logical components, then test locally and run anywhere. With a unified view of pipelines and the assets they produce, Dagster can schedule and orchestrate Pandas, Spark, SQL, or anything else that Python can invoke.

Dagster is designed for data platform engineers, data engineers, and full-stack data scientists. Building a data platform with Dagster makes your stakeholders more independent and your systems more robust. Developing data pipelines with Dagster makes testing easier and deploying faster.

### Develop and test on your laptop, deploy anywhere

With Dagster’s pluggable execution, the same pipeline can run in-process against your local file system, or on a distributed work queue against your production data lake. You can set up Dagster’s web interface in a minute on your laptop, or deploy it on-premises or in any cloud.

### Model and type the data produced and consumed by each step

Dagster models data dependencies between steps in your orchestration graph and handles passing data between them. Optional typing on inputs and outputs helps catch bugs early.

### Link data to computations

Dagster’s Asset Manager tracks the data sets and ML models produced by your pipelines, so you can understand how they were generated and trace issues when they don’t look how you expect.

### Build a self-service data platform

Dagster helps platform teams build systems for data practitioners. Pipelines are built from shared, reusable, configurable data processing and infrastructure components. Dagster’s web interface lets anyone inspect these objects and discover how to use them.

### Avoid dependency nightmares

Dagster’s repository model lets you isolate codebases, so that problems in one pipeline don’t bring down the rest. Each pipeline can have its own package dependencies and Python version.
Pipelines run in isolated processes, so user code issues can't bring the system down.

### Debug pipelines from a rich UI

Dagit, Dagster’s web interface, includes expansive facilities for understanding the pipelines it orchestrates. When inspecting a pipeline run, you can query over logs, discover the most time-consuming tasks via a Gantt chart, re-execute subsets of steps, and more.
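The step-graph model described under *Model and type the data produced and consumed by each step* can be illustrated with a small, dependency-free toy. This is **not** Dagster's API. The function `run_graph` and the convention that a step's parameter names identify its upstream steps are hypothetical, made up for this sketch. The runner executes steps in dependency order and passes each output to the steps that consume it. It also checks plain-class type annotations at runtime, so a mistyped output fails at the step that produced it instead of somewhere downstream.

```python
import inspect
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Any, Callable, Dict, get_type_hints


def _check(value: Any, expected: Any, what: str) -> None:
    # Optional typing: skip unannotated values and anything that is not a
    # plain class (e.g. parameterized generics such as list[int]).
    if isinstance(expected, type) and not isinstance(value, expected):
        raise TypeError(
            f"{what}: expected {expected.__name__}, got {type(value).__name__}"
        )


def run_graph(steps: Dict[str, Callable[..., Any]]) -> Dict[str, Any]:
    """Toy runner (hypothetical, not Dagster): run each step after its inputs exist."""
    # Hypothetical convention: a step's parameter names are its upstream steps.
    graph = {name: set(inspect.signature(fn).parameters) for name, fn in steps.items()}
    for name, deps in graph.items():
        missing = deps - set(steps)
        if missing:
            raise ValueError(f"step {name!r} depends on unknown steps {sorted(missing)}")

    outputs: Dict[str, Any] = {}
    # static_order() yields upstream steps first (and raises CycleError on cycles).
    for name in TopologicalSorter(graph).static_order():
        fn = steps[name]
        hints = get_type_hints(fn)
        kwargs = {}
        for dep in graph[name]:
            _check(outputs[dep], hints.get(dep), f"step {name!r}, input {dep!r}")
            kwargs[dep] = outputs[dep]
        result = fn(**kwargs)
        _check(result, hints.get("return"), f"step {name!r}, output")
        outputs[name] = result
    return outputs


# Usage: three steps wired together purely by parameter names.
def raw_numbers() -> list:
    return [3, 1, 2]


def sorted_numbers(raw_numbers: list) -> list:
    return sorted(raw_numbers)


def total(sorted_numbers: list) -> int:
    return sum(sorted_numbers)


results = run_graph(
    {"raw_numbers": raw_numbers, "sorted_numbers": sorted_numbers, "total": total}
)
print(results["total"])  # 6
```

Suppose a step is annotated `-> int` but returns the string `"3"`. `run_graph` would raise `TypeError` as soon as that step finishes, before any downstream step runs. In miniature, that is the "catch bugs early" behavior the section describes.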
Install Dagster and Dagit with pip:

```bash
pip install dagster dagit
```
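The pluggable-execution idea from *Develop and test on your laptop, deploy anywhere* can also be sketched in plain Python. The point is that the pipeline definition stays fixed while the execution strategy changes. This is not Dagster's executor API. The names `execute`, `run_in_process`, and `run_on_threads` are hypothetical, and a thread pool only stands in for a real distributed work queue.

```python
import functools
import inspect
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Any, Callable, Dict, List

# Hypothetical "batch runner": executes zero-argument tasks, returns results in order.
BatchRunner = Callable[[List[Callable[[], Any]]], List[Any]]


def run_in_process(tasks: List[Callable[[], Any]]) -> List[Any]:
    # Run every task serially in the current process.
    return [task() for task in tasks]


def run_on_threads(tasks: List[Callable[[], Any]]) -> List[Any]:
    # Stand-in for a work queue: run independent tasks concurrently on threads.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda task: task(), tasks))


def execute(steps: Dict[str, Callable[..., Any]], runner: BatchRunner) -> Dict[str, Any]:
    """Toy runner (hypothetical, not Dagster): the graph is fixed, the runner is swappable."""
    # Hypothetical convention: a step's parameter names are its upstream steps.
    graph = {name: set(inspect.signature(fn).parameters) for name, fn in steps.items()}
    for name, deps in graph.items():
        if deps - set(steps):
            raise ValueError(f"step {name!r} depends on unknown steps")

    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the graph has a cycle
    outputs: Dict[str, Any] = {}
    while sorter.is_active():
        ready = list(sorter.get_ready())  # steps whose inputs all exist
        # Bind inputs now with functools.partial so each task is self-contained.
        tasks = [
            functools.partial(steps[name], **{dep: outputs[dep] for dep in graph[name]})
            for name in ready
        ]
        for name, result in zip(ready, runner(tasks)):
            outputs[name] = result
            sorter.done(name)
    return outputs


# Usage: a diamond-shaped graph (b and c both depend on a; d joins them).
def a() -> int:
    return 2


def b(a: int) -> int:
    return a * 10


def c(a: int) -> int:
    return a + 1


def d(b: int, c: int) -> int:
    return b + c


steps = {"a": a, "b": b, "c": c, "d": d}
local = execute(steps, run_in_process)
threaded = execute(steps, run_on_threads)
print(local["d"], threaded["d"])  # 23 23
```

The graph only records which step feeds which. As a result, you could swap `run_on_threads` for another runner without touching the step definitions. One hypothetical example would be a runner that submits tasks to a remote queue.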
| Integration | Dagster Library | Description |
| --- | --- | --- |
| Apache Airflow | `dagster-airflow` | Allows Dagster pipelines to be scheduled and executed, either containerized or uncontainerized, as Apache Airflow DAGs. |
| Apache Spark | `dagster-spark` · `dagster-pyspark` | Libraries for interacting with Apache Spark and PySpark. |
| Dask | `dagster-dask` | Provides a Dagster integration with Dask / Dask.Distributed. |
| Datadog | `dagster-datadog` | Provides a Dagster resource for publishing metrics to Datadog. |
| Jupyter / Papermill | `dagstermill` | Built on the papermill library, dagstermill is meant for integrating productionized Jupyter notebooks into Dagster pipelines. |
| PagerDuty | `dagster-pagerduty` | A library for creating PagerDuty alerts from Dagster workflows. |
| Snowflake | `dagster-snowflake` | A library for interacting with the Snowflake Data Warehouse. |
| **Cloud Providers** | | |
| AWS | `dagster-aws` | A library for interacting with Amazon Web Services. Provides integrations with CloudWatch, S3, EMR, and Redshift. |
| Azure | `dagster-azure` | A library for interacting with Microsoft Azure. |
| GCP | `dagster-gcp` | A library for interacting with Google Cloud Platform. Provides integrations with GCS, BigQuery, and Cloud Dataproc. |