# terraform-databricks-mlops-azure-project-with-sp-linking

**Repository Path**: mirrors_databricks/terraform-databricks-mlops-azure-project-with-sp-linking

## Basic Information

- **Project Name**: terraform-databricks-mlops-azure-project-with-sp-linking
- **Description**: This module creates and configures service principals with appropriate permissions and entitlements to run CI/CD for a project, and creates a workspace directory as a container for project-specific resources for the Azure Databricks staging and prod workspaces. It also links pre-existing Azure Active Directory (AAD) applications to the service principals.
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2022-06-25
- **Last Updated**: 2025-10-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# MLOps Azure Project Module with Service Principal Linking

In both of the specified staging and prod workspaces, this module:
* Links a [pre-existing AAD application](https://docs.microsoft.com/en-us/azure/databricks/administration-guide/users-groups/service-principals#create-a-service-principal) to a newly created Azure Databricks service principal, and configures appropriate permissions and entitlements to run CI/CD for a project.
* Creates a workspace directory as a container for project-specific resources.

The service principals are granted `CAN_MANAGE` permissions on the created workspace directories.

**_NOTE:_**
1. This module is in preview, so it is still experimental and subject to change. Feedback is welcome!
2. The [Databricks providers](https://registry.terraform.io/providers/databricks/databricks/latest/docs) that are passed into the module should be configured with workspace admin permissions.
3. The module assumes that one of the two Azure Infrastructure Modules (with [Creation](https://registry.terraform.io/modules/databricks/mlops-azure-infrastructure-with-sp-creation/databricks/latest) or [Linking](https://registry.terraform.io/modules/databricks/mlops-azure-infrastructure-with-sp-linking/databricks/latest)) has already been applied. That is, service principal groups with token usage permissions must already exist, either with the default name `"mlops-service-principals"` or with the name given in the `service_principal_group_name` field.
4. The service principal AAD tokens are short-lived (<60 minutes in most cases). If a long-lived token is desired, the AAD token can be used to authenticate into a Databricks provider and provision a personal access token (PAT) for the service principal.

## Usage

### Option 1: Authentication with Azure Client Secret and Tenant ID

This option uses the client secrets and tenant IDs to generate AAD tokens for authentication. The advantage of this approach is that you do not have to manually generate short-lived AAD tokens (normally with a maximum lifetime of 60 minutes) each time this module is used.

**_NOTE:_** This option requires Python 3.8+ to be installed in order to obtain the service principal's AAD token.

```hcl
provider "databricks" {
  alias = "staging" # Authenticate using preferred method as described in Databricks provider
}

provider "databricks" {
  alias = "prod" # Authenticate using preferred method as described in Databricks provider
}

module "mlops_azure_project_with_sp_linking" {
  source = "databricks/mlops-azure-project-with-sp-linking/databricks"
  providers = {
    databricks.staging = databricks.staging
    databricks.prod    = databricks.prod
  }
  service_principal_name      = "example-name"
  project_directory_path      = "/dir-name"
  azure_staging_client_id     = "k9l8m7n6o5-e5f6-g7h8-i9j0-a1b2c3d4p4"
  azure_staging_client_secret = var.azure_staging_client_secret # This value is sensitive.
  azure_staging_tenant_id     = "a1b2c3d4-e5f6-g7h8-i9j0-k9l8m7n6o5p4"
  azure_prod_client_id        = "k9l8m7n6p4-e5f6-g7h8-i9j0-a1b2c3d4o5"
  azure_prod_client_secret    = var.azure_prod_client_secret # This value is sensitive.
  azure_prod_tenant_id        = "a1b2c3d4-e5f6-g7h8-i9j0-k9l8m7n6o5p4"
}
```

### Option 2: Authentication with Azure Active Directory (AAD) Token

This option uses the provided AAD tokens for authentication. The advantage of this approach is that you do not have to create or provide client secrets and tenant IDs.

```hcl
provider "databricks" {
  alias = "staging" # Authenticate using preferred method as described in Databricks provider
}

provider "databricks" {
  alias = "prod" # Authenticate using preferred method as described in Databricks provider
}

module "mlops_azure_project_with_sp_linking" {
  source = "databricks/mlops-azure-project-with-sp-linking/databricks"
  providers = {
    databricks.staging = databricks.staging
    databricks.prod    = databricks.prod
  }
  service_principal_name  = "example-name"
  project_directory_path  = "/dir-name"
  azure_staging_client_id = "k9l8m7n6o5-e5f6-g7h8-i9j0-a1b2c3d4p4"
  azure_staging_aad_token = var.azure_staging_aad_token # This value is sensitive.
  azure_prod_client_id    = "k9l8m7n6p4-e5f6-g7h8-i9j0-a1b2c3d4o5"
  azure_prod_aad_token    = var.azure_prod_aad_token # This value is sensitive.
}
```

### Usage example with Git credentials for service principal

This can be helpful for common use cases such as Git authorization for [Remote Git Jobs](https://docs.databricks.com/repos/jobs-remote-notebook.html).
```hcl
data "databricks_current_user" "staging_user" {
  provider = databricks.staging
}

data "databricks_current_user" "prod_user" {
  provider = databricks.prod
}

provider "databricks" {
  alias = "staging_sp"
  host  = data.databricks_current_user.staging_user.workspace_url
  token = module.mlops_azure_project_with_sp_linking.staging_service_principal_aad_token
}

provider "databricks" {
  alias = "prod_sp"
  host  = data.databricks_current_user.prod_user.workspace_url
  token = module.mlops_azure_project_with_sp_linking.prod_service_principal_aad_token
}

resource "databricks_git_credential" "staging_git" {
  provider              = databricks.staging_sp
  git_username          = var.git_username
  git_provider          = var.git_provider
  personal_access_token = var.git_token # This should be configured with `repo` scope for Databricks Repos.
}

resource "databricks_git_credential" "prod_git" {
  provider              = databricks.prod_sp
  git_username          = var.git_username
  git_provider          = var.git_provider
  personal_access_token = var.git_token # This should be configured with `repo` scope for Databricks Repos.
}
```

### Usage example with [MLOps Azure Infrastructure Module with Service Principal Linking](https://registry.terraform.io/modules/databricks/mlops-azure-infrastructure-with-sp-linking/databricks/latest)

```hcl
provider "databricks" {
  alias = "dev" # Authenticate using preferred method as described in Databricks provider
}

provider "databricks" {
  alias = "staging" # Authenticate using preferred method as described in Databricks provider
}

provider "databricks" {
  alias = "prod" # Authenticate using preferred method as described in Databricks provider
}

module "mlops_azure_infrastructure_with_sp_linking" {
  source = "databricks/mlops-azure-infrastructure-with-sp-linking/databricks"
  providers = {
    databricks.dev     = databricks.dev
    databricks.staging = databricks.staging
    databricks.prod    = databricks.prod
  }
  staging_workspace_id          = "123456789"
  prod_workspace_id             = "987654321"
  azure_staging_client_id       = "k9l8m7n6o5-e5f6-g7h8-i9j0-a1b2c3d4p4"
  azure_staging_client_secret   = var.azure_staging_client_secret # This value is sensitive.
  azure_staging_tenant_id       = "a1b2c3d4-e5f6-g7h8-i9j0-k9l8m7n6o5p4"
  azure_prod_client_id          = "k9l8m7n6p4-e5f6-g7h8-i9j0-a1b2c3d4o5"
  azure_prod_client_secret      = var.azure_prod_client_secret # This value is sensitive.
  azure_prod_tenant_id          = "a1b2c3d4-e5f6-g7h8-i9j0-k9l8m7n6o5p4"
  additional_token_usage_groups = ["users"] # This field is optional.
}

module "mlops_azure_project_with_sp_linking" {
  source = "databricks/mlops-azure-project-with-sp-linking/databricks"
  providers = {
    databricks.staging = databricks.staging
    databricks.prod    = databricks.prod
  }
  service_principal_name       = "example-name"
  project_directory_path       = "/dir-name"
  azure_staging_client_id      = "k9l8m7n6o5-e5f6-g7h8-i9j0-a1b2c3d4p4"
  azure_staging_client_secret  = var.azure_staging_client_secret # This value is sensitive.
  azure_staging_tenant_id      = "a1b2c3d4-e5f6-g7h8-i9j0-k9l8m7n6o5p4"
  azure_prod_client_id         = "k9l8m7n6p4-e5f6-g7h8-i9j0-a1b2c3d4o5"
  azure_prod_client_secret     = var.azure_prod_client_secret # This value is sensitive.
  azure_prod_tenant_id         = "a1b2c3d4-e5f6-g7h8-i9j0-k9l8m7n6o5p4"
  service_principal_group_name = module.mlops_azure_infrastructure_with_sp_linking.service_principal_group_name
  # The above field is optional, especially since in this case service_principal_group_name will be mlops-service-principals either way,
  # but this also serves to create an implicit dependency. Can also be replaced with the following line to create an explicit dependency:
  # depends_on = [module.mlops_azure_infrastructure_with_sp_linking]
}
```

## Requirements

| Name | Version |
|------|---------|
|[terraform](https://registry.terraform.io/)|\>=1.1.6|
|[databricks](https://registry.terraform.io/providers/databricks/databricks/0.5.8)|\>=0.5.8|
|[python](https://www.python.org/downloads/release/python-380/) \