# terraform-databricks-mlops-azure-infrastructure-with-sp-linking

**Repository Path**: mirrors_databricks/terraform-databricks-mlops-azure-infrastructure-with-sp-linking

## Basic Information

- **Project Name**: terraform-databricks-mlops-azure-infrastructure-with-sp-linking
- **Description**: This module sets up multi-workspace model registry between an Azure Databricks development (dev) workspace, staging workspace, and production (prod) workspace, allowing READ access from dev/staging workspaces to staging & prod model registries. It also links pre-existing Azure Active Directory (AAD) applications to the service principals.
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2022-06-25
- **Last Updated**: 2025-10-12

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# MLOps Azure Infrastructure Module with Service Principal Linking

This module sets up [multi-workspace model registry](https://docs.microsoft.com/en-us/azure/databricks/applications/machine-learning/manage-model-lifecycle/multiple-workspaces) between a development (dev) workspace, a staging workspace, and a production (prod) workspace, allowing READ access from dev/staging workspaces to staging & prod model registries. The module performs this setup by linking [pre-existing AAD applications](https://docs.microsoft.com/en-us/azure/databricks/administration-guide/users-groups/service-principals#create-a-service-principal) with newly created Azure Databricks service principals in the staging and prod workspaces, then giving them READ-only access to their respective model registries. It will also create secret scopes and store the necessary secrets in the dev and staging workspaces, and only give READ access to this secret scope to the `"users"` group and the generated service principals group.
The module outputs the secret scope names and prefixes, since these values are needed to [access the remote model registry](https://docs.microsoft.com/en-us/azure/databricks/applications/machine-learning/manage-model-lifecycle/multiple-workspaces#specify-a-remote-registry).

**_NOTE:_**

1. This module is in preview, so it is still experimental and subject to change. Feedback is welcome!
2. The [Databricks providers](https://registry.terraform.io/providers/databricks/databricks/latest/docs) that are passed into the module must be configured with workspace admin permissions.
3. In order to create tokens for service principals, they are added to a group, which is then given `token_usage` [permission](https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/permissions#token-usage). However, in order to set this permission, there must be [at least 1 personal access token in the workspace](https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/permissions#token-usage), and this permission [strictly overwrites existing permissions](https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/obo_token#example-usage). Currently, running this module will overwrite permissions to allow token usage only for members of the generated service principals group in the staging and prod workspaces. If additional groups need `token_usage` permissions, list them in the `additional_token_usage_groups` input variable.
4. The service principal tokens stored for remote model registry access are created with a default expiration of 100 days (8640000 seconds), so the module must be re-applied after this time to refresh the tokens.

## Usage

### Option 1: Authentication with Azure Client Secret and Tenant ID

This option will use the client secrets and tenant IDs to generate AAD tokens for authentication.
The advantage to this approach is not having to manually generate short-lived AAD tokens (normally maximum lifetime of 60 minutes) each time this module needs to be used.

**_NOTE:_** This option requires that Python 3.8+ be installed to obtain the service principal's AAD token.

```hcl
provider "databricks" {
  alias = "dev" # Authenticate using preferred method as described in Databricks provider
}

provider "databricks" {
  alias = "staging" # Authenticate using preferred method as described in Databricks provider
}

provider "databricks" {
  alias = "prod" # Authenticate using preferred method as described in Databricks provider
}

module "mlops_azure_infrastructure_with_sp_linking" {
  source = "databricks/mlops-azure-infrastructure-with-sp-linking/databricks"
  providers = {
    databricks.dev     = databricks.dev
    databricks.staging = databricks.staging
    databricks.prod    = databricks.prod
  }
  staging_workspace_id          = "123456789"
  prod_workspace_id             = "987654321"
  azure_staging_client_id       = "k9l8m7n6o5-e5f6-g7h8-i9j0-a1b2c3d4p4"
  azure_staging_client_secret   = var.azure_staging_client_secret # This value is sensitive.
  azure_staging_tenant_id       = "a1b2c3d4-e5f6-g7h8-i9j0-k9l8m7n6o5p4"
  azure_prod_client_id          = "k9l8m7n6p4-e5f6-g7h8-i9j0-a1b2c3d4o5"
  azure_prod_client_secret      = var.azure_prod_client_secret # This value is sensitive.
  azure_prod_tenant_id          = "a1b2c3d4-e5f6-g7h8-i9j0-k9l8m7n6o5p4"
  additional_token_usage_groups = ["users"] # This field is optional.
}
```

### Option 2: Authentication with Azure Active Directory (AAD) Token

This option will use the provided AAD tokens for authentication. The advantage to this approach is not having to create/provide client secrets and tenant IDs.
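The example that follows reads the tokens from `var.azure_staging_aad_token` and `var.azure_prod_aad_token`. As a minimal sketch, the calling configuration might declare these inputs as sensitive so that Terraform redacts their values in plan and apply output. The variable names come from the examples in this README; the descriptions are illustrative, and `sensitive = true` assumes Terraform 0.14 or later.

```hcl
# Hypothetical declarations in the calling (root) configuration.
# Only the variable names are taken from the usage examples.
variable "azure_staging_aad_token" {
  type        = string
  description = "AAD token for the AAD application linked in the staging workspace (illustrative description)."
  sensitive   = true # Redacts the value in plan/apply output; it is still written to state.
}

variable "azure_prod_aad_token" {
  type        = string
  description = "AAD token for the AAD application linked in the prod workspace (illustrative description)."
  sensitive   = true
}
```

One way to supply the values without writing them to a `.tfvars` file is through the environment variables `TF_VAR_azure_staging_aad_token` and `TF_VAR_azure_prod_aad_token`, which Terraform reads automatically. The same pattern would apply to the client secret variables used in Option 1.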
```hcl
provider "databricks" {
  alias = "dev" # Authenticate using preferred method as described in Databricks provider
}

provider "databricks" {
  alias = "staging" # Authenticate using preferred method as described in Databricks provider
}

provider "databricks" {
  alias = "prod" # Authenticate using preferred method as described in Databricks provider
}

module "mlops_azure_infrastructure_with_sp_linking" {
  source = "databricks/mlops-azure-infrastructure-with-sp-linking/databricks"
  providers = {
    databricks.dev     = databricks.dev
    databricks.staging = databricks.staging
    databricks.prod    = databricks.prod
  }
  staging_workspace_id          = "123456789"
  prod_workspace_id             = "987654321"
  azure_staging_client_id       = "k9l8m7n6o5-e5f6-g7h8-i9j0-a1b2c3d4p4"
  azure_staging_aad_token       = var.azure_staging_aad_token # This value is sensitive.
  azure_prod_client_id          = "k9l8m7n6p4-e5f6-g7h8-i9j0-a1b2c3d4o5"
  azure_prod_aad_token          = var.azure_prod_aad_token # This value is sensitive.
  additional_token_usage_groups = ["users"] # This field is optional.
}
```

### Usage example with [MLOps Azure Project Module with Service Principal Linking](https://registry.terraform.io/modules/databricks/mlops-azure-project-with-sp-linking/databricks/latest)

```hcl
provider "databricks" {
  alias = "dev" # Authenticate using preferred method as described in Databricks provider
}

provider "databricks" {
  alias = "staging" # Authenticate using preferred method as described in Databricks provider
}

provider "databricks" {
  alias = "prod" # Authenticate using preferred method as described in Databricks provider
}

module "mlops_azure_infrastructure_with_sp_linking" {
  source = "databricks/mlops-azure-infrastructure-with-sp-linking/databricks"
  providers = {
    databricks.dev     = databricks.dev
    databricks.staging = databricks.staging
    databricks.prod    = databricks.prod
  }
  staging_workspace_id          = "123456789"
  prod_workspace_id             = "987654321"
  azure_staging_client_id       = "k9l8m7n6o5-e5f6-g7h8-i9j0-a1b2c3d4p4"
  azure_staging_client_secret   = var.azure_staging_client_secret # This value is sensitive.
  azure_staging_tenant_id       = "a1b2c3d4-e5f6-g7h8-i9j0-k9l8m7n6o5p4"
  azure_prod_client_id          = "k9l8m7n6p4-e5f6-g7h8-i9j0-a1b2c3d4o5"
  azure_prod_client_secret      = var.azure_prod_client_secret # This value is sensitive.
  azure_prod_tenant_id          = "a1b2c3d4-e5f6-g7h8-i9j0-k9l8m7n6o5p4"
  additional_token_usage_groups = ["users"] # This field is optional.
}

module "mlops_azure_project_with_sp_linking" {
  source = "databricks/mlops-azure-project-with-sp-linking/databricks"
  providers = {
    databricks.staging = databricks.staging
    databricks.prod    = databricks.prod
  }
  service_principal_name       = "example-name"
  project_directory_path       = "/dir-name"
  azure_staging_client_id      = "k9l8m7n6o5-e5f6-g7h8-i9j0-a1b2c3d4p4"
  azure_staging_client_secret  = var.azure_staging_client_secret # This value is sensitive.
  azure_staging_tenant_id      = "a1b2c3d4-e5f6-g7h8-i9j0-k9l8m7n6o5p4"
  azure_prod_client_id         = "k9l8m7n6p4-e5f6-g7h8-i9j0-a1b2c3d4o5"
  azure_prod_client_secret     = var.azure_prod_client_secret # This value is sensitive.
  azure_prod_tenant_id         = "a1b2c3d4-e5f6-g7h8-i9j0-k9l8m7n6o5p4"
  service_principal_group_name = module.mlops_azure_infrastructure_with_sp_linking.service_principal_group_name
  # The above field is optional, especially since in this case service_principal_group_name will be mlops-service-principals either way,
  # but this also serves to create an implicit dependency. Can also be replaced with the following line to create an explicit dependency:
  # depends_on = [module.mlops_azure_infrastructure_with_sp_linking]
}
```

## Requirements

| Name | Version |
|------|---------|
|[terraform](https://registry.terraform.io/)|\>=1.1.6|
|[databricks](https://registry.terraform.io/providers/databricks/databricks/0.5.8)|\>=0.5.8|
|[python](https://www.python.org/downloads/release/python-380/) \