# from-jupyter-to-production-ml-platform **Repository Path**: mirrors_codecentric/from-jupyter-to-production-ml-platform ## Basic Information - **Project Name**: from-jupyter-to-production-ml-platform - **Description**: Example project for the creation of a machine learning platform with ZenML. - **Primary Language**: Unknown - **License**: MIT - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2024-10-25 - **Last Updated**: 2025-09-27 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # 🚀 ZenML Machine Learning Platform Demo on Azure Welcome to our demo setup for a Machine Learning Platform built with ZenML🧘. In this exercise, we will tackle the classic Titanic problem, where our goal is to predict the survival of passengers aboard the Titanic based on various features such as age, sex, ticket class, and other factors. This dataset is famously used in machine learning due to its rich set of features and the clear binary classification task it presents. Throughout this guide, we will walk through the entire machine learning pipeline from data preparation and feature engineering to model training with xgboost. Additionally, we will utilize MLflow for experiment tracking and managing our model registry to keep track of our model versions and metrics. Finally, we will deploy the trained model with BentoML. The provided components and services will support every stage of the lifecycle, ensuring a comprehensive understanding of how to build and deploy a machine learning model using ZenML. ## Objective In this exercise you will go through the essential steps needed to start, develop, and deploy a machine learning project. By assigning specific roles such as Data Scientist, Platform Engineer, and ML Engineer, you will gain practical experience by solving interactive tasks that mimic real-world scenarios. The aim is to deepen and expand knowledge and skills in machine learning, experiment tracking, feature management, and model deployment using a Machine Learning Platform build with ZenML. ## 🛠️ Setup / Stack Description In this exercise, we utilize a setup provided on Azure, encompassing various components. Here is an overview of the components used: ```mermaid graph TD subgraph Azure A[AKS Cluster] --> B[ZenML Server] A --> C[MLflow Server] A --> D[BentoML Model Deployer] E[Azure Blob Storage] -.-> B F[Azure Container Registry] -.-> B B --> G[ZenML Artifacts] G -.->E C --> H[MLflow Experiments] end D --> I[Model Deployment] ``` * AKS Cluster (Azure Kubernetes Service): The central orchestrator, where the ZenML Server and the MLflow Server are operated. * ZenML Server: The core of our ML orchestration and management platform. * MLflow Server: For logging experiments and managing the model registry. * BentoML Model Deployer: Used for the deployment of the trained models. * Azure Blob Storage: Used as the ZenML Artifact Store and for the Mlflow artifacts. * Azure Container Registry: Used as the Container Registry for ZenML. ## 🚧 Environment Setup Before you begin, ensure you have Docker installed if you choose to use the Code Server. If you prefer to use your own IDE, ensure you have uv installed for dependency management. ### 🛠️ Installation To start the Code Server service, run the following command: ```bash docker-compose up -d ``` You can access the Code Server instance at http://localhost:8443. #### Optional: Installing Python Extension in Code Server Open the Code Server and install the Python extension. This extension provides syntax highlighting, code completion, and other essential features for your Python development. You can typically find the extension marketplace in the sidebar of the Code Server interface. #### Using UV for Dependency Management (If you use your own IDE) We will utilize uv, a package and dependency manager that simplifies the creation of Python environments. It helps manage packages and dependencies in a streamlined manner. To install uv please see the following [link](https://docs.astral.sh/uv/getting-started/installation/). To create the virtual environment and install necessary dependencies, run the following command: ```bash uv sync --all-extras uv pip install -e . ``` This command will set up the virtual environment based on the project requirements. #### Activating the Virtual Environment (If you use your own IDE) After running the uv sync command, activate the virtual environment using the following command: ```bash source .venv/bin/activate ``` #### Add an environment variable Please add a `.env` file to the root of the project folder. In this file please insert the following line and replace `` with your group name. ```dotenv GROUP_NAME= ``` #### Testing Your Setup Once the virtual environment is activated, verify that ZenML is correctly installed by checking its version with: ```bash zenml --version ``` If you see the version number, you are all set to proceed with the next steps in the demo! ### 🔌 Connect to the ZenML Server For all further steps, you must connect to the ZenML server. You can do this with the following command. Make sure that you have previously logged in to the ZenML server with your user. ```bash zenml login ``` ## 🫵 Steps of the exercise In this role-play, you will be divided into three groups, each taking on a specific role within the machine learning project: Data Scientists, Platform Engineers, and ML Engineers. ### Role: Data Scientist As a Data Scientist, your main tasks are to establish the training and validation of a xgboost model and using MLflow for experiment tracking. These steps are crucial to ensure that your model is both accurate and reliable, and that all experiments are tracked effectively for reproducibility. #### Prepare the model training First, you need to provide the data to be used for feature engineering and training as a ZenML artifact. There is already a pipeline that you can use for this. Simply execute the following command from the root directory of the project. ```bash python src/titanicsurvivors/pipelines/prepare_raw_data.py ``` Once the raw data is available as an artifact in ZenML, this artifact can be used to perform feature engineering. Various methods are used to clean up the data and add other useful features that can help to create a better model. There is also an existing pipeline for this. Take a look at the code if you are interested in how feature engineering is carried out. The individual methods can be found under `src/titanicsurvivors/steps/feature_engineering`. You can also run the pipeline from the project root directory with the following command. ```bash python src/titanicsurvivors/pipelines/feature_engineering.py ``` Now everything is ready to start training the model. #### Training the XGBoost Model First, take a look at the training pipeline. You can find it under `src/titanicsurvivors/pipelines/train_xgb_classifier.py`. In the second line of the pipeline function, you will see that we want to use the ZenML client to retrieve the artifact with the training data. We can do this by entering a name. The name of the artifact is `combined_features`. Please add the name of the artifact in the code. Now we need to implement the actual training. To do this, please open the file `src/titanicsurvivors/steps/training.py` in which the function / step `train_xgb_classifier` can be found. First activate autologging for mlflow. You can do this with the following call. ```python import mlflow mlflow.autolog() ``` Now please add the following code to the step so that a training can be done. ```python dtrain = xgb.DMatrix(inputs, label=targets) params = { "max_depth": max_depth, "eta": eta, "objective": objective, "eval_metric": eval_metric, } cv_results = xgb.cv( params=params, dtrain=dtrain, num_boost_round=1000, nfold=5, metrics=["error"], early_stopping_rounds=10, stratified=True, ) best_iteration = cv_results["test-error-mean"].idxmin() best_model = xgb.train(params, dtrain, num_boost_round=best_iteration + 1) return best_model ``` Last but not least, we want to make the created model available as an artifact. To do this, the type of the return value of the function must be adapted. Please adjust the type of the return value so that it looks like this. ```python @step def train_xgb_classifier( ... ) -> Annotated[ xgb.Booster, "xgb_model_{os.getenv(‘GROUP_NAME’, ‘Default’)}" ]: ... ``` #### Validate the model Last but not least, we want to collect metrics that can tell us how good the model is. For this purpose, the `validate_xgb_model` step must be adapted. This can be found in the file `src/titanicsurvivors/steps/validation.py`. Please insert the following content in the correct place. ```python from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score accuracy = accuracy_score(targets, predictions) precision = precision_score(targets, predictions) recall = recall_score(targets, predictions) f1 = f1_score(targets, predictions) ``` Finally, you can now runt the training by executing the following command. ```bash python src/titanicsurvivors/pipelines/train_xgb_classifier.py ``` ### Role: Platform Engineer As a Platform Engineer, you will be responsible for registering the stack components with the ZenML Server. This includes the Orchestrator, Container Registry, Artifact Store, and MLflow as our Experiment Tracker. These components are fundamental for orchestrating, managing, and tracking the machine learning workflows and assets. > [!IMPORTANT] > Please remember to always replace the group name in the following commands (). #### Registering ZenML Components Ensure you have access to the ZenML server and establish a connection. This step is crucial for interacting with the ZenML Server and configuring stack components. #### Registering the Orchestrator Register the Azure Kubernetes Service (AKS) as the Orchestrator. This service will coordinate and manage the execution of your machine learning pipelines. ```bash zenml orchestrator register azure_kubernetes_ --flavor kubernetes ``` ```bash zenml orchestrator connect azure_kubernetes_ --connector azure-service-principal ``` #### Registering the Container Registry Register the Azure Container Registry to manage Docker images. This registry will store and facilitate the efficient use of Docker images required for pipeline steps if using a remote orchestrator like kubernetes. ```bash zenml container-registry register azure_container_registry_ -f azure --uri=mlopswszenmlcontainerregistry.azurecr.io ``` ```bash zenml container-registry connect azure_container_registry_ --connector azure-service-principal ``` ### Registering the Artifact Store Register the Azure Blob Storage as the Artifact Store. This storage will handle all artifacts produced during pipeline runs, such as datasets, models, and logs. ```bash zenml artifact-store register azure_store_ -f azure --path=az://mlopsws-zenmlartifactstore ``` ```bash zenml artifact-store connect azure_store_ --connector azure-service-principal ``` #### Registering the MLflow Experiment Tracker Register MLflow as the Experiment Tracker. MLflow will log metrics, parameters, and artifacts, and help in tracking the various experiments conducted. First we have to create a secret in ZenML, which is used to authenticate with the MLflow server. ```bash zenml secret create mlflow_secret_ --username= --password= ``` Now we can use this secret to add the MLflow server. ```bash zenml experiment-tracker register mlflow_ --flavor=mlflow --tracking_username={{mlflow_secret_.username}} --tracking_password={{mlflow_secret_.password}} --tracking_uri= ``` #### Creating the ZenML Stack Combine the registered components into a stack. This stack will define the infrastructure setup for executing and managing machine learning workflows. ```bash zenml stack register -a azure_store_ -o default -c azure_container_registry_ -e mlflow_ azure_ ``` #### Setting the Active ZenML Stack Activating the stack is essential to ensure that the components registered are correctly configured and ready for use in your ZenML pipelines. ```bash zenml stack set azure_ ``` To check if everything looks good please run the following command. ```bash zenml stack describe ``` ### Role: ML Engineer As an ML Engineer, your responsibility is to create a pipeline to deploy the trained model using BentoML. Deployment ensures that the trained model is accessible for inference and can be integrated into production systems. #### Register Bento Model Deployer Please execute the following command to create a model deployer stack component (change group name). ```bash zenml model-deployer register bentoml_deployer_ --flavor=bentoml ``` #### Creating the Bento Service First, a service must be created that will be used by Bento to process the requests for a prediction. This service must be stored in a Python file located right next to the deployment pipeline. Please create the file `src/titanicsurvivors/pipelines/deploy/service.py`. The content of the file must be as follows. Make sure that the name of the model to be loaded matches the model name that you define in the Bento deployment. ```python from typing import List import bentoml import numpy as np import pandas as pd import xgboost as xgb @bentoml.service( name="TitanicService", ) class TitanicService: def __init__(self): self.model = bentoml.xgboost.load_model("titanic-classifier") @bentoml.api() async def predict(self, inputs: List[dict]) -> List[str]: dinput = xgb.DMatrix(pd.DataFrame(inputs)) output_tensor = self.model.predict(dinput) print(output_tensor) return np.where( np.array([round(output) for output in output_tensor]) == 1, "Survived", "Died", ).tolist() ``` #### Create the deployment pipeline Now we need to create the deployment pipeline. This is located in the file `src/titanicsurvivors/pipelines/deploy/deploy_xgboost.py`. First get the current model artifact. You can obtain the name of the artifact from your data scientist. The name of the artifact must appear in the method call ` f“_{os.getenv(‘GROUP_NAME’, ‘Default’)}”` The next small step is to give the bento you are creating a model name. This name must match the model name from the service. The following code creates an InProcess Server that provides an endpoint that you can use for inferences. ```python from zenml.integrations.bentoml.steps import ( bentoml_model_deployer_step, ) bentoml_model_deployer_step( bento=bento, model_name="", # Name of the model port=3001, # Port to be used by the http server ) ``` ## 🧑‍🧑‍🧒 And now all together Now let's combine your work and see if everything worked. First, the data scientist needs to become active. Please enter the following command in your terminal to activate the stack that your platform engineer created for you. ```bash zenml stack set azure_ ``` Now please run all pipelines again. ```bash python src/titanicsurvivors/pipelines/prepare_raw_data.py ``` ```bash python src/titanicsurvivors/pipelines/feature_engineering.py ``` ```bash python src/titanicsurvivors/pipelines/train_xgb_classifier.py ``` Since everything now runs via Docker and remote execution, these steps will take some time. During this time, familiarize yourself with the ZenML dashboard and the results in MLflow. As soon as all pipelines have been run, the Machine Learning Engineer comes into play. First you have to create another stack to run the deployment pipeline. Please execute the following code (change your group name). ```bash zenml stack register -o default -a azure_store_ -d bentoml_deployer_ local_deploy_ zenml stack set local_deploy_ ``` Please run the deployment pipeline yourself. Make sure that the name of the model artifact matches that of the training pipeline. ```bash python src/titanicsurvivors/pipelines/deploy/deploy_xgboost.py ``` ### Testing the Deployment Access the Bento service in your web browser at http://localhost:3000 and test the Predict API using the provided JSON data. This will verify that the model is correctly deployed and capable of making predictions. ```json { "Age": 2.0, "Deck_0": 1.0, "Deck_1": 0.0, "Deck_2": 0.0, "Deck_3": 0.0, "Embarked_0": 1.0, "Embarked_1": 0.0, "Embarked_2": 0.0, "Family_Size_Grouped_0": 0.0, "Family_Size_Grouped_1": 0.0, "Family_Size_Grouped_2": 1.0, "Fare": 2, "Parch": 1, "Pclass_1": 1.0, "Pclass_2": 0.0, "Pclass_3": 0.0, "Sex_0": 1.0, "Sex_1": 0.0, "SibSp": 0, "Ticket_Frequency": 2, "Title_0": 1.0, "Title_1": 0.0, "Title_2": 0.0, "Title_3": 0.0, "is_married": 1 } ``` Congratulations! You have successfully completed the tasks for your respective roles, working together to build and deploy a machine learning model using ZenML on Azure. 🎉