# AzDataMaker

**Repository Path**: mirrors_Azure/AzDataMaker

## Basic Information

- **Project Name**: AzDataMaker
- **Description**: It is a sample .NET Core app that runs in a Linux Azure Container Instance that generates files and uploads them to Azure Blob Storage.
- **License**: MIT
- **Default Branch**: master

## Statistics

- **Created**: 2020-08-29
- **Last Updated**: 2026-03-28

## README

---
page_type: sample
name: AzDataMaker
topic: sample
description: |
  It is a sample .NET Core app that runs in a Linux Azure Container Instance that generates files and uploads them to Azure Blob Storage.
languages:
- csharp
products:
- azure
- azure-blob-storage
urlFragment: azdatamaker
---

# AzDataMaker

The goal of this app is to generate files and upload them to Azure Blob Storage. It is a .NET Core app that runs in a Linux Azure Container Instance. You can configure the deployment script to start as many instances as your target storage account can take.

> NOTE: AzCopy has a "benchmark" feature, [more info here](https://docs.microsoft.com/azure/storage/common/storage-ref-azcopy-bench). AzCopy Benchmark is designed to run a performance benchmark by uploading or downloading test data to or from a specified destination. We recommend using AzCopy for running performance benchmarks. This sample is designed to create massive quantities of data, leveraging the horizontal scale of the Azure Container Instance service.

## Repository Content

| File/folder | Description |
|-------------|-------------|
| `src` | This folder contains the AzDataMaker Core Module |

## Deploying AzDataMaker

Shared parameters. Run this each time you start a new console session.

``` bash
# login
az login

# show the current subscriptions
az account list --all -o table

# change the default subscription (if needed)
az account set -s "My Subscription Name"

# the region you want to deploy to
# this should match the region the storage account is in if you already have a storage account
REGION="southcentralus"

# the name of the resource group that you want to deploy the resources to
# the sample assumes everything is in the same resource group, your situation might differ
RG="testrg"

# the name of the storage account you want to load data to
STORAGEACCT="testacct"

# the name you want to use for your Azure Container Registry
ACR="testacr"

# the name prefix you want to use for the ACI instance(s)
ACIPREFIX="testaci"

# the number of ACI instances you want to use
ACICOUNT=5
```

Deploy the resource group and the ACR, then build and publish the container image to the ACR.

Run this once to deploy the resources that AzDataMaker needs. If you already have some of these resources, you can skip the corresponding steps.

``` bash
# Create the Resource Group
az group create -n $RG -l $REGION

# Create Storage Account if you don't already have one
az storage account create --name $STORAGEACCT --access-tier Hot --kind StorageV2 --sku Standard_LRS --https-only true -g $RG -l $REGION

# Create Container Registry
az acr create --name $ACR --admin-enabled true --sku Standard -g $RG -l $REGION

# Package the sample into a docker container and publish it to ACR
# PICK ONE OPTION

# Option 1: build with the published sample code directly from GitHub
az acr build -g $RG -r $ACR https://github.com/Azure/azdatamaker.git -f src/AzDataMaker/AzDataMaker/Dockerfile --image azdatamaker:latest

# Option 2: build using a copy of the code you downloaded
# You will need to be in the AzDataMaker directory for this to work
az acr build -g $RG -r $ACR . -f src/AzDataMaker/AzDataMaker/Dockerfile --image azdatamaker:latest
```

Create instances of the running app.

Environment Variables

> Omit options to take the default

- **FileCount**: the number of files to create. This is the number that each ACI instance creates, so with 1 instance and a target of 100 files, use 100; with 5 instances and a target of 100 files, use 20. int, default 100
- **MaxFileSize**: the max size of the files to create in MiB. double, default 100
- **MinFileSize**: the min size of the files to create in MiB. double, default 4
- **ReportStatusIncrement**: after how many files the application reports a status update to the console. If you are creating small files, use a larger number to reduce the performance impact of status updates. If you are creating larger files, you can use a smaller number. int, default 1000
- **RandomFileContents**: whether the objects are filled with all 0's (false) or with random bytes (true). Creating random bytes requires more CPU. bool, default false
- **Threads**: the number of threads to use. int, default 2x the number of CPU cores
- **BlobContainers**: the storage containers to create the objects in, round robin style
  - If a number is provided, that many containers are created, using GUIDs as the container names.
  - If a comma separated list of names is provided, those names are used.
  - If no value is specified, 5 containers are created with GUIDs as names.
- **ConnectionStrings__MyStorageConnection**: the connection string to the storage account you want the files created in (**Option 1: Connection String auth**)
- **StorageAccountUri**: the blob service endpoint URI for the storage account (e.g. `https://.blob.core.windows.net/`) (**Option 2: Managed Identity auth**). When this is set, the container instance must have a Managed Identity with the `Storage Blob Data Contributor` role on the target storage account.

> **Authentication options:** Either `ConnectionStrings__MyStorageConnection` or `StorageAccountUri` must be configured. If both are set, the connection string takes precedence.
>
> If the target storage account has **shared key access disabled**, connection-string auth will not work even if you can read the account key from the management plane. In that case, use `StorageAccountUri` with Managed Identity.

``` bash
# Request authentication information from container registry
ACRSVR="$(az acr show --name $ACR -g $RG --query loginServer -o tsv)"
ACRUSER="$(az acr credential show --name $ACR -g $RG --query username -o tsv)"
ACRPWD="$(az acr credential show --name $ACR -g $RG --query "passwords[0].value" -o tsv)"

# Request authentication information from the storage account
STORAGEACCTCS="$(az storage account show-connection-string --name $STORAGEACCT -g $RG -o tsv)"

# Find the highest instance number that already exists
MAXACI=$(az container list -g $RG --query "max([?starts_with(name, '$ACIPREFIX-')].name)" -o tsv)
if [ -z "$MAXACI" ]; then
    MAXACI=0
else
    # Strip the name prefix, then the zero padding, so that a suffix
    # such as 08 is not read as an invalid octal number
    MAXACI=${MAXACI#"$ACIPREFIX-"}
    MAXACI=${MAXACI#0}
fi

for ((x=MAXACI+1; x<=$ACICOUNT; x++)); do
    ACINAME="$ACIPREFIX-$(printf '%02d' "$x")"
    echo "Create $ACINAME"
    az container create \
        --name "$ACINAME" \
        --resource-group $RG \
        --location $REGION \
        --cpu 1 \
        --memory 1 \
        --registry-login-server $ACRSVR \
        --registry-username $ACRUSER \
        --registry-password $ACRPWD \
        --image "$ACRSVR/azdatamaker:latest" \
        --restart-policy Never \
        --no-wait \
        --environment-variables \
            FileCount="" \
            MaxFileSize="" \
            MinFileSize="" \
            ReportStatusIncrement="" \
            BlobContainers="" \
            RandomFileContents="" \
            Threads="" \
        --secure-environment-variables \
            ConnectionStrings__MyStorageConnection="$STORAGEACCTCS"
done
```

Alternatively, deploy using **Managed Identity** instead of a connection string. This avoids storing storage account keys and is the recommended approach for production workloads.

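When it is unclear which of the two authentication options applies, the storage account's `allowSharedKeyAccess` property can settle it. The sketch below is not part of AzDataMaker: `choose_auth` is a hypothetical helper that maps the property value to the environment variable to configure. It assumes that only an explicit `false` disables shared key access and that an unset property, which `-o tsv` prints as an empty string, leaves it allowed. The commented `az` call shows how the value could be fetched with the shared parameters.

```bash
# Hypothetical helper, not part of AzDataMaker: pick the auth-related
# environment variable from the account's allowSharedKeyAccess value.
choose_auth() {
    case "$1" in
        # Shared key access is disabled, so a connection string would be rejected
        false|False|FALSE) echo "StorageAccountUri" ;;
        # true, or empty because the property was never set: shared key is allowed
        *) echo "ConnectionStrings__MyStorageConnection" ;;
    esac
}

# Usage (requires the Azure CLI and the shared parameters):
# ALLOWKEY="$(az storage account show --name "$STORAGEACCT" -g "$RG" --query allowSharedKeyAccess -o tsv)"
# echo "Configure: $(choose_auth "$ALLOWKEY")"
```

The helper only reports which option can work. Managed Identity remains usable, and is still the recommended choice for production, when shared key access is allowed.
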
For Azure Container Instances, a **user-assigned managed identity** is often the most reliable option because you can grant blob access before the container starts. When using a user-assigned identity, also set `AZURE_CLIENT_ID` so `DefaultAzureCredential` selects the intended identity.

``` bash
# Get the storage account blob endpoint URI
STORAGEACCTURI="https://${STORAGEACCT}.blob.core.windows.net/"

# Create a user-assigned managed identity in the deployment resource group
IDENTITY="azdatamaker-id"
IDENTITYID="$(az identity create -g $RG -n $IDENTITY --query id -o tsv)"
IDENTITYCLIENTID="$(az identity show -g $RG -n $IDENTITY --query clientId -o tsv)"
IDENTITYPRINCIPALID="$(az identity show -g $RG -n $IDENTITY --query principalId -o tsv)"

# Grant the identity blob data access before the container starts
# If the storage account is in a different resource group, replace $RG below with that resource group name.
STORAGEACCTID="$(az storage account show --name $STORAGEACCT -g $RG --query id -o tsv)"
az role assignment create \
    --assignee-object-id $IDENTITYPRINCIPALID \
    --assignee-principal-type ServicePrincipal \
    --role "Storage Blob Data Contributor" \
    --scope $STORAGEACCTID

az container create \
    --name "${ACIPREFIX}-uami" \
    --resource-group $RG \
    --location $REGION \
    --cpu 1 \
    --memory 1 \
    --registry-login-server $ACRSVR \
    --registry-username $ACRUSER \
    --registry-password $ACRPWD \
    --image "$ACRSVR/azdatamaker:latest" \
    --assign-identity $IDENTITYID \
    --restart-policy Never \
    --environment-variables \
        FileCount="" \
        MaxFileSize="" \
        MinFileSize="" \
        ReportStatusIncrement="" \
        BlobContainers="" \
        RandomFileContents="" \
        Threads="" \
        StorageAccountUri=$STORAGEACCTURI \
        AZURE_CLIENT_ID=$IDENTITYCLIENTID
```

If you prefer not to create a separate identity resource, the following **system-assigned managed identity** flow also works. Be aware that the first run can race the role assignment because the container may start before the permission has propagated.

``` bash
# Get the storage account blob endpoint URI
STORAGEACCTURI="https://${STORAGEACCT}.blob.core.windows.net/"

# Find the highest instance number that already exists
MAXACI=$(az container list -g $RG --query "max([?starts_with(name, '$ACIPREFIX-')].name)" -o tsv)
if [ -z "$MAXACI" ]; then
    MAXACI=0
else
    # Strip the name prefix, then the zero padding, so that a suffix
    # such as 08 is not read as an invalid octal number
    MAXACI=${MAXACI#"$ACIPREFIX-"}
    MAXACI=${MAXACI#0}
fi

for ((x=MAXACI+1; x<=$ACICOUNT; x++)); do
    ACINAME="$ACIPREFIX-$(printf '%02d' "$x")"
    echo "Create $ACINAME"

    # Create the container instance with a system-assigned managed identity.
    # --no-wait is intentionally omitted: the container must be fully provisioned
    # before we can read its principalId and create the role assignment.
    az container create \
        --name "$ACINAME" \
        --resource-group $RG \
        --location $REGION \
        --cpu 1 \
        --memory 1 \
        --registry-login-server $ACRSVR \
        --registry-username $ACRUSER \
        --registry-password $ACRPWD \
        --image "$ACRSVR/azdatamaker:latest" \
        --restart-policy Never \
        --assign-identity \
        --environment-variables \
            FileCount="" \
            MaxFileSize="" \
            MinFileSize="" \
            ReportStatusIncrement="" \
            BlobContainers="" \
            RandomFileContents="" \
            Threads="" \
            StorageAccountUri=$STORAGEACCTURI

    # Wait until the principal ID is available (identity provisioning may lag slightly)
    PRINCIPALID=""
    while [ -z "$PRINCIPALID" ]; do
        sleep 5
        PRINCIPALID=$(az container show --name "$ACINAME" -g $RG --query identity.principalId -o tsv)
    done

    # Grant the container's system-assigned identity the Storage Blob Data Contributor role
    # If the storage account is in a different resource group, replace $RG below with that resource group name.
    # The object ID is passed directly (as in the user-assigned example) so the CLI
    # does not have to look up the newly created principal first.
    STORAGEACCTID=$(az storage account show --name $STORAGEACCT -g $RG --query id -o tsv)
    az role assignment create \
        --assignee-object-id $PRINCIPALID \
        --assignee-principal-type ServicePrincipal \
        --role "Storage Blob Data Contributor" \
        --scope $STORAGEACCTID
done

# Find the highest instance number that exists now
MAXACI=$(az container list -g $RG --query "max([?starts_with(name, '$ACIPREFIX-')].name)" -o tsv)
if [ -z "$MAXACI" ]; then
    MAXACI=0
else
    MAXACI=${MAXACI#"$ACIPREFIX-"}
    MAXACI=${MAXACI#0}
fi

# Remove instances if needed
for ((x=MAXACI; x>$ACICOUNT; x--)); do
    ACINAME="$ACIPREFIX-$(printf '%02d' "$x")"
    echo "Delete $ACINAME"
    az container delete \
        --name "$ACINAME" \
        --resource-group $RG \
        --yes
done
```

## Tips

- To reduce memory consumption, the application creates all files on disk first and then uploads them. This generates a lot of IO and, depending on the file sizes and the number of threads you use, consumes a lot of local disk space. When creating larger files, reduce the number of threads. The examples above deploy to ACI; if you need very large files, consider deploying to an environment with more local disk space.

## Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.