# osdu-data-load-tno

**Repository Path**: mirrors_Azure/osdu-data-load-tno

## Basic Information

- **Project Name**: osdu-data-load-tno
- **Description**: Data loading process for OSDU on Azure
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2022-06-22
- **Last Updated**: 2026-03-21

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# OSDU Data Load TNO - C# Implementation

An improved C# application for loading TNO (Netherlands Organisation for Applied Scientific Research) data into the OSDU platform.

## Key Features

- **Simple CLI Interface** - Three intuitive commands to get you started
- **Automatic Processing** - Handles all TNO data types in the correct dependency order
- **File Upload Support** - Complete 4-step OSDU file upload workflow
- **Secure Authentication** - Uses Azure Identity for passwordless authentication
- **Progress Tracking** - Real-time progress updates and detailed logging
- **Error Resilience** - Comprehensive retry policies and error handling
- **Clean Architecture** - CQRS pattern with proper separation of concerns

## Data Loading Process Overview

The application follows a comprehensive 6-step process to load TNO data into OSDU:

1. **Downloads TNO Dataset Files** - Retrieves official TNO test data from the GitLab repository
2. **Creates Legal Tag** - Establishes required legal compliance tags for data governance
3. **Uploads Files to OSDU** - Executes the 4-step file upload workflow:
   - Requests a file upload URL from the File API
   - Uploads file content to storage
   - Submits metadata to the File Service
   - Maintains a registry of uploaded files with IDs and versions
4. **Generates Non-Work Product Manifests** - Creates manifests for master data:
   - Uses CSV templates to generate individual manifests for each data row
   - Processes reference data, wells, wellbores, and related entities
5. **Generates Work Product Manifests** - Creates work product metadata:
   - Iterates through the uploaded files registry
   - Retrieves JSON metadata from work product folders
   - Updates manifests with legal tags, ACL permissions, and data partition IDs
6. **Uploads Manifests** - Submits all manifests to OSDU in the correct dependency order

For detailed information about each step, see the [Data Load Process Documentation](docs/DATA_LOAD_PROCESS.md).

## Quick Start

### 1. Prerequisites

Before you begin, ensure you have:

- **.NET 9.0** or later installed
- **Azure CLI** for authentication: `az login --tenant your-tenant-id`
- [**Azure Developer CLI (azd)** for deployments](https://learn.microsoft.com/en-us/azure/developer/azure-developer-cli/install-azd)
- **OSDU Platform Access** with the `users.datalake.ops` and `users@.dataservices.energy` roles
- **Visual Studio** or **VS Code** (optional, for development)

### 2. Configure the Application

Update `appsettings.json` in the `src/OSDU.DataLoad.Console/` directory with your OSDU instance details:

```json
{
  "Osdu": {
    "BaseUrl": "https://your-osdu-instance.com",
    "TenantId": "your-tenant-id",
    "ClientId": "your-client-id",
    "DataPartition": "your-data-partition",
    "LegalTag": "{DataPartition}-your-legal-tag",
    "AclViewer": "data.default.viewers@{DataPartition}.dataservices.energy",
    "AclOwner": "data.default.owners@{DataPartition}.dataservices.energy"
  }
}
```

**Note**: You can provide environment variables instead; see the **[Configuration Guide](docs/CONFIGURATION.md)**.

### 3. Build and Run

```bash
# Navigate to the console project
cd src/OSDU.DataLoad.Console

# Build the solution
dotnet build

# Run commands directly
dotnet run -- help
dotnet run -- download --destination "~/osdu-data/tno"
dotnet run -- load --source "~/osdu-data/tno"
```

## Available Commands

### Default Behavior (No Arguments)

```bash
# Run without any arguments - downloads data if needed, then loads it
dotnet run
```

When run without arguments, the application will:
1. Check for TNO data in `~/osdu-data/tno/` (the user home directory)
2. Download the test data if not present (~2.2GB)
3. Load all data types into the OSDU platform automatically

This is the **easiest way to get started** - just configure your OSDU settings and run!

### Help Command

```bash
# From the console project directory (recommended)
dotnet run -- help

# Or from the src directory
dotnet run --project OSDU.DataLoad.Console --working-directory OSDU.DataLoad.Console -- help
```

Shows available commands, usage examples, and current configuration status.

### Download TNO Test Data

```bash
# Download ~2.2GB of official test data (from the console project directory)
dotnet run -- download --destination "~/osdu-data/tno"

# Overwrite existing data
dotnet run -- download --destination "~/osdu-data/tno" --overwrite
```

### Load Data

```bash
# Load all TNO data types in dependency order (from the console project directory)
dotnet run -- load --source "~/osdu-data/tno"
```

## Azure Deployments

### Configure Environment

1. Create an azd environment:

   ```bash
   # Navigate to the project root
   azd init -e dev
   ```

2. Configure the environment variables:

   ```bash
   azd env set OSDU_TenantId $(az account show --query tenantId -o tsv)
   azd env set AZURE_SUBSCRIPTION_ID <your-subscription-id>
   azd env set AZURE_LOCATION <your-azure-location>
   azd env set OSDU_BaseUrl <your-osdu-base-url>
   azd env set OSDU_ClientId <your-client-id>
   azd env set OSDU_DataPartition <your-data-partition>
   azd env set OSDU_LegalTag <{DataPartition}-your-legal-tag>
   azd env set OSDU_AclViewer <your-acl-viewer>
   azd env set OSDU_AclOwner <your-acl-owner>
   ```

### Deploy the Infrastructure

```bash
azd provision
```

### Assign the managed identity the `users.datalake.ops` role

**Important**: Get the object ID of the managed identity and assign it the `users.datalake.ops` and `users@.dataservices.energy` roles on your data partition.
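The `{DataPartition}` token that appears in values such as `OSDU_LegalTag`, `OSDU_AclViewer`, and `OSDU_AclOwner` behaves like a simple placeholder for the partition name. As a rough illustration only (the actual substitution lives in the C# application; the helper below is hypothetical), the expansion can be modeled as:

```python
def expand_partition(value: str, data_partition: str) -> str:
    """Replace the {DataPartition} placeholder with the configured partition name.

    Hypothetical helper mirroring how templated configuration values such as
    LegalTag and AclViewer appear to be expanded; not the application's C# code.
    """
    return value.replace("{DataPartition}", data_partition)

# Example: expanding the ACL viewer group and legal tag for an 'opendes' partition
acl_viewer = expand_partition(
    "data.default.viewers@{DataPartition}.dataservices.energy", "opendes"
)
legal_tag = expand_partition("{DataPartition}-your-legal-tag", "opendes")
```

With `"opendes"` as the partition, `acl_viewer` becomes `data.default.viewers@opendes.dataservices.energy` and `legal_tag` becomes `opendes-your-legal-tag`.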
### Deploy the Application

Deploy, then monitor the container's console output:

```bash
azd deploy
```

## Additional Resources

For detailed information on specific topics, see our documentation:

- **[Data Loading Process](docs/DATA_LOAD_PROCESS.md)** - Detailed workflow and processing order
- **[Configuration Guide](docs/CONFIGURATION.md)** - Advanced configuration options and environment variables

---

## Common Issues and Solutions

### 1. Authentication Failures

**Symptoms**: HTTP 401 errors, "Failed to authenticate" messages

**Solutions**:

- **Azure CLI**: Ensure you're logged in: `az login --tenant your-tenant-id`
- **Permissions**: Verify you have the `users.datalake.ops` and `users@.dataservices.energy` roles in OSDU
- **Configuration**: Check the TenantId and ClientId in your configuration
- **Managed Identity**: Verify the Managed Identity is configured (when running on Azure)
- **Scope**: Ensure the scope is correctly set to `{ClientId}/.default`
- **Environment Variables**: Verify `AZURE_CLIENT_ID` and `AZURE_TENANT_ID` are set correctly

### 2. Performance Issues

**Symptoms**: Slow upload speeds, timeouts

**Solutions**:

- **Run the upload in Azure**: See [Azure Deployments](#azure-deployments)
- **Adjust the batch size**: Increase the `MasterDataManifestSubmissionBatchSize` value to raise the number of manifests submitted in a single workflow request

### 3. File Upload - Metadata Issues

**Symptoms**: The file is uploaded and metadata is created, but `/v2/records/{id}` returns 404

```
fail: OSDU.DataLoad.Infrastructure.Services.OsduHttpClient[0]
      [2e82ab6a] GET https://pm44a0805b33bc4.oep.ppe.azure-int.net/api/storage/v2/records/opendes:dataset--File.Generic:e4f2b1ee-2732-4259-ab47-d30ff4c2a095 failed with status NotFound
fail: OSDU.DataLoad.Infrastructure.Services.OsduHttpClient[0]
      [2e82ab6a] Step 4 Failed: Could not retrieve record version for FileID: opendes:dataset--File.Generic:e4f2b1ee-2732-4259-ab47-d30ff4c2a095
```

**Solutions**:

- Restart the OSDU-Storage pods
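When a record 404s like this right after upload, a client can tolerate brief indexing or propagation lag by polling the record endpoint before giving up. The following is an illustrative retry loop only (the application's actual retry policies are written in C#; `fetch` here is a stand-in for any HTTP call returning a status code and body):

```python
import time

def wait_for_record(fetch, record_id: str, attempts: int = 5, delay: float = 1.0):
    """Poll a record lookup until it stops returning 404.

    `fetch` is any callable mapping a record ID to an (http_status, body) pair.
    Returns the body once the record is found, or None after exhausting all
    attempts. Illustrative sketch, not the application's actual retry policy.
    """
    for attempt in range(attempts):
        status, body = fetch(record_id)
        if status == 200:
            return body
        if status != 404:
            raise RuntimeError(f"Unexpected status {status} for {record_id}")
        if attempt < attempts - 1:
            # Fixed delay for simplicity; real policies often use exponential backoff
            time.sleep(delay)
    return None
```

If the record never appears even after generous polling, the Storage service itself is the likely culprit, which is why restarting the OSDU-Storage pods is the suggested remedy above.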
### 4. No Container App Logs

**Symptoms**: No logs in the container app. You may see a Kubernetes error.

**Solutions**:

- **Redeploy**: Redeploy the container with `azd deploy`

## Contributing

This solution follows Clean Architecture and CQRS principles. For detailed information on contributing:

- Review the existing code patterns and structure
- Follow established naming conventions
- Add appropriate unit tests for new features
- Update documentation as needed

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

## Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow [Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general). Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies. OSDU is a trademark of The Open Group.