# k8s-launch-kit

**Repository Path**: mirrors_NVIDIA/k8s-launch-kit

## Basic Information

- **Project Name**: k8s-launch-kit
- **Description**: K8s Launch Kit (l8k) is a CLI tool for deploying and managing NVIDIA cloud-native solutions on Kubernetes. The tool provides flexible deployment workflows for optimal network performance with SR-IOV, RDMA, and other networking technologies.
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-11-16
- **Last Updated**: 2026-03-21

## README

# K8s Launch Kit - CLI for configuring NVIDIA cloud-native solutions

K8s Launch Kit (l8k) is a CLI tool for deploying and managing NVIDIA cloud-native solutions on Kubernetes. The tool provides flexible deployment workflows for optimal network performance with SR-IOV, RDMA, and other networking technologies.

## Operation Phases

### Discover Cluster Configuration

Deploy a minimal Network Operator profile to automatically discover your cluster's network capabilities and hardware configuration. This phase can be skipped if you provide your own configuration file.

### Select the Deployment Profile

Specify the desired deployment profile via CLI flags or with a natural-language prompt for the LLM.

### Generate Deployment Files

Based on the discovered or provided configuration, generate a complete set of YAML deployment files tailored to your selected network profile.

## Installation

### Build from source

```bash
git clone
cd launch-kubernetes
make build
```

The binary will be available at `build/l8k`.
### Install

After building, install the binary, profiles, and config to `/usr/local`:

```bash
make install      # Copies binary, profiles, config to /usr/local
make dev-install  # Symlinks instead of copies (for development)
```

This runs `scripts/install.sh`, which places the following under the install prefix:

- `/bin/l8k`
- `/share/l8k/profiles/`
- `/share/l8k/l8k-config.yaml`

The default prefix is `/usr/local`. Override it with `PREFIX=/opt/l8k make install`.

### Docker

```bash
make docker-build        # Build Docker image (l8k:v0.1.0 + l8k:latest)
make docker-build-local  # Build inside container, extract binary to host build/l8k
```

`docker-build-local` is useful when you don't have the Go toolchain installed — it compiles inside a container and copies the resulting binary to `build/l8k` on your host.

```bash
# Run from the Docker image
docker run --net=host \
  -v ~/.kube:/kube:ro \
  -v $(pwd):/output \
  l8k:latest discover --kubeconfig /kube/config \
  --save-cluster-config /output/cluster-config.yaml
```

## Usage

```
K8s Launch Kit (l8k) is a CLI tool for deploying and managing NVIDIA cloud-native
solutions on Kubernetes. The tool helps provide flexible deployment workflows for
optimal network performance with SR-IOV, RDMA, and other networking technologies.

### Discover Cluster Configuration
Deploy a minimal Network Operator profile to automatically discover your cluster's
network capabilities and hardware configuration by using --discover-cluster-config.
This phase can be skipped if you provide your own configuration file by using
--user-config. This phase requires --kubeconfig to be specified.

### Generate Deployment Files
Based on the discovered or provided configuration, generate a complete set of YAML
deployment files for the selected network profile. Files can be saved to disk using
--save-deployment-files.

The profile can be defined manually with --fabric, --deployment-type and --multirail
flags, OR generated by an LLM-assisted profile generator with --prompt
(requires --llm-api-key and --llm-vendor).
### Deploy to Cluster
Apply the generated deployment files to your Kubernetes cluster by using --deploy.
This phase requires --kubeconfig and can be skipped if --deploy is not specified.

Usage:
  l8k [flags]
  l8k [command]

Available Commands:
  completion  Generate the autocompletion script for the specified shell
  help        Help about any command
  version     Print the version number

Flags:
      --ai                                  Enable AI deployment
      --deploy                              Deploy the generated files to the Kubernetes cluster
      --deployment-type string              Select the deployment type (sriov, rdma_shared, host_device)
      --discover-cluster-config             Deploy a thin Network Operator profile to discover cluster capabilities
      --enabled-plugins string              Comma-separated list of plugins to enable (default "network-operator")
      --fabric string                       Select the fabric type to deploy (infiniband, ethernet)
      --group string                        Generate templates for a specific group only (e.g., group-0)
  -h, --help                                help for l8k
      --kubeconfig string                   Path to kubeconfig file for cluster deployment (required when using --deploy)
      --label-selector string               Filter nodes for discovery by label (default "feature.node.kubernetes.io/pci-15b3.present=true")
      --llm-api-key string                  API key for the LLM API (required when using --prompt)
      --llm-api-url string                  API URL for the LLM API
      --llm-interactive                     Enable interactive chat mode for LLM-assisted profile selection
      --llm-model string                    Model name for the LLM API (e.g., claude-3-5-sonnet-20241022, gpt-4)
      --llm-vendor string                   Vendor of the LLM API: openai, openai-azure, anthropic, gemini (default "openai-azure")
      --log-file string                     Write logs to file instead of stderr
      --log-level string                    Enable logging at specified level (debug, info, warn, error)
      --multiplane-mode string              Spectrum-X multiplane mode: swplb, hwplb, uniplane (requires --spectrum-x)
      --multirail                           Enable multirail deployment
      --network-operator-namespace string   Override the network operator namespace from the config file
      --number-of-planes int                Number of planes for Spectrum-X (requires --spectrum-x)
      --prompt string                       Path to file with a prompt to use for LLM-assisted profile generation
      --save-cluster-config string          Save discovered cluster configuration to the specified path (defaults to --user-config path if set, otherwise ./cluster-config.yaml)
      --save-deployment-files string        Save generated deployment files to the specified directory (default "./deployment")
      --spcx-version string                 Spectrum-X firmware version (requires --spectrum-x)
      --spectrum-x                          Enable Spectrum X deployment
      --user-config string                  Use provided cluster configuration file (as base config for discovery or as full config without discovery)

Use "l8k [command] --help" for more information about a command.
```

> **Note:** The help text above is auto-generated. Run `make update-readme` after CLI changes to refresh it.

## Usage Examples

### Subcommand Workflow (Recommended)

Discover cluster hardware:

```bash
l8k discover --kubeconfig ~/.kube/config \
  --save-cluster-config ./cluster-config.yaml
```

Generate deployment manifests:

```bash
l8k generate --user-config ./cluster-config.yaml \
  --fabric ethernet --deployment-type sriov --multirail \
  --save-deployment-files ./deployments
```

Interactive AI-assisted troubleshooting or profile selection:

```bash
l8k chat --kubeconfig ~/.kube/config \
  --user-config ./cluster-config.yaml \
  --llm-api-key $KEY --llm-vendor anthropic \
  --llm-model claude-sonnet-4-20250514
```

Collect a diagnostic dump:

```bash
l8k sosreport --kubeconfig ~/.kube/config
```

### Complete Workflow (Root Command)

The root command still supports all flags for backward compatibility and running the full pipeline in one shot:

```bash
l8k --discover-cluster-config --save-cluster-config ./cluster-config.yaml \
  --fabric ethernet --deployment-type sriov --multirail \
  --save-deployment-files ./deployments \
  --deploy --kubeconfig ~/.kube/config
```

### Discover Cluster Configuration

Using the subcommand:

```bash
l8k discover --kubeconfig ~/.kube/config \
  --save-cluster-config ./my-cluster-config.yaml
```

Filter
discovery to specific nodes using a label selector:

```bash
l8k discover --kubeconfig ~/.kube/config \
  --save-cluster-config ./my-cluster-config.yaml \
  --label-selector "feature.node.kubernetes.io/pci-15b3.present=true"
```

Or using the root command (backward compatible):

```bash
l8k --discover-cluster-config --save-cluster-config ./my-cluster-config.yaml \
  --kubeconfig ~/.kube/config
```

### Discovery with User-Provided Base Config

Use your own config file (with custom network operator version, subnets, etc.) as the base for discovery. Without `--save-cluster-config`, the file is rewritten in place with discovery results:

```bash
l8k discover --user-config ./my-config.yaml \
  --kubeconfig ~/.kube/config
```

Save discovery results to a separate file instead:

```bash
l8k discover --user-config ./my-config.yaml \
  --save-cluster-config ./discovered-config.yaml \
  --kubeconfig ~/.kube/config
```

### Use Existing Configuration

Generate and deploy with a pre-existing config:

```bash
l8k generate --user-config ./existing-config.yaml \
  --fabric ethernet --deployment-type sriov --multirail \
  --save-deployment-files ./deployments \
  --deploy --kubeconfig ~/.kube/config
```

### Generate Deployment Files

```bash
l8k generate --user-config ./config.yaml \
  --fabric ethernet --deployment-type sriov --multirail \
  --save-deployment-files ./deployments
```

### Generate Deployment Files for a Specific Node Group

In heterogeneous clusters, discovery produces multiple node groups.
Use `--group` to generate manifests for a single group:

```bash
l8k generate --user-config ./config.yaml \
  --fabric infiniband --deployment-type sriov --multirail \
  --group group-0 \
  --save-deployment-files ./deployments
```

### Generate Deployment Files using Natural Language Prompt

```bash
echo "I want to enable multirail networking in my AI cluster" > requirements.txt
l8k generate --user-config ./config.yaml \
  --prompt requirements.txt --llm-vendor openai-azure --llm-api-key $KEY \
  --save-deployment-files ./deployments
```

### Troubleshooting Network Operator Issues

Use the `chat` subcommand for interactive AI-assisted troubleshooting. The AI agent can collect and analyze diagnostic data (sosreport) from the cluster:

```bash
l8k chat --kubeconfig ~/.kube/config \
  --user-config ./cluster-config.yaml \
  --llm-api-key $KEY --llm-vendor anthropic \
  --llm-model claude-sonnet-4-20250514
```

In the session, ask about issues: *"My OFED driver pods are crashing, can you investigate?"* The AI agent will automatically collect a sosreport from the cluster, examine the diagnostic data, and provide analysis with remediation steps.

You can also collect a sosreport separately and provide it to the chat session (no cluster access needed):

```bash
l8k sosreport --kubeconfig ~/.kube/config
l8k chat --sosreport-path ./network-operator-sosreport-20260306-120000 \
  --llm-api-key $KEY --llm-vendor anthropic \
  --llm-model claude-sonnet-4-20250514
```

### AI Agent / Automation Usage

l8k supports structured output for AI agents and CI/CD pipelines. Use `--output json` to get machine-readable output, `--yes` to skip interactive prompts, and `--dry-run` to preview changes safely.

#### Structured JSON Output

```bash
# Get structured output for programmatic consumption
l8k generate --user-config ./config.yaml \
  --fabric ethernet --deployment-type sriov --multirail \
  --save-deployment-files ./deployments \
  --output json --yes 2>/dev/null | jq .
```

Example JSON output:

```json
{
  "success": true,
  "phase": "generate",
  "profile": {
    "fabric": "ethernet",
    "deployment": "sriov",
    "multirail": "true"
  },
  "generatedFiles": [
    "./deployments/network-operator/nic-cluster-policy.yaml",
    "./deployments/network-operator/sriov-network-node-policy.yaml"
  ],
  "deployed": false,
  "messages": [
    {"level": "info", "message": "Generating files for profile: SR-IOV Ethernet RDMA", "timestamp": "..."}
  ]
}
```

#### Dry-Run Preview

Preview what would be deployed without making changes:

```bash
l8k generate --user-config ./config.yaml --spectrum-x --deploy \
  --dry-run --output json --kubeconfig ~/.kube/config
```

#### Schema Discovery

AI agents can programmatically discover l8k's capabilities:

```bash
l8k schema
```

This outputs a JSON description of available phases, fabrics, deployment types, flags, exit codes, and output formats.

#### Exit Codes

| Code | Meaning |
|------|---------|
| 0 | Success |
| 1 | General error |
| 2 | Validation error (bad flags, invalid config) |
| 3 | Cluster error (API unreachable, discovery failed) |
| 4 | Deployment error (apply failed) |
| 5 | Partial success (discovery ok but deploy failed) |

In JSON mode, errors include structured fields (`code`, `category`, `transient`, `suggestion`) to help agents decide whether to retry or fix input.

## Configuration file

During the cluster discovery stage, K8s Launch Kit creates a configuration file, which it later uses to generate deployment manifests from the templates. This config file can be edited by the user to customize their deployment configuration. The user can provide the custom config file to the tool using the `--user-config` CLI flag — either as a standalone config (skipping discovery) or as a base config combined with `l8k discover` / `--discover-cluster-config` (discovery takes network operator parameters from the file and adds discovered cluster config).
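The base-config behavior described above amounts to overlaying discovered data onto user-provided settings. A minimal Python sketch of that idea follows; it is illustrative only, not the tool's actual merge logic, and the sample keys simply mirror the config format shown later in this README:

```python
def merge_configs(user: dict, discovered: dict) -> dict:
    """Overlay discovered values onto a user-provided base config (sketch).

    Assumed semantics: user-provided settings are kept, nested sections are
    merged recursively, and sections the user did not set (e.g. clusterConfig
    from discovery) are added wholesale.
    """
    merged = dict(user)
    for key, value in discovered.items():
        if key in merged and isinstance(merged[key], dict) and isinstance(value, dict):
            merged[key] = merge_configs(merged[key], value)
        elif key not in merged:
            merged[key] = value
    return merged

# Hypothetical inputs: the user pins a custom namespace, discovery adds the rest.
user = {"networkOperator": {"version": "v26.1.0", "namespace": "custom-ns"}}
discovered = {
    "networkOperator": {"version": "v26.1.0", "repository": "nvcr.io/nvidia/mellanox"},
    "clusterConfig": [{"identifier": "group-0"}],
}
result = merge_configs(user, discovered)
print(result["networkOperator"]["namespace"])  # custom-ns (user value kept)
```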
The tool resolves configuration and profile paths in order: local directory first (`./l8k-config.yaml`, `./profiles`), then the installed location (`/usr/local/share/l8k/`), then binary-relative.

### DOCA Driver

The `docaDriver` section controls the OFED driver deployment in the NicClusterPolicy. Set `enable: true` to include the `ofedDriver` section in generated manifests, or `enable: false` to omit it. This can also be overridden via the `--enable-doca-driver` CLI flag.

#### OFED Dependent Module Blacklisting

When the DOCA/OFED driver loads on a node, it replaces the inbox MLX kernel modules (`mlx5_core`, `mlx5_ib`, `ib_core`, etc.) with its own versions. However, if third-party or distribution-specific kernel modules depend on the inbox MLX modules (e.g., `iw_cm`, `nfsrdma`), they will block the inbox modules from being unloaded, causing the DOCA driver to fail to load or leaving the system in an inconsistent state.

To solve this, `unloadDependentModules: true` enables a pre-flight check during cluster discovery. The tool execs into `nic-configuration-daemon` pods and builds a full reverse dependency graph from `/sys/module/*/holders/` for all loaded modules, then BFS-traverses from each of the following MLX/OFED kernel modules to find all transitive non-MOFED dependents:

`mlx5_core`, `mlx5_ib`, `ib_umad`, `ib_uverbs`, `ib_ipoib`, `rdma_cm`, `rdma_ucm`, `ib_core`, `ib_cm`

Any kernel modules found as transitive dependents of these — but not the MLX modules themselves — are saved per group as `ofedDependentModules`. During manifest generation, these modules are passed to the DOCA driver pod via the `UNLOAD_CUSTOM_MODULES` environment variable (space-separated), which tells the driver to blacklist and unload them before attempting to replace the inbox modules.

Module discovery always runs during cluster discovery (so results are saved for inspection), but the `UNLOAD_CUSTOM_MODULES` env var is only rendered when `unloadDependentModules` is `true`.
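The reverse-dependency walk described above can be sketched in Python as follows. This is an illustrative reimplementation, not the tool's code, and the sample holders graph at the bottom is hypothetical:

```python
from collections import deque

# MLX/OFED modules the traversal starts from (per the list above).
MLX_MODULES = {"mlx5_core", "mlx5_ib", "ib_umad", "ib_uverbs", "ib_ipoib",
               "rdma_cm", "rdma_ucm", "ib_core", "ib_cm"}

def find_dependent_modules(holders: dict[str, list[str]]) -> set[str]:
    """BFS from each MLX module through the reverse dependency graph.

    `holders` maps a module name to the modules that hold (depend on) it,
    i.e. the contents of /sys/module/<name>/holders/ on a node.
    Returns all transitive dependents that are not MLX modules themselves.
    """
    dependents: set[str] = set()
    queue = deque(MLX_MODULES)
    seen = set(MLX_MODULES)
    while queue:
        mod = queue.popleft()
        for holder in holders.get(mod, []):
            if holder not in seen:
                seen.add(holder)
                queue.append(holder)
                if holder not in MLX_MODULES:
                    dependents.add(holder)
    return dependents

# Hypothetical graph: iw_cm holds ib_core, and nfsrdma in turn holds iw_cm.
example = {"ib_core": ["iw_cm", "ib_cm"], "iw_cm": ["nfsrdma"]}
print(sorted(find_dependent_modules(example)))  # ['iw_cm', 'nfsrdma']
```

Note how `nfsrdma` is found even though it does not hold an MLX module directly; that is the "transitive dependents" behavior the pre-flight check relies on.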
When multiple node groups are merged, their dependent modules are aggregated as a union.

```yaml
docaDriver:
  enable: true
  version: doca3.3.0-26.01-1.0.0.0-0
  unloadStorageModules: true
  enableNFSRDMA: false
  unloadDependentModules: true  # Enable dependent module discovery and unloading
```

After discovery, the config will contain the discovered dependents:

```yaml
clusterConfig:
  - identifier: group-0
    ofedDependentModules:
      - iw_cm
```

The generated NicClusterPolicy `ofedDriver` section will include:

```yaml
env:
  - name: UNLOAD_CUSTOM_MODULES
    value: "iw_cm"
```

### NV-IPAM Subnet Configuration

The `nvIpam` section supports two modes for subnet configuration:

**Option 1: Manual subnet list** — List each subnet explicitly. This takes precedence if the list is non-empty:

```yaml
nvIpam:
  poolName: nv-ipam-pool
  subnets:
    - subnet: 192.168.2.0/24
      gateway: 192.168.2.1
    - subnet: 192.168.3.0/24
      gateway: 192.168.3.1
```

**Option 2: Auto-generate subnets** — When the `subnets` list is empty but `startingSubnet`, `mask`, and `offset` are all set, subnets are automatically generated. Each cluster config group gets its own unique, non-overlapping subnet slice. The gateway for each subnet is the first usable address (network + 1).

```yaml
nvIpam:
  poolName: nv-ipam-pool
  startingSubnet: "192.168.2.0"
  mask: 24
  offset: 1
```

With the auto-generation example above, a cluster with 2 groups (4 east-west PFs each) would receive:

- Group 0: 192.168.2.0/24, 192.168.3.0/24, 192.168.4.0/24, 192.168.5.0/24
- Group 1: 192.168.6.0/24, 192.168.7.0/24, 192.168.8.0/24, 192.168.9.0/24

The `offset` parameter controls how many subnet blocks to skip between consecutive subnets (offset=1 is contiguous, offset=2 skips every other).
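The auto-generation rule above can be modeled in Python. This is an illustrative sketch of the documented semantics (one block of size `/mask` per east-west PF, gateway at network + 1), not the tool's implementation; the helper name and the per-group PF counts are made up:

```python
import ipaddress

def generate_subnets(starting_subnet: str, mask: int, offset: int,
                     pfs_per_group: list[int]) -> list[list[dict]]:
    """Allocate non-overlapping per-group subnet slices (sketch).

    Each group receives one /mask block per PF; `offset` is the number of
    blocks to advance between consecutive subnets (1 = contiguous).
    """
    base = ipaddress.ip_network(f"{starting_subnet}/{mask}")
    block = base.num_addresses  # addresses per /mask block
    groups, index = [], 0
    for count in pfs_per_group:
        subnets = []
        for _ in range(count):
            net = ipaddress.ip_network(
                (int(base.network_address) + index * block * offset, mask))
            subnets.append({"subnet": str(net),
                            "gateway": str(net.network_address + 1)})
            index += 1
        groups.append(subnets)
    return groups

# Two groups with 4 east-west PFs each, matching the example above.
plan = generate_subnets("192.168.2.0", 24, 1, [4, 4])
print(plan[0][0])  # {'subnet': '192.168.2.0/24', 'gateway': '192.168.2.1'}
print(plan[1][0])  # {'subnet': '192.168.6.0/24', 'gateway': '192.168.6.1'}
```

With `offset=2` the same call would yield 192.168.2.0/24, 192.168.4.0/24, and so on, skipping every other block.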
Example of the configuration file discovered from the cluster:

```yaml
networkOperator:
  version: v26.1.0
  componentVersion: network-operator-v26.1.0
  repository: nvcr.io/nvidia/mellanox
  namespace: nvidia-network-operator
docaDriver:
  enable: true
  version: doca3.2.0-25.10-1.2.8.0-2
  unloadStorageModules: false
  enableNFSRDMA: false
  unloadDependentModules: false
nvIpam:
  poolName: nv-ipam-pool
  subnets:
    - subnet: 192.168.2.0/24
      gateway: 192.168.2.1
    - subnet: 192.168.3.0/24
      gateway: 192.168.3.1
    - subnet: 192.168.4.0/24
      gateway: 192.168.4.1
    - subnet: 192.168.5.0/24
      gateway: 192.168.5.1
    - subnet: 192.168.6.0/24
      gateway: 192.168.6.1
    - subnet: 192.168.7.0/24
      gateway: 192.168.7.1
    - subnet: 192.168.8.0/24
      gateway: 192.168.8.1
    - subnet: 192.168.9.0/24
      gateway: 192.168.9.1
    - subnet: 192.168.10.0/24
      gateway: 192.168.10.1
    - subnet: 192.168.11.0/24
      gateway: 192.168.11.1
    - subnet: 192.168.12.0/24
      gateway: 192.168.12.1
    - subnet: 192.168.13.0/24
      gateway: 192.168.13.1
    - subnet: 192.168.14.0/24
      gateway: 192.168.14.1
    - subnet: 192.168.15.0/24
      gateway: 192.168.15.1
    - subnet: 192.168.16.0/24
      gateway: 192.168.16.1
    - subnet: 192.168.17.0/24
      gateway: 192.168.17.1
    - subnet: 192.168.18.0/24
      gateway: 192.168.18.1
    - subnet: 192.168.19.0/24
      gateway: 192.168.19.1
sriov:
  ethernetMtu: 9000
  infinibandMtu: 4000
  numVfs: 8
  priority: 90
  resourceName: sriov_resource
  networkName: sriov-network
hostdev:
  resourceName: hostdev-resource
  networkName: hostdev-network
rdmaShared:
  resourceName: rdma_shared_resource
  hcaMax: 63
ipoib:
  networkName: ipoib-network
macvlan:
  networkName: macvlan-network
nicConfigurationOperator:
  deployNicInterfaceNameTemplate: true  # Enable NIC rename when needed (see NIC Interface Name Templates section)
  rdmaPrefix: "rdma_r%rail%"    # RDMA device name template (%rail% substituted per rail)
  netdevPrefix: "eth_r%rail%"   # Network interface name template (%rail% substituted per rail)
spectrumX:
  nicType: "1023"
  overlay: none
  rdmaPrefix: roce_p%plane%_r%rail%   # Spectrum-X uses its own prefixes (with %plane%)
  netdevPrefix: eth_p%plane%_r%rail%
clusterConfig:
  - identifier: group-0
    capabilities:
      nodes:
        sriov: true
        rdma: true
        ib: true
    pfs:
      - deviceID: a2dc
        rdmaDevice: ""
        pciAddress: "0000:19:00.0"
        networkInterface: ""
        traffic: east-west
        rail: 0
      - deviceID: a2dc
        rdmaDevice: ""
        pciAddress: "0000:2a:00.0"
        networkInterface: ""
        traffic: east-west
        rail: 1
      - deviceID: a2dc
        rdmaDevice: ""
        pciAddress: "0000:3b:00.0"
        networkInterface: ""
        traffic: east-west
        rail: 2
      - deviceID: a2dc
        rdmaDevice: ""
        pciAddress: "0000:4c:00.0"
        networkInterface: ""
        traffic: east-west
        rail: 3
      - deviceID: 101f
        rdmaDevice: ""
        pciAddress: "0000:5a:00.0"
        networkInterface: ""
        traffic: east-west
        rail: 4
      - deviceID: 101f
        rdmaDevice: ""
        pciAddress: "0000:5a:00.1"
        networkInterface: ""
        traffic: east-west
        rail: 5
      - deviceID: a2dc
        rdmaDevice: ""
        pciAddress: "0000:9b:00.0"
        networkInterface: ""
        traffic: east-west
        rail: 6
      - deviceID: a2dc
        rdmaDevice: ""
        pciAddress: "0000:ab:00.0"
        networkInterface: ""
        traffic: east-west
        rail: 7
      - deviceID: a2dc
        rdmaDevice: ""
        pciAddress: "0000:c1:00.0"
        networkInterface: ""
        traffic: east-west
        rail: 8
      - deviceID: a2dc
        rdmaDevice: ""
        pciAddress: "0000:cb:00.0"
        networkInterface: ""
        traffic: east-west
        rail: 9
      - deviceID: a2dc
        rdmaDevice: ""
        pciAddress: "0000:d8:00.0"
        networkInterface: ""
        traffic: east-west
        rail: 10
      - deviceID: a2dc
        rdmaDevice: ""
        pciAddress: "0000:d8:00.1"
        networkInterface: ""
        traffic: east-west
        rail: 11
    workerNodes:
      - pdx-g22r13-2894-lh2-w01
      - pdx-g24r13-2894-lh2-w02
    nodeSelector:
      nvidia.com/gpu.machine: ThinkSystem-SR680a-V3
  - identifier: group-1
    capabilities:
      nodes:
        sriov: true
        rdma: true
        ib: true
    pfs:
      - deviceID: a2dc
        rdmaDevice: ""
        pciAddress: "0000:1a:00.0"
        networkInterface: ""
        traffic: east-west
        rail: 0
      - deviceID: a2dc
        rdmaDevice: ""
        pciAddress: "0000:3c:00.0"
        networkInterface: ""
        traffic: east-west
        rail: 1
      - deviceID: a2dc
        rdmaDevice: ""
        pciAddress: "0000:4d:00.0"
        networkInterface: ""
        traffic: east-west
        rail: 2
      - deviceID: a2dc
        rdmaDevice: ""
        pciAddress: "0000:5e:00.0"
        networkInterface: ""
        traffic: east-west
        rail: 3
      - deviceID: a2dc
        rdmaDevice: ""
        pciAddress: "0000:9c:00.0"
        networkInterface: ""
        traffic: east-west
        rail: 4
      - deviceID: a2dc
        rdmaDevice: ""
        pciAddress: "0000:9d:00.0"
        networkInterface: ""
        traffic: east-west
        rail: 5
      - deviceID: a2dc
        rdmaDevice: ""
        pciAddress: "0000:9d:00.1"
        networkInterface: ""
        traffic: east-west
        rail: 6
      - deviceID: a2dc
        rdmaDevice: ""
        pciAddress: "0000:bc:00.0"
        networkInterface: ""
        traffic: east-west
        rail: 7
      - deviceID: a2dc
        rdmaDevice: ""
        pciAddress: "0000:cc:00.0"
        networkInterface: ""
        traffic: east-west
        rail: 8
      - deviceID: a2dc
        rdmaDevice: ""
        pciAddress: "0000:dc:00.0"
        networkInterface: ""
        traffic: east-west
        rail: 9
    workerNodes:
      - pdx-g22r23-2894-dh2-w03
      - pdx-g24r23-2894-dh2-w04
    nodeSelector:
      nvidia.com/gpu.machine: PowerEdge-XE9680
  - identifier: group-2
    capabilities:
      nodes:
        sriov: true
        rdma: true
        ib: true
    pfs:
      - deviceID: a2dc
        rdmaDevice: ""
        pciAddress: "0000:09:00.0"
        networkInterface: ""
        traffic: east-west
        rail: 0
      - deviceID: a2dc
        rdmaDevice: ""
        pciAddress: "0000:23:00.0"
        networkInterface: ""
        traffic: east-west
        rail: 1
      - deviceID: a2dc
        rdmaDevice: ""
        pciAddress: "0000:35:00.0"
        networkInterface: ""
        traffic: east-west
        rail: 2
      - deviceID: a2dc
        rdmaDevice: ""
        pciAddress: "0000:35:00.1"
        networkInterface: ""
        traffic: east-west
        rail: 3
      - deviceID: a2dc
        rdmaDevice: ""
        pciAddress: "0000:53:00.0"
        networkInterface: ""
        traffic: east-west
        rail: 4
      - deviceID: a2dc
        rdmaDevice: ""
        pciAddress: "0000:69:00.0"
        networkInterface: ""
        traffic: east-west
        rail: 5
      - deviceID: a2dc
        rdmaDevice: ""
        pciAddress: "0000:8f:00.0"
        networkInterface: ""
        traffic: east-west
        rail: 6
      - deviceID: a2dc
        rdmaDevice: ""
        pciAddress: "0000:9c:00.0"
        networkInterface: ""
        traffic: east-west
        rail: 7
      - deviceID: a2dc
        rdmaDevice: ""
        pciAddress: "0000:cd:00.0"
        networkInterface: ""
        traffic: east-west
        rail: 8
      - deviceID: a2dc
        rdmaDevice: ""
        pciAddress: "0000:f1:00.0"
        networkInterface: ""
        traffic: east-west
        rail: 9
    workerNodes:
      - pdx-g22r31-2894-ch2-w05
      - pdx-g24r31-2894-ch2-w06
    nodeSelector:
      nvidia.com/gpu.machine: UCSC-885A-M8-H22
```

### North-South Traffic Detection

During cluster discovery, the tool automatically identifies BlueField DPU devices (as opposed to SuperNICs or ConnectX NICs) by matching each device's `partNumber` against a known list of DPU product codes in [pkg/networkoperatorplugin/ns-product-ids](pkg/networkoperatorplugin/ns-product-ids). Devices matching a DPU product code are classified as **north-south** traffic (management/external), while all other devices are classified as **east-west** traffic (GPU interconnect).

North-south PFs are included in the saved cluster configuration for visibility, but are **automatically filtered out** during template rendering so that only east-west PFs appear in the generated manifests. Each east-west PF is assigned a sequential rail number (rail-0, rail-1, rail-2, ...) used for naming resources like SriovNetworkNodePolicy and IPPool entries.

Example of mixed traffic types in the config:

```yaml
clusterConfig:
  - identifier: group-0
    pfs:
      - deviceID: a2dc
        pciAddress: "0000:19:00.0"
        traffic: east-west  # SuperNIC — included in manifests
        rail: 0
      - deviceID: a2dc
        pciAddress: "0000:2a:00.0"
        traffic: east-west
        rail: 1
      - deviceID: a2dc
        pciAddress: "0000:3b:00.0"
        traffic: north-south  # BlueField DPU — excluded from manifests
```

### NIC Interface Name Templates

The `nicConfigurationOperator.deployNicInterfaceNameTemplate` setting controls whether a `NicInterfaceNameTemplate` CR is deployed to rename NIC interfaces to predictable, rail-based names (e.g., `eth_r0`, `eth_r1`). When set to `true`, the tool treats it as "enable when needed" rather than "always enable". The NicInterfaceNameTemplate CR and associated `nicConfigurationOperator` section in NicClusterPolicy are only deployed when one of the following conditions is met:

1.
**Merged groups with PCI address conflicts** — When multiple node groups share the same GPU product type and are merged into a single group, but the same PCI address appears at different rail positions across groups. In this case PCI addresses alone cannot identify the correct rail, so interface name templates are used instead.

2. **rdma_shared deployment with empty network interface names** — When the deployment type is `rdma_shared` (macvlan-rdma-shared or ipoib-rdma-shared profiles) and PFs have empty `networkInterface` fields. The `rdmaSharedDevicePlugin` uses `ifNames` selectors that require interface names, so NicInterfaceNameTemplate must be enabled to provide them. This typically happens when discovery finds multiple nodes per group and omits device names for safety.

When neither condition holds, name templates are disabled and the device plugin uses PCI addresses directly, avoiding the overhead of deploying the NIC configuration operator.

## Docker container

You can run the l8k tool as a Docker container:

```bash
docker run --net=host \
  -v ~/launch-kubernetes/user-prompt:/user-prompt \
  -v ~/remote-cluster/:/remote-cluster \
  -v /tmp:/output \
  nvcr.io/nvidia/cloud-native/k8s-launch-kit:v26.1.0 \
  --discover-cluster-config --kubeconfig /remote-cluster/kubeconf.yaml \
  --save-cluster-config /output/config.yaml --log-level debug \
  --save-deployment-files /output --fabric infiniband \
  --deployment-type rdma_shared --multirail
```

Don't forget to use `--net=host` and mount the necessary directories for input and output files with `-v`.

## Development

### Building

```bash
make build      # Build for current platform
make build-all  # Build for all platforms
make clean      # Clean build artifacts
```

### Testing

```bash
make test      # Run tests
make coverage  # Run tests with coverage
```

### Linting

```bash
make lint        # Run linter
make lint-check  # Install and run linter
```

### Docker

```bash
make docker-build  # Build Docker image
make docker-run    # Run Docker container
```
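For scripting around the CLI, the exit codes documented in the AI Agent / Automation Usage section can be mapped to retry decisions. A minimal Python sketch, assuming `l8k` is on `PATH`; the retry policy shown (treating only cluster errors as transient) is an assumption of this sketch, since in JSON mode the error object's explicit `transient` field is the more authoritative signal:

```python
import subprocess

# Exit codes from the table in the AI Agent / Automation Usage section.
EXIT_MEANINGS = {
    0: "success",
    1: "general error",
    2: "validation error",
    3: "cluster error",
    4: "deployment error",
    5: "partial success",
}

# Assumed policy: cluster errors (3) are often transient (API briefly
# unreachable), while validation errors (2) need fixed input, so only
# code 3 is worth retrying unchanged.
TRANSIENT_CODES = {3}

def should_retry(exit_code: int) -> bool:
    """Return True if rerunning the same command might succeed."""
    return exit_code in TRANSIENT_CODES

def run_l8k(args: list[str]) -> int:
    """Run l8k (assumed to be on PATH) and return its exit code."""
    return subprocess.run(["l8k", *args]).returncode

print(should_retry(3), EXIT_MEANINGS[3])  # True cluster error
```

In practice a wrapper would call `run_l8k(["generate", "--user-config", "./config.yaml", "--output", "json", "--yes"])` and branch on `should_retry` (or on the `transient` field parsed from the JSON error output) before rerunning.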