# open-nvdebug

**Repository Path**: mirrors_NVIDIA/open-nvdebug

## Basic Information

- **Project Name**: open-nvdebug
- **Description**: Tool to collect debug logs from NVIDIA server components, in band and out-of-band.
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-10-31
- **Last Updated**: 2026-03-21

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# OPEN-NVDEBUG

> SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
>
> SPDX-License-Identifier: Apache-2.0

## Description

**open-nvdebug** is NVIDIA's comprehensive diagnostic collection tool that gathers system information from NVIDIA server platforms to troubleshoot issues effectively. It collects data through multiple methods including Out-of-Band (OOB) access via BMC and In-Band (IB) access via host systems using Redfish, SSH, and IPMI protocols.

## Features

- **Comprehensive Data Collection**: Gathers logs from multiple sources in a single command
- **Out-of-Band (OOB)**: Remote collection via BMC using Redfish and IPMI
- **In-Band (IB)**: Direct collection from host operating system via SSH
- **Combined Mode**: Simultaneous OOB and IB collection for complete diagnostics
- **Multi-Protocol Support**:
  - **Redfish API**: BMC log collection via Redfish interface
  - **SSH**: Direct SSH access to BMC and host systems
  - **IPMI**: IPMI-over-LAN for BMC communication
- **Broad Platform Support**: Supports NVIDIA HGX™, MGX™, GB series, GH series, and Workstation platforms
- **Automated Platform Detection**: Automatically detects baseboard type and platform architecture
- **Remote & Local Operation**: Works from remote machines or directly on the target system
- **Standardized Output**: Generates structured logs with HTML reports for easy analysis
- **Parallel Collection**: Optimized multi-threaded collection for faster performance
- **Configurable Collectors**: Spreadsheet-driven collector definitions for easy customization

## Prerequisites

Before you begin, ensure you have met the following requirements:

### Client Host Requirements

- **Operating System**: Linux-based OS (Ubuntu 24.04 recommended, Ubuntu 20.04+ supported)
- **Kernel**: Linux Kernel 4.4 or later (4.15+ recommended)
- **Python**: Python 3.12 (required)
- **Required Packages**:
  ```shell
  sudo apt-get install ipmitool sshpass
  ```
- **Hardware**: Minimum 4GB RAM, 2GB free disk space
- **Network**: Access to target systems via BMC (Redfish/IPMI) and SSH

### Server/BMC Requirements

For full functionality, target systems should have:

- BMC accessible via Redfish, SSH, and IPMI-over-LAN
- For host collection: SSH access to host OS with sudo privileges
- For advanced collectors: Additional tools installed (nvme-cli, pciutils, dmidecode, lshw, nvidia-fabricmanager, mft-tools, NVIDIA Graphics Driver, doca-sosreport v4.8.0+, etc.)

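The client host requirements above can be checked before the first run. The following is a minimal sketch and not part of open-nvdebug: the function names `missing_tools`, `python_ok`, and `check_client_host` are hypothetical, and it assumes that "Python 3.12 (required)" means 3.12 or newer and that the `ipmitool` and `sshpass` packages provide binaries of the same name on `PATH`.

```python
import shutil
import sys

# Assumption: the apt packages listed above install binaries with these names.
REQUIRED_TOOLS = ("ipmitool", "sshpass")
# Assumption: "Python 3.12 (required)" is read here as "3.12 or newer".
MIN_PYTHON = (3, 12)


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= minimum


def check_client_host():
    """Collect human-readable problems; an empty list means nothing was found."""
    problems = []
    if not python_ok():
        found = ".".join(str(part) for part in sys.version_info[:2])
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ expected, found {found}")
    for tool in missing_tools():
        problems.append(f"'{tool}' not found on PATH")
    return problems


# Print each problem, or a single confirmation line when none were found.
for line in check_client_host() or ["client host prerequisites look satisfied"]:
    print(line)
```

This sketch covers only the Python version and the two required packages; it does not check the kernel version, memory, disk space, or network reachability of the BMC.
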
## Quick Start

Get started with nvdebug in 5 minutes:

### Step 1: Verify Installation

```shell
python -m src.tool.main --version
```

### Step 2: Run Your First Collection

**Out-of-Band Collection (OOB)**

Collect logs remotely via BMC without host OS access:

```shell
python -m src.tool.main collect -i <BMC_IP> -u <BMC_USER> -p <BMC_PASS>
```

**In-Band Collection (IB)**

Collect logs directly from the host OS:

```shell
python -m src.tool.main collect -I <HOST_IP> -U <HOST_USER> -H <HOST_PASS>
```

**Combined OOB + IB Collection**

Collect both BMC and host logs:

```shell
python -m src.tool.main collect -i <BMC_IP> -u <BMC_USER> -p <BMC_PASS> \
    -I <HOST_IP> -U <HOST_USER> -H <HOST_PASS>
```

### Step 3: Specify Baseboard (Optional)

nvdebug automatically detects your baseboard, but you can specify it manually:

```shell
# List available baseboards
python -m src.tool.main list-baseboards

# Collect with specific baseboard
python -m src.tool.main collect -i <BMC_IP> -u <BMC_USER> -p <BMC_PASS> -b "<BASEBOARD>"
```

### Example Collection

```shell
# ARM64 system with auto-detection
python -m src.tool.main collect -i 192.168.1.100 -u admin -p password123

# With verbose output for detailed progress
python -m src.tool.main collect -i 192.168.1.100 -u admin -p password123 -v

# With custom output directory
python -m src.tool.main collect -i 192.168.1.100 -u admin -p password123 -o /tmp/my_logs

# Combined OOB and IB collection
python -m src.tool.main collect -i 192.168.1.100 -u bmc_user -p bmc_pass \
    -I 192.168.1.101 -U host_user -H host_pass \
    -b "<BASEBOARD>" -o /tmp/nvdebug_output
```

## Advanced Usage

### Local Mode

Run nvdebug directly on the target system:

```shell
# With BMC access
python -m src.tool.main collect -i <BMC_IP> -u <BMC_USER> -p <BMC_PASS> --local

# Without BMC access (host-only collection)
python -m src.tool.main collect --local
```

### Preflight Checks

Run preflight checks to verify system readiness before collection:

```shell
python -m src.tool.main preflight -i <BMC_IP> -u <BMC_USER> -p <BMC_PASS>
```

### List Available Resources

```shell
# List all supported baseboards
python -m src.tool.main list-baseboards

# List all available collectors
python -m src.tool.main list-collectors

# List collectors for specific baseboard
python -m src.tool.main list-collectors -b "<BASEBOARD>"
```

### Configuration File Usage

Create a DUT configuration file (`dut_config.yaml`) for repeated collections:

```yaml
duts:
  - name: -node-01
    bmc_ip: 192.168.1.100
    bmc_user: admin
    bmc_pass: password123
    host_ip: 192.168.1.101
    host_user: host_user
    host_pass: host_password
    baseboard: "<BASEBOARD>"
```

Run a collection using the configuration file:

```shell
python -m src.tool.main collect --dut-config dut_config.yaml
```

### Collection Options

```shell
# Verbose output for detailed progress
python -m src.tool.main collect -i <BMC_IP> -u <BMC_USER> -p <BMC_PASS> -v

# Very verbose output for debugging
python -m src.tool.main collect -i <BMC_IP> -u <BMC_USER> -p <BMC_PASS> -vv

# Specify custom output directory
python -m src.tool.main collect -i <BMC_IP> -u <BMC_USER> -p <BMC_PASS> -o /custom/path

# Specify baseboard manually (skip auto-detection)
python -m src.tool.main collect -i <BMC_IP> -u <BMC_USER> -p <BMC_PASS> -b "<BASEBOARD>"
```

## Understanding Output

After running nvdebug, you'll find a timestamped directory containing all collected data:

```
nvdebug_logs__