# IIoT-IDS

**Repository Path**: csnow9/ids

## Basic Information

- **Project Name**: IIoT-IDS
- **Description**: A lightweight, dynamic open-set intrusion detection system for the Industrial Internet of Things (IIoT)
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 2
- **Forks**: 0
- **Created**: 2025-10-28
- **Last Updated**: 2026-03-25

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# IIoT-IDS

This folder contains Python scripts that mirror the notebook (`IIoT-IDS.ipynb`): from raw CSVs through **closed-set classification**, **open-set recognition**, and **dynamic update detection**, so you can reproduce and integrate the pipeline from the command line.

---

## Requirements

- Python 3.10+ (3.10–3.12 recommended)
- Typical stack: `pandas`, `numpy`, `scikit-learn`, `matplotlib`, `seaborn`, `imbalanced-learn` (`imblearn`), `tensorflow` / `keras`, `scipy`

Install example:

```bash
pip install pandas numpy scikit-learn matplotlib seaborn imbalanced-learn tensorflow scipy
```

---

## Recommended pipeline

```
Raw data (gas_final.arff.csv)
  ↓ data_preprocess.py           # clean, one-hot encode, standardize, SMOTEENN, export balanced data, etc.
  ↓ data_seen_unseen.py          # known/unknown split, MinMax scaling, write seen_*.npy
  ↓ vae_closed_set_training.py   # 1D Conv VAE + frozen-encoder classifier; save weights + closed_set_predictions.npz
  ↓ confusion_matrix_viz.py      # (optional) closed-set confusion-matrix heatmaps
  ↓ open_set_detection.py        # softmax + reconstruction threshold + EVT/GPD; write open_set_for_dynamic.npz
  ↓ dynamic_update_detection.py  # student/teacher distillation; augment with unknowns + final_dataset, distill again
```

If you already have `gas_encoded_balance_data0.csv` and the four `seen_*.npy` files, you can start from the matching step.

---

## Scripts

| File | Role |
|------|------|
| `data_preprocess.py` | Load `gas_final.arff.csv`, one-hot encode, standardize, apply SMOTEENN, then split and export balanced data (e.g. `gas_encoded_balance_data0.csv`, `normalized_data.csv`, `.npy` artifacts). |
| `data_seen_unseen.py` | Load the balanced data, remap labels, apply `MinMaxScaler`, train/test split, separate known classes (0–6) from unknown (7), save the four `seen_*.npy` files. |
| `vae_closed_set_training.py` | Closed set: VAE pre-training plus a softmax classifier on the frozen encoder; saves `open_set_vae.weights.h5`, `open_set_classifier.weights.h5`, and `closed_set_predictions.npz`. |
| `confusion_matrix_viz.py` | Load `closed_set_predictions.npz`, print metrics, plot confusion matrices (counts + row-normalized), save PNG. |
| `open_set_detection.py` | Open set: load weights and `seen_*.npy`, calibrate the reconstruction-error threshold, merge test sets, run hybrid prediction (unknown label defaults to **7**) with EVT/GPD tail fitting; writes `open_set_for_dynamic.npz`. |
| `dynamic_update_detection.py` | Dynamic update: student/teacher distillation; reads unknowns from `open_set_for_dynamic.npz`, merges them with samples from `final_dataset_new.csv`, retrains, and evaluates. |
| `IIoT-IDS.py` | Monolithic export from Jupyter (a single long script), kept for reference; prefer the modular scripts above for day-to-day use. |

---

## Main inputs / outputs

| Artifact | Description |
|----------|-------------|
| `gas_final.arff.csv` | Raw feature table (preprocessing entry point). |
| `gas_encoded_balance_data0.csv` | Balanced feature table with `result` (from preprocessing). |
| `seen_train_X.npy`, `seen_train_y.npy`, `seen_test_X.npy`, `seen_test_y.npy` | Arrays after the known/unknown split. |
| `open_set_vae.weights.h5`, `open_set_classifier.weights.h5` | VAE and classifier weights from closed-set training. |
| `closed_set_predictions.npz` | `true_labels`, `predicted_labels`, etc. on the closed-set test split. |
| `open_set_for_dynamic.npz` | `merged_pred`, `test_X_merge`, `test_y_merge` for the dynamic-update stage. |
| `final_dataset_new.csv` | Extra data for augmentation in the dynamic script (place it in the project or set its path in the config). |
| `train_X_new.npy`, etc. | Augmented train/test tensors written by the dynamic script. |

Paths default to the **directory containing each script**; adjust `data_dir` and the filenames in each `*Config` class as needed.

---

## Quick commands

```bash
# 1) Preprocessing (requires gas_final.arff.csv)
python data_preprocess.py

# 2) Known / unknown split
python data_seen_unseen.py

# 3) Closed-set VAE + classifier
python vae_closed_set_training.py

# 4) (Optional) Closed-set confusion-matrix figure
python confusion_matrix_viz.py

# 5) Open-set detection
python open_set_detection.py

# 6) Dynamic update (needs the npz from step 5 and final_dataset_new.csv)
python dynamic_update_detection.py
```

On headless machines, disable plotting via the `show` flags where the scripts expose them.

---
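## Example: hybrid open-set decision rule

The open-set step combines classifier confidence with VAE reconstruction error: a test sample whose top softmax probability falls below a confidence threshold, or whose reconstruction error exceeds the calibrated threshold, is relabeled as unknown (class 7). A minimal NumPy sketch of that decision rule is shown below; the function name, threshold values, and array shapes are illustrative, not taken from `open_set_detection.py`:

```python
import numpy as np

def hybrid_predict(probs, recon_errors, conf_threshold=0.9,
                   recon_threshold=0.05, unknown_label=7):
    """Relabel low-confidence or poorly reconstructed samples as unknown.

    probs        : (N, 7) softmax outputs from the closed-set classifier
    recon_errors : (N,) per-sample VAE reconstruction error (e.g. MSE)
    """
    preds = probs.argmax(axis=1)        # closed-set class prediction
    confidence = probs.max(axis=1)      # top softmax probability
    # Flag a sample as unknown if EITHER rejection criterion fires.
    is_unknown = (confidence < conf_threshold) | (recon_errors > recon_threshold)
    return np.where(is_unknown, unknown_label, preds)

# Toy example: the first sample is uncertain, the second is confident
# and well reconstructed.
probs = np.array([[0.50, 0.30, 0.20, 0.0, 0.0, 0.0, 0.0],
                  [0.97, 0.01, 0.02, 0.0, 0.0, 0.0, 0.0]])
recon = np.array([0.10, 0.01])
print(hybrid_predict(probs, recon))  # → [7 0]
```

In the actual script the reconstruction threshold is calibrated on the known-class test data (and an EVT/GPD fit refines the tail), rather than fixed by hand as in this sketch.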