# IIoT-IDS

**Repository Path**: csnow9/ids

## Basic Information

- **Project Name**: IIoT-IDS
- **Description**: A lightweight, dynamic open-set intrusion detection system for the Industrial Internet of Things (IIoT)
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 2
- **Forks**: 0
- **Created**: 2025-10-28
- **Last Updated**: 2026-03-25

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# IIoT-IDS

This folder contains Python scripts that mirror the notebook (`IIoT-IDS.ipynb`): from raw CSVs through **closed-set classification**, **open-set recognition**, and **dynamic update detection**, so you can reproduce and integrate the pipeline from the command line.

---

## Requirements

- Python 3.10+ (3.10–3.12 recommended)
- Typical stack: `pandas`, `numpy`, `scikit-learn`, `matplotlib`, `seaborn`, `imbalanced-learn` (`imblearn`), `tensorflow` / `keras`, `scipy`

Install example:

```bash
pip install pandas numpy scikit-learn matplotlib seaborn imbalanced-learn tensorflow scipy
```

---

## Recommended pipeline

```
Raw data (gas_final.arff.csv)
  ↓ data_preprocess.py           # clean, one-hot encode, standardize, SMOTEENN, export balanced data, etc.
  ↓ data_seen_unseen.py          # known/unknown split, MinMax scaling, write seen_*.npy
  ↓ vae_closed_set_training.py   # 1D Conv VAE + frozen-encoder classifier; save weights + closed_set_predictions.npz
  ↓ confusion_matrix_viz.py      # (optional) closed-set confusion-matrix heatmaps
  ↓ open_set_detection.py        # softmax + reconstruction threshold + EVT/GPD; write open_set_for_dynamic.npz
  ↓ dynamic_update_detection.py  # student/teacher distillation; augment with unknowns + final_dataset, distill again
```

If you already have `gas_encoded_balance_data0.csv` and the four `seen_*.npy` files, you can start from the matching step.

---

## Scripts

| File | Role |
|------|------|
| `data_preprocess.py` | Load `gas_final.arff.csv`, one-hot encode, standardize, apply SMOTEENN, then split and export balanced data (e.g. `gas_encoded_balance_data0.csv`, `normalized_data.csv`, `.npy` artifacts). |
| `data_seen_unseen.py` | Load the balanced data, remap labels, apply `MinMaxScaler`, train/test split, separate known classes (0–6) from unknown (7), save the four `seen_*.npy` files. |
| `vae_closed_set_training.py` | Closed set: VAE pre-training plus a softmax classifier on the frozen encoder; saves `open_set_vae.weights.h5`, `open_set_classifier.weights.h5`, and `closed_set_predictions.npz`. |
| `confusion_matrix_viz.py` | Load `closed_set_predictions.npz`, print metrics, plot confusion matrices (counts + row-normalized), save PNG. |
| `open_set_detection.py` | Open set: load weights and `seen_*.npy`, calibrate the reconstruction-error threshold, merge test sets, run hybrid prediction (unknown label defaults to **7**) with EVT/GPD tail fitting; writes `open_set_for_dynamic.npz`. |
| `dynamic_update_detection.py` | Dynamic update: student/teacher distillation; reads unknowns from `open_set_for_dynamic.npz`, merges them with samples from `final_dataset_new.csv`, retrains, and evaluates. |
| `IIoT-IDS.py` | Monolithic export from Jupyter (a single long script), kept for reference; prefer the modular scripts above for day-to-day use. |

---

## Main inputs / outputs

| Artifact | Description |
|----------|-------------|
| `gas_final.arff.csv` | Raw feature table (preprocessing entry point). |
| `gas_encoded_balance_data0.csv` | Balanced feature table with `result` (from preprocessing). |
| `seen_train_X.npy`, `seen_train_y.npy`, `seen_test_X.npy`, `seen_test_y.npy` | Arrays after the known/unknown split. |
| `open_set_vae.weights.h5`, `open_set_classifier.weights.h5` | VAE and classifier weights from closed-set training. |
| `closed_set_predictions.npz` | `true_labels`, `predicted_labels`, etc. on the closed-set test split. |
| `open_set_for_dynamic.npz` | `merged_pred`, `test_X_merge`, `test_y_merge` for the dynamic-update stage. |
| `final_dataset_new.csv` | Extra data for augmentation in the dynamic script (place it in the project or set its path in the config). |
| `train_X_new.npy`, etc. | Augmented train/test tensors written by the dynamic script. |

Paths default to the **directory containing each script**; adjust `data_dir` and the filenames in each `*Config` class as needed.

---

## Quick commands

```bash
# 1) Preprocessing (requires gas_final.arff.csv)
python data_preprocess.py

# 2) Known / unknown split
python data_seen_unseen.py

# 3) Closed-set VAE + classifier
python vae_closed_set_training.py

# 4) (Optional) Closed-set confusion-matrix figure
python confusion_matrix_viz.py

# 5) Open-set detection
python open_set_detection.py

# 6) Dynamic update (needs the npz from step 5 and final_dataset_new.csv)
python dynamic_update_detection.py
```

On headless machines, disable plotting via the `show` flags where the scripts expose them.

---
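## Example: hybrid open-set decision rule

The open-set step combines classifier confidence with VAE reconstruction error: a test sample whose top softmax probability falls below a confidence threshold, or whose reconstruction error exceeds the calibrated threshold, is relabeled as unknown (class 7). A minimal NumPy sketch of that decision rule is shown below; the function name, threshold values, and array shapes are illustrative, not taken from `open_set_detection.py`:

```python
import numpy as np

def hybrid_predict(probs, recon_errors, conf_threshold=0.9,
                   recon_threshold=0.05, unknown_label=7):
    """Relabel low-confidence or poorly reconstructed samples as unknown.

    probs        : (N, 7) softmax outputs from the closed-set classifier
    recon_errors : (N,) per-sample VAE reconstruction error (e.g. MSE)
    """
    preds = probs.argmax(axis=1)        # closed-set class prediction
    confidence = probs.max(axis=1)      # top softmax probability
    # Flag a sample as unknown if EITHER rejection criterion fires.
    is_unknown = (confidence < conf_threshold) | (recon_errors > recon_threshold)
    return np.where(is_unknown, unknown_label, preds)

# Toy example: the first sample is uncertain, the second is confident
# and well reconstructed.
probs = np.array([[0.50, 0.30, 0.20, 0.0, 0.0, 0.0, 0.0],
                  [0.97, 0.01, 0.02, 0.0, 0.0, 0.0, 0.0]])
recon = np.array([0.10, 0.01])
print(hybrid_predict(probs, recon))  # → [7 0]
```

In the actual script the reconstruction threshold is calibrated on the known-class test data (and an EVT/GPD fit refines the tail), rather than fixed by hand as in this sketch.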