# SafeFlow

**Repository Path**: codekpy/safe-flow

## Basic Information

- **Project Name**: SafeFlow
- **Description**: A network traffic monitoring module built with Python and a lightweight local text classification model
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-04-12
- **Last Updated**: 2026-04-12

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

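# SafeFlow Security Detection System

## Overview

SafeFlow is a machine-learning-based security detection system for HTTP requests. It uses a TF-IDF + Naive Bayes model to detect malicious traffic.

**Key features:**
- Detects common web attacks such as SQL injection, XSS, command injection, and path traversal
- Generates malicious content samples in bulk
- Manages training data as CSV files
- Uses a modular architecture that is easy to extend
- Accepts requests in a standardized HTTP request format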

## Architecture

```
SafeFlow/
├── models/              # Model-related files
│   ├── __init__.py
│   ├── attack_type.py   # Attack type enum
│   └── trained_model.pkl # Trained model
├── core/                # Core detection
│   ├── __init__.py
│   └── detector.py      # HTTP security detector
├── generators/          # Data generation
│   ├── __init__.py
│   └── malicious_generator.py  # Malicious content generator
├── trainers/            # Model training
│   ├── __init__.py
│   └── model_trainer.py  # Model trainer
├── data/                # Data storage
│   └── training_data.csv  # Training data
├── utils/               # Utility functions
│   ├── __init__.py
│   └── text_processor.py  # Text preprocessing
├── reports/             # Reports
│   └── training_report.txt  # Training report
├── main.py              # Main entry point
├── requirements.txt     # Dependency list
└── .gitignore          # Git ignore file
```

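## Quick Start

### 1. Set Up the Environment

```bash
# Create a virtual environment
python -m venv venv

# Activate the virtual environment
# Windows (PowerShell)
venv\Scripts\Activate.ps1
# Linux/Mac
# source venv/bin/activate

# Install dependencies (using the Tsinghua PyPI mirror)
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple -r requirements.txt
```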

### 2. Generate Training Data

```bash
# Generate 500 training samples (100 per type)
python main.py generate --count 100

# Check the generated file
ls data/training_data.csv
```

### 3. Train the Model

```bash
# Train the model
python main.py train

# View the training report
cat reports/training_report.txt
```

### 4. Detect Requests

#### 4.1 Using the Full JSON Format

```bash
# Read the request from a file
python main.py detect --file request.json

# Pass the JSON directly
python main.py detect --request '{"url":"http://example.com/api","method":"GET","headers":{},"params":{},"body":{}}'
```

#### 4.2 Using Individual Arguments

```bash
# Check a URL
python main.py detect --url "http://example.com/api/users?id=123"

# Request with query parameters
python main.py detect --url "http://example.com/search" --method GET --params "q:test,page:1"

# POST request
python main.py detect --url "http://example.com/login" --method POST --body "username:admin,password:secret"
```
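
The `--params` and `--body` arguments above use a compact `key:value,key:value` syntax. The sketch below shows one way such a string could be turned into a dictionary. The helper name `parse_pairs` is hypothetical and is not taken from the SafeFlow source, whose actual parsing rules may differ.

```python
def parse_pairs(raw: str) -> dict:
    """Parse 'k1:v1,k2:v2' into a dict (hypothetical helper, not SafeFlow's API)."""
    pairs = {}
    for item in raw.split(","):
        if not item.strip():
            continue  # tolerate empty input and trailing commas
        # Split on the first colon only, so values may themselves contain ':'.
        key, sep, value = item.partition(":")
        if not sep:
            raise ValueError(f"expected key:value, got {item!r}")
        pairs[key.strip()] = value.strip()
    return pairs


params = parse_pairs("q:test,page:1")
body = parse_pairs("username:admin,password:secret")
print(params)
print(body)
```

A value that contains a comma cannot be expressed in this syntax, which is one reason to prefer the full JSON format from section 4.1 for complex requests.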

#### 4.3 Interactive Mode

```bash
python main.py detect --interactive
```

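## Supported Attack Types

| Attack Type | Detection Coverage |
|---------|---------|
| SQL injection | ✅ Basic, UNION-based, time-based blind, boolean-based blind, error-based, and more |
| XSS | ✅ Reflected, stored, DOM-based, event handlers, and more |
| Command injection | ✅ Unix/Linux, Windows, reverse shells, and more |
| Path traversal | ✅ Basic traversal, encoding bypass, double encoding, and more |
| Normal request | ✅ Correctly identifies normal traffic |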
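
The classes in the table could be represented by an enum like the one that `models/attack_type.py` is described as holding. This is only a sketch: the four attack names match the labels shown in the detection examples in this README, while the `NORMAL` member and the `is_malicious` property are assumptions.

```python
from enum import Enum


class AttackType(Enum):
    """Illustrative attack-type enum; the real attack_type.py may differ."""

    NORMAL = "NORMAL"  # assumed name for benign traffic
    SQL_INJECTION = "SQL_INJECTION"
    XSS = "XSS"
    COMMAND_INJECTION = "COMMAND_INJECTION"
    PATH_TRAVERSAL = "PATH_TRAVERSAL"

    @property
    def is_malicious(self) -> bool:
        # Every class except NORMAL counts as an attack.
        return self is not AttackType.NORMAL


label = AttackType("SQL_INJECTION")
print(label.name, label.is_malicious)
```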

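## Data Generator

### Generated Sample Types

- **Normal requests**: ordinary business requests such as user queries, login, registration, and search
- **SQL injection**: basic, UNION-based, time-based blind, boolean-based blind, error-based, encoding bypass, and more
- **XSS**: reflected, DOM-based, event handlers, encoding bypass, and more
- **Command injection**: Unix/Linux commands, Windows commands, reverse shells, encoding bypass, and more
- **Path traversal**: basic traversal, encoding bypass, double encoding, special protocols, and more

### Generation Command

```bash
# Generate 200 samples per type
python main.py generate --count 200 --output data/custom_data.csv
```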
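
To illustrate what per-type generation into a CSV file can look like, here is a minimal sketch. The payload lists, the `text`/`label` column names, and the function names are all assumptions made for the example; the real `malicious_generator.py` covers far more variants and may use a different schema.

```python
import csv
import io
import random

# Hypothetical payload fragments, a tiny stand-in for the real generator.
PAYLOADS = {
    "NORMAL": ["id=123", "q=test&page=1", "username=alice"],
    "SQL_INJECTION": ["id=1' OR '1'='1", "id=1 UNION SELECT NULL--"],
    "XSS": ["q=<script>alert(1)</script>", "q=<img src=x onerror=alert(1)>"],
    "COMMAND_INJECTION": ["host=localhost; cat /etc/passwd", "host=127.0.0.1 & whoami"],
    "PATH_TRAVERSAL": ["path=../../../etc/passwd", "path=..%2f..%2fetc%2fpasswd"],
}


def generate_rows(count_per_type, seed=0):
    """Return count_per_type labelled rows for each class, shuffled."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    rows = []
    for label, payloads in PAYLOADS.items():
        for _ in range(count_per_type):
            rows.append({"text": f"/api?{rng.choice(payloads)}", "label": label})
    rng.shuffle(rows)
    return rows


def write_csv(rows, stream):
    """Write rows with a header line; column names are assumed."""
    writer = csv.DictWriter(stream, fieldnames=["text", "label"])
    writer.writeheader()
    writer.writerows(rows)


rows = generate_rows(count_per_type=200)
buffer = io.StringIO()  # stands in for open("data/custom_data.csv", "w", newline="")
write_csv(rows, buffer)
print(len(rows), "rows written")
```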

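## Model Training

### Training Parameters

- **Training set ratio**: 80%
- **Test set ratio**: 20%
- **Feature extraction**: TF-IDF (1- to 3-grams)
- **Classifier**: Naive Bayes (alpha=0.1)
- **Evaluation metrics**: accuracy, precision, recall, F1 score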
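
The parameters above map naturally onto a scikit-learn pipeline. The sketch below is an illustration under several assumptions rather than the project's actual `model_trainer.py`: the toy samples replace `data/training_data.csv`, the character-level analyzer and the `MultinomialNB` variant are guesses (the README only says "1- to 3-grams" and "Naive Bayes"), and the stratified split with a fixed `random_state` is added for reproducibility.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy templates standing in for the real training CSV (illustrative only).
TEMPLATES = {
    "NORMAL": "/api/users?id={n}&page={n}",
    "SQL_INJECTION": "/api/users?id={n}' OR '1'='1",
    "XSS": "/search?q=<script>alert({n})</script>",
    "COMMAND_INJECTION": "/ping?host=localhost; cat /etc/passwd #{n}",
    "PATH_TRAVERSAL": "/files?path=../../../etc/passwd&v={n}",
}
texts, labels = [], []
for label, template in TEMPLATES.items():
    for n in range(10):
        texts.append(template.format(n=n))
        labels.append(label)

# 80% training / 20% test, as listed in the parameters.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

# TF-IDF over 1- to 3-grams feeding Naive Bayes with alpha=0.1.
# analyzer="char" and MultinomialNB are assumptions, not confirmed by the source.
pipeline = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(1, 3)),
    MultinomialNB(alpha=0.1),
)
pipeline.fit(X_train, y_train)

accuracy = accuracy_score(y_test, pipeline.predict(X_test))
print(f"held-out accuracy on toy data: {accuracy:.2f}")
```

Scores on a toy set like this say nothing about real traffic; the point is only how the listed parameters fit together.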

### Training Results

| Metric | Result |
|------|------|
| Accuracy | ~99% |
| Precision | ~99% |
| Recall | ~99% |
| F1 score | ~99% |
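
For readers unfamiliar with the four metrics, the following sketch computes them from true and predicted labels. Macro averaging across classes is an assumption here; the README does not say how the per-class scores are combined, and the labels below are made up for the example.

```python
def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 (averaging is assumed)."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# One XSS sample is missed and classified as normal.
y_true = ["XSS", "XSS", "NORMAL", "NORMAL"]
y_pred = ["XSS", "NORMAL", "NORMAL", "NORMAL"]
scores = macro_metrics(y_true, y_pred)
print(scores)
```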

## Advanced Features

### 1. Standardized HTTP Request Format

```json
{
    "url": "http://example.com/api",
    "method": "GET",
    "headers": {
        "User-Agent": "Mozilla/5.0",
        "Accept": "application/json"
    },
    "params": {
        "id": "123",
        "name": "test"
    },
    "body": {
        "username": "admin",
        "password": "secret"
    }
}
```
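
Because the classifier works on text, a request in this format has to be flattened into a single string before TF-IDF can be applied. The sketch below shows one possible flattening. The function name `request_to_text`, the field order, and the URL-decoding step are assumptions; the project's `text_processor.py` may preprocess requests differently.

```python
import json
from urllib.parse import unquote


def request_to_text(request: dict) -> str:
    """Flatten a standardized request into one string (hypothetical helper)."""
    # Decoding percent-escapes exposes payloads such as %3Cscript%3E (assumed step).
    parts = [request.get("method", "GET"), unquote(request.get("url", ""))]
    for section in ("headers", "params", "body"):
        values = request.get(section) or {}
        if isinstance(values, dict):
            parts.extend(f"{key}={value}" for key, value in values.items())
        else:
            parts.append(str(values))  # e.g. a raw string body
    return " ".join(parts)


raw = '{"url":"http://example.com/api","method":"GET","headers":{},"params":{"id":"123"},"body":{}}'
text = request_to_text(json.loads(raw))
print(text)
```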

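### 2. Detection Logic

- **Model prediction**: uses the TF-IDF + Naive Bayes model
- **Keyword detection**: rule-based keyword matching
- **Regex patterns**: regular expressions targeting specific attack patterns
- **Confidence check**: when the model's prediction is uncertain, a second verification pass is performed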
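
The layers listed above can be combined in several ways. The sketch below shows one plausible arrangement: trust the model when it is confident, otherwise fall back to regex rules. The threshold value, the patterns, the function names, and the choice to return `NORMAL` when no rule matches are all assumptions for illustration, not SafeFlow's actual logic.

```python
import re

# Hypothetical rule set; real rules would be broader and more carefully tuned.
RULES = {
    "SQL_INJECTION": re.compile(r"\bunion\b.+\bselect\b|'\s*or\s*'1'\s*=\s*'1", re.IGNORECASE),
    "XSS": re.compile(r"<script\b|\bon(?:error|load|click|mouseover)\s*=", re.IGNORECASE),
    "COMMAND_INJECTION": re.compile(r"[;&|]\s*(?:cat|ls|whoami|wget|curl)\b", re.IGNORECASE),
    "PATH_TRAVERSAL": re.compile(r"\.\./|\.\.\\|%2e%2e%2f", re.IGNORECASE),
}
CONFIDENCE_THRESHOLD = 0.8  # assumed value


def match_rule(text):
    """Return the first attack type whose pattern matches, else None."""
    for attack_type, pattern in RULES.items():
        if pattern.search(text):
            return attack_type
    return None


def decide(text, predicted_label, confidence):
    """Combine a model prediction with a rule-based second pass."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return predicted_label
    # Model is unsure: let the rules confirm an attack, otherwise treat as normal.
    return match_rule(text) or "NORMAL"


print(decide("id=1' OR '1'='1", predicted_label="NORMAL", confidence=0.55))
print(decide("q=hello", predicted_label="XSS", confidence=0.51))
```

Defaulting to `NORMAL` on a low-confidence prediction with no rule hit favors fewer false positives at the cost of possibly missing novel attacks; a stricter deployment might flag such requests for review instead.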

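### 3. False Positive Control

- **Normal sample augmentation**: adds normal request samples that contain common parameters
- **Threshold tuning**: adjusts the model's prediction threshold to reduce false positives
- **Rule filtering**: applies rule-based filters to specific patterns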

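## Performance Characteristics

- **Lightweight**: built on TF-IDF + Naive Bayes, with a model size of roughly 10-20 MB
- **Fast**: detection time per request is under 1 ms
- **Extensible**: the modular design supports adding new attack types
- **Customizable**: supports custom training data and model parameters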

## Examples

### Detecting SQL Injection

```bash
python main.py detect --url "http://example.com/api/users?id=1' OR '1'='1"
# Result: malicious=yes, attack type=SQL_INJECTION
```

### Detecting XSS

```bash
python main.py detect --url "http://example.com/search?q=<script>alert('xss')</script>"
# Result: malicious=yes, attack type=XSS
```

### Detecting Command Injection

```bash
python main.py detect --url "http://example.com/ping?host=localhost; cat /etc/passwd"
# Result: malicious=yes, attack type=COMMAND_INJECTION
```

### Detecting Path Traversal

```bash
python main.py detect --url "http://example.com/files?path=../../../etc/passwd"
# Result: malicious=yes, attack type=PATH_TRAVERSAL
```

## Dependencies

- numpy
- scikit-learn
- pandas

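## Project Status

- ✅ Core functionality implemented
- ✅ Modular architecture
- ✅ Malicious content generator
- ✅ CSV data management
- ✅ Model training and evaluation
- ✅ Standardized HTTP request format
- ✅ False positive control tuning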

## Contributing

Issues and pull requests are welcome!

## License

Apache License 2.0

See the [LICENSE](LICENSE) file for details.