# ai-text-classification

**Repository Path**: xiaonian0430/ai-text-classification

## Basic Information

- **Project Name**: ai-text-classification
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-12-29
- **Last Updated**: 2021-12-29

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# A short-text classification training and prediction framework built on the BERT pre-trained language model

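Text classification is one of the core tasks in natural language processing (NLP). Common applications include text moderation, ad filtering, sentiment analysis, voice control, and pornographic-content detection.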

> BERT: a pre-trained deep bidirectional Transformer language model

## Installing dependencies
```
# Install from the default package index:
pip3 install -r requirements.txt
# Alternatively, pass an explicit index with -i (run only one of these):
pip3 install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/
pip3 install -r requirements.txt -i https://pypi.python.org/simple/
pip3 install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple/
```

## Training

```
python3 run_train.py
```

## Deployment

```
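# Serve the exported model with TensorFlow Serving (REST API on port 8501):
docker run -t --rm -p 8501:8501 -v "`pwd`/tf_serving_model:/models" -e MODEL_NAME=single_label_model tensorflow/serving
# The container runs in the foreground, so start the HTTP script from a second terminal:
python3 run_http.py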
```
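
Once the container is running, the model can be queried through TensorFlow Serving's REST API on port 8501. The following is a minimal sketch, not the project's own client: only the port and the model name `single_label_model` come from the command above. The host `localhost`, the helper names `build_predict_request` and `predict`, and in particular the shape of each instance (raw strings here) are assumptions, because the expected input depends on how the model was exported.

```python
import json
import urllib.request


def build_predict_request(texts, host="localhost", port=8501,
                          model_name="single_label_model"):
    """Build the URL and JSON body for a TensorFlow Serving predict call."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    # Assumption: the exported model accepts one raw string per instance.
    # Adjust the payload if the model expects token ids or named inputs.
    body = json.dumps({"instances": list(texts)}, ensure_ascii=False).encode("utf-8")
    return url, body


def predict(texts, timeout=10.0):
    """Send the request and return the "predictions" field of the response."""
    url, body = build_predict_request(texts)
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["predictions"]
```

With the server up, `predict(["some short text"])` would return one prediction per input text, provided the instance format matches the exported signature.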

## Training benchmarks

Hardware: 8-core CPU, 16 GB RAM, no GPU.

- Training set: 267882 samples, 267882 labels
- Evaluation set: 57403 samples, 57403 labels
- Test set: 57403 samples, 57403 labels
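
The three set sizes add up to 382688 samples, and 57403 is 15% of that total rounded down, so the numbers are consistent with a 70/15/15 split. The source does not say how the split was actually made. The sketch below is one way to get the same sizes, where the function name `split_dataset`, the fractions, and the seed are all assumptions for illustration.

```python
import random


def split_dataset(samples, eval_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle the samples and split them into train, evaluation and test sets."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    n_eval = int(len(items) * eval_frac)
    n_test = int(len(items) * test_frac)
    n_train = len(items) - n_eval - n_test  # the remainder goes to training
    train = items[:n_train]
    eval_set = items[n_train:n_train + n_eval]
    test_set = items[n_train + n_eval:]
    return train, eval_set, test_set


# Stand-in data: integer ids instead of real (text, label) pairs.
total = 267882 + 57403 + 57403
train, eval_set, test_set = split_dataset(range(total))
print(len(train), len(eval_set), len(test_set))  # 267882 57403 57403
```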

Without the BERT pre-trained model:

- Memory: 1.7 GB
- CPU: 630~650, 78% utilization
- Training time, from the training log:
  - `Epoch 1/100`
  - `2092/2092 - 387s 182ms/step - loss: 0.0939 - accuracy: 0.7334 - val_loss: 0.0518 - val_accuracy: 0.8651`
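
The figures in this log line can be cross-checked with a little arithmetic. The sketch below uses only numbers reported in this README, except for the batch size: 128 is an inference from the step count, not a value stated by the source.

```python
train_samples = 267882     # training set size reported in this README
steps_per_epoch = 2092     # from "2092/2092" in the log
seconds_per_step = 0.182   # from "182ms/step" in the log

# Average number of samples consumed per training step.
samples_per_step = train_samples / steps_per_epoch
print(f"samples per step: {samples_per_step:.2f}")  # 128.05

# Assumption: batch size 128 with the final partial batch dropped.
assumed_batch_size = 128
print(train_samples // assumed_batch_size == steps_per_epoch)  # True

# Step count times step duration should land near the reported 387 s.
estimated_epoch_seconds = steps_per_epoch * seconds_per_step
print(f"estimated epoch time: {estimated_epoch_seconds:.0f} s")  # 381 s
```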

With the BERT pre-trained model:

- Memory: 2.5 GB
- CPU: 580~560, 74% utilization
- Training time, from the training log:
  - `Epoch 1/100`
  - `2092/2092 [==============================] - 3:40:00 570s/step`