# name_classification

**Repository Path**: lucasliu71/name_classification

## Basic Information

- **Project Name**: name_classification
- **Description**: Name classification prediction project
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-05-27
- **Last Updated**: 2025-06-27

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Name Classifier: Training & Prediction

## Install Dependencies

```bash
pip install -r requirements.txt
```

## Model Selection

Three recurrent models are used: RNN, LSTM, and GRU.

### Data Format

```
Abl Czech
Adsit   Czech
Ajdrna  Czech
Alt Czech
Antonowitsch    Czech
Antonowitz  Czech
Bacon   Czech
Ballalatak  Czech
Ballaltick  Czech
Bartonova   Czech
Bastl   Czech
Baroch  Czech
...
```
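
A minimal sketch of how lines in this format could be parsed, assuming each line holds a name followed by whitespace (a tab or spaces) and then the country as the last token. The helper names `parse_line` and `load_pairs` are hypothetical and not taken from the project.

```python
def parse_line(line: str) -> tuple[str, str]:
    """Split one data line into (name, country).

    Assumes the country is the last whitespace-separated token, so names
    that contain spaces are kept whole.
    """
    name, country = line.rstrip("\n").rsplit(None, 1)
    return name.strip(), country


def load_pairs(lines) -> tuple[list[str], list[str]]:
    """Parse an iterable of lines into parallel lists, skipping blank lines."""
    names, countries = [], []
    for line in lines:
        if not line.strip():
            continue
        name, country = parse_line(line)
        names.append(name)
        countries.append(country)
    return names, countries


sample = ["Abl\tCzech", "Adsit   Czech", "Antonowitsch    Czech", ""]
names, countries = load_pairs(sample)
print(names)      # ['Abl', 'Adsit', 'Antonowitsch']
print(countries)  # ['Czech', 'Czech', 'Czech']
```

The two lists have the same shape as the `names` and `countries` arguments that `NameClassDataset` expects. A line with only one token raises `ValueError` in this sketch.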

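### Defining the Character Set

The character set is `ascii_letters` from Python's `string` module plus ` .,;'`, the space and punctuation marks that can appear in English names.

The full set is `abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ .,;'`, 57 characters in total. Each name is converted to a one-hot encoding over these characters before it is fed to the model.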
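
The encoding can be illustrated with plain Python lists. This is a dependency-free sketch rather than the project's tensor code; the function name `one_hot` and the choice to leave unknown characters as all-zero rows are assumptions.

```python
import string

# The 57-character alphabet described above.
LETTERS = string.ascii_letters + " .,;'"


def one_hot(name: str) -> list[list[int]]:
    """Return one row per character, each a one-hot vector over LETTERS.

    Characters outside LETTERS yield an all-zero row (an assumption of
    this sketch).
    """
    rows = []
    for ch in name:
        row = [0] * len(LETTERS)
        idx = LETTERS.find(ch)
        if idx >= 0:
            row[idx] = 1
        rows.append(row)
    return rows


encoded = one_hot("Abl")
print(len(LETTERS))                        # 57
print(len(encoded), len(encoded[0]))       # 3 57
print([row.index(1) for row in encoded])   # [26, 1, 11]
```

A name of length `n` therefore becomes an `n × 57` matrix with exactly one `1` per row.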

Each name is labeled with one of 18 countries: Arabic, Chinese, Czech, Dutch, English, French, German, Greek, Irish, Italian, Japanese, Korean, Polish, Portuguese, Russian, Scottish, Spanish, Vietnamese

### Building the RNN Model

```python
from torch import Tensor, zeros
from torch.nn import Linear, LogSoftmax, Module, RNN


class RNNRebuild(Module):
    def __init__(self, input_size: int, hidden_size: int, output_size: int, num_layers: int=1, batch_first: bool=False):
        """
        RNN model for name classification, built as a
        wrapper around PyTorch's built-in RNN layer

        Args:
            input_size: The number of expected features in the input
            hidden_size: The number of features in the hidden state
            output_size: The number of output features (classes)
            num_layers: The number of recurrent layers
            batch_first: If True, input and output tensors are laid out as
                (batch, seq, feature) instead of (seq, batch, feature);
                forward() assumes the default False
        """
        super(RNNRebuild, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.num_layers = num_layers
        self.batch_first = batch_first

        self.rnn = RNN(
            input_size=self.input_size,
            hidden_size=self.hidden_size,
            num_layers=self.num_layers,
            batch_first=self.batch_first
        ).to('cuda')
        self.linear = Linear(self.hidden_size, self.output_size).to('cuda')
        self.softmax = LogSoftmax(dim=-1).to('cuda')

    def forward(self, inputs: Tensor,
                hidden: Tensor) -> tuple[Tensor, Tensor]:
        inputs = inputs.unsqueeze(1)
        rr, hn = self.rnn(inputs, hidden)
        tmp_rr = rr[-1]
        tmp_rr = self.linear(tmp_rr)
        return self.softmax(tmp_rr), hn

    def init_hidden(self) -> Tensor:
        return zeros(self.num_layers, 1, self.hidden_size).to('cuda')
```
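
For reference, each layer of PyTorch's `RNN`, with its default `tanh` nonlinearity, updates the hidden state as in the first line below. The second line restates what `forward` does with the output at the last time step $T$. The symbols $W$ and $b$ for the `Linear` layer are notation introduced here for illustration.

$$
\begin{aligned}
h_t &= \tanh\left(W_{ih} x_t + b_{ih} + W_{hh} h_{t-1} + b_{hh}\right) \\
\hat{y} &= \log \operatorname{softmax}\left(W h_T + b\right)
\end{aligned}
$$

Here $x_t$ is the one-hot vector of the $t$-th character and $h_0$ is the all-zero tensor returned by `init_hidden`.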

### Building the LSTM Model

```python
from torch import Tensor, zeros
from torch.nn import LSTM, Linear, LogSoftmax, Module


class LSTMRebuild(Module):
    def __init__(self, input_size: int, hidden_size: int, output_size: int, num_layers: int=1, batch_first: bool=False):
        """
        LSTM model for name classification, built as a
        wrapper around PyTorch's built-in LSTM layer

        Args:
            input_size: The number of expected features in the input
            hidden_size: The number of features in the hidden state
            output_size: The number of output features (classes)
            num_layers: The number of recurrent layers
            batch_first: If True, input and output tensors are laid out as
                (batch, seq, feature) instead of (seq, batch, feature);
                forward() assumes the default False
        """
        super(LSTMRebuild, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.num_layers = num_layers
        self.batch_first = batch_first

        self.lstm = LSTM(
            input_size=self.input_size,
            hidden_size=self.hidden_size,
            num_layers=self.num_layers,
            batch_first=self.batch_first
        ).to('cuda')
        self.linear = Linear(self.hidden_size, self.output_size).to('cuda')
        self.softmax = LogSoftmax(dim=-1).to('cuda')

    def forward(self, inputs: Tensor, hidden: Tensor,
                c: Tensor) -> tuple[Tensor, Tensor, Tensor]:
        inputs = inputs.unsqueeze(1)
        rr, (hn, cn) = self.lstm(inputs, (hidden, c))
        tmp_rr = rr[-1]
        tmp_rr = self.linear(tmp_rr)
        return self.softmax(tmp_rr), hn, cn

    def init_hidden(self) -> tuple[Tensor, Tensor]:
        hidden = zeros(self.num_layers, 1, self.hidden_size).to('cuda')
        c = zeros(self.num_layers, 1, self.hidden_size).to('cuda')
        return hidden, c
```
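
For reference, PyTorch's `LSTM` layer computes the following gates and states at each time step, where $\sigma$ is the sigmoid function and $\odot$ is the element-wise product. The hidden state $h_t$ and cell state $c_t$ correspond to the `hidden` and `c` tensors passed to `forward`.

$$
\begin{aligned}
i_t &= \sigma\left(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi}\right) \\
f_t &= \sigma\left(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf}\right) \\
g_t &= \tanh\left(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg}\right) \\
o_t &= \sigma\left(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho}\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
$$

The separate cell state is why `init_hidden` returns two zero tensors for this model instead of one.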

### Building the GRU Model

```python
from torch import Tensor, zeros
from torch.nn import GRU, Linear, LogSoftmax, Module


class GRURebuild(Module):
    def __init__(self, input_size: int, hidden_size: int, output_size: int, num_layers: int=1, batch_first: bool=False):
        """
        GRU model for name classification, built as a
        wrapper around PyTorch's built-in GRU layer

        Args:
            input_size: The number of expected features in the input
            hidden_size: The number of features in the hidden state
            output_size: The number of output features (classes)
            num_layers: The number of recurrent layers
            batch_first: If True, input and output tensors are laid out as
                (batch, seq, feature) instead of (seq, batch, feature);
                forward() assumes the default False
        """
        super(GRURebuild, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.num_layers = num_layers
        self.batch_first = batch_first

        self.gru = GRU(
            input_size=self.input_size,
            hidden_size=self.hidden_size,
            num_layers=self.num_layers,
            batch_first=self.batch_first
        ).to('cuda')
        self.linear = Linear(self.hidden_size, self.output_size).to('cuda')
        self.softmax = LogSoftmax(dim=-1).to('cuda')

    def forward(self, inputs: Tensor, hidden: Tensor) -> tuple[Tensor, Tensor]:
        inputs = inputs.unsqueeze(1)
        rr, hn = self.gru(inputs, hidden)
        tmp_rr = rr[-1]
        tmp_rr = self.linear(tmp_rr)
        return self.softmax(tmp_rr), hn

    def init_hidden(self) -> Tensor:
        return zeros(self.num_layers, 1, self.hidden_size).to('cuda')
```
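
For reference, PyTorch's `GRU` layer computes the following at each time step, where $\sigma$ is the sigmoid function and $\odot$ is the element-wise product.

$$
\begin{aligned}
r_t &= \sigma\left(W_{ir} x_t + b_{ir} + W_{hr} h_{t-1} + b_{hr}\right) \\
z_t &= \sigma\left(W_{iz} x_t + b_{iz} + W_{hz} h_{t-1} + b_{hz}\right) \\
n_t &= \tanh\left(W_{in} x_t + b_{in} + r_t \odot \left(W_{hn} h_{t-1} + b_{hn}\right)\right) \\
h_t &= \left(1 - z_t\right) \odot n_t + z_t \odot h_{t-1}
\end{aligned}
$$

Unlike the LSTM, the GRU keeps a single hidden state, so its `forward` and `init_hidden` signatures match those of the plain RNN wrapper.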

### Defining the Name Classification Dataset

```python
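from torch import Tensor, long, tensor, zeros
from torch.utils.data import Dataset

# LETTERS (the 57-character string) and COUNTRIES (the list of country
# names) are assumed to be defined elsewhere in the project.


class NameClassDataset(Dataset):
    def __init__(self, names: list[str], countries: list[str]):
        """
        Initialize the dataset with a list
        of names and a list of countries

        Args:
            names: A list of names
            countries: A list of countries, aligned with names by index
        """
        super(NameClassDataset, self).__init__()
        self.names = names
        self.countries = countries
        self.num_names = len(self.names)

    def __len__(self) -> int:
        return self.num_names

    def __getitem__(self, idx: int) -> tuple[Tensor, Tensor]:
        # Clamp the index into the valid range
        idx = min(max(idx, 0), self.num_names - 1)
        name = self.names[idx]
        country = self.countries[idx]
        # One-hot matrix of shape (len(name), len(LETTERS))
        tensor_name = zeros(len(name), len(LETTERS)).to('cuda')
        # Class index of the country as a scalar long tensor
        tensor_country = tensor(COUNTRIES.index(country), dtype=long).to('cuda')
        for pos, letter in enumerate(name):
            col = LETTERS.find(letter)
            # find() returns -1 for unknown characters; skip them so they
            # do not silently set the last column
            if col >= 0:
                tensor_name[pos][col] = 1
        return tensor_name, tensor_country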
```

## Model Training

![rnn_lstm_gru_train](assets/rnn_lstm_gru_train.png)

Two learning rates were used, `0.001` and `0.0001`, each trained for 10 epochs.

Learning rate = `0.001`:

* RNN: accuracy `0.6999311701081613`, 457.87 s
* LSTM: accuracy `0.865774788241156`, 422.13 s
* GRU: accuracy `0.8664374688589935`, 523.64 s

Learning rate = `0.0001`:

* RNN: accuracy `0.7715752741774676`, 569.19 s
* LSTM: accuracy `0.7445839561534628`, 428.09 s
* GRU: accuracy `0.7696811160936722`, 387.49 s
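
The figures above can be compared with a few lines of Python. This is only a convenience sketch over the reported numbers; the `results` layout and the `summarize` helper are hypothetical and not part of the project.

```python
# Accuracy and training time in seconds, copied from the lists above.
results = {
    "0.001": {
        "RNN": (0.6999311701081613, 457.87),
        "LSTM": (0.865774788241156, 422.13),
        "GRU": (0.8664374688589935, 523.64),
    },
    "0.0001": {
        "RNN": (0.7715752741774676, 569.19),
        "LSTM": (0.7445839561534628, 428.09),
        "GRU": (0.7696811160936722, 387.49),
    },
}


def summarize(results: dict) -> dict:
    """Return {learning_rate: (most accurate model, fastest model)}."""
    summary = {}
    for lr, by_model in results.items():
        best = max(by_model, key=lambda m: by_model[m][0])
        fastest = min(by_model, key=lambda m: by_model[m][1])
        summary[lr] = (best, fastest)
    return summary


for lr, (best, fastest) in summarize(results).items():
    print(f"lr={lr}: most accurate = {best}, fastest = {fastest}")
# lr=0.001: most accurate = GRU, fastest = LSTM
# lr=0.0001: most accurate = RNN, fastest = GRU
```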

## Comparison of Loss, Accuracy, and Training Time

### Loss Comparison

<img src="images/10_0.001/compare_loss.png" style="zoom:80%;" /> <img src="images/10_0.0001/compare_loss.png" style="zoom:80%;" />

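> [!NOTE]
>
> At a learning rate of `0.001`, the RNN model's loss stops decreasing once it reaches about 1.01.
>
> At a learning rate of `0.0001`, the loss of the RNN, LSTM, and GRU models all decreases steadily.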
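
To give a loss value such as 1.01 some intuition: the models end in `LogSoftmax`, which is commonly paired with negative log-likelihood loss. The README does not state which loss function was used, so the sketch below treats that pairing as an assumption. Under it, the loss for one sample is the negative log-probability assigned to the correct class.

```python
import math


def log_softmax(scores: list[float]) -> list[float]:
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]


def nll(log_probs: list[float], target: int) -> float:
    """Negative log-likelihood of the target class (assumed loss)."""
    return -log_probs[target]


# Illustrative scores for three classes, not real model output.
log_probs = log_softmax([2.0, 0.5, -1.0])
loss = nll(log_probs, target=0)

# Probability on the correct class that corresponds to a loss of 1.01.
print(round(math.exp(-1.01), 3))  # 0.364
```

If the reported loss is a mean over samples, then a plateau at 1.01 corresponds to a geometric-mean probability of about 0.364 on the correct class.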

### Accuracy Comparison

<img src="images/10_0.001/compare_acc.png" style="zoom:80%;" /> <img src="images/10_0.0001/compare_acc.png" style="zoom:80%;" />

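> [!NOTE]
>
> At a learning rate of `0.001`, the LSTM and GRU models reach higher training accuracy than the RNN model.
>
> At a learning rate of `0.0001`, the RNN and GRU models reach higher training accuracy than the LSTM model.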

### Training Time Comparison

<img src="images/10_0.001/compare_time.png" style="zoom:80%;" /> <img src="images/10_0.0001/compare_time.png" style="zoom:80%;" />

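> [!NOTE]
>
> At a learning rate of `0.001`, the GRU model takes longer to train than the RNN and LSTM models, and the LSTM model is the fastest.
>
> At a learning rate of `0.0001`, the RNN model takes longer to train than the LSTM and GRU models, and the GRU model is the fastest.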

## Model Prediction

<img src="assets/model_predict_0.001.png" alt="model_predict_0.001" style="zoom:80%;" /> <img src="assets/model_predict_0.0001.png" alt="model_predict_0.0001" style="zoom:80%;" />
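
Because the models output log-probabilities, a prediction can be decoded by ranking the classes and exponentiating. The sketch below uses plain Python and made-up values in place of a real model output; the helper name `top_k` is hypothetical.

```python
import math

COUNTRIES = [
    "Arabic", "Chinese", "Czech", "Dutch", "English", "French",
    "German", "Greek", "Irish", "Italian", "Japanese", "Korean",
    "Polish", "Portuguese", "Russian", "Scottish", "Spanish", "Vietnamese",
]


def top_k(log_probs: list[float], k: int = 3) -> list[tuple[str, float]]:
    """Return the k most likely (country, probability) pairs."""
    ranked = sorted(range(len(log_probs)), key=lambda i: log_probs[i], reverse=True)
    return [(COUNTRIES[i], math.exp(log_probs[i])) for i in ranked[:k]]


# Illustrative distribution, not real model output: most of the mass
# is placed on Czech and Russian, the rest spread evenly.
probs = [0.01] * len(COUNTRIES)
probs[COUNTRIES.index("Czech")] = 0.60
probs[COUNTRIES.index("Russian")] = 0.24
log_probs = [math.log(p) for p in probs]

for country, p in top_k(log_probs, k=2):
    print(f"{country}: {p:.2f}")
# Czech: 0.60
# Russian: 0.24
```

With a real model, `log_probs` would be the first element returned by `forward`, converted to a list.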