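diff --git a/assignment-2/submission/18307130104/README.md b/assignment-2/submission/18307130104/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..d1d38cfc70c1a72658e9d0fa1cf8569687ab9e45
--- /dev/null
+++ b/assignment-2/submission/18307130104/README.md
@@ -0,0 +1,179 @@
+18307130104
+
+# Course Report
+
+This is the course report for PRML assignment-2. My code is in the code 1 ~ code 7 sections of numpy_fnn.py, plus the numpy == True branch of the mini_batch function in utils.py.
+
+In assignment-2 I implemented the backward pass of each operator in numpy_fnn.py and built a simple feedforward neural network (forward and backward propagation). I also modified mini_batch so that, when numpy == True, it shuffles and batches the dataset without using torch's DataLoader.
+
+## Model Implementation
+
+To distinguish matrix multiplication (np.matmul) from element-wise multiplication (\*), $\times$ denotes matrix multiplication and \* denotes element-wise multiplication below.
+
+### Backward pass of the Matmul operator
+
+The Matmul operator takes an input X and a weight W and outputs $$[Y] = [X] \times [W]$$
+
+For an element $$Y_{ij}$$ of Y, $$Y_{ij}=\sum_{k}X_{ik} * W_{kj}$$
+
+When computing grad_x with grad_y known, the chain rule gives $gradx_{ij}=\sum_{k}\frac{\partial Y_{ik}}{\partial X_{ij}} * grady_{ik}$
+
+From the formula for $Y_{ij}$, $\frac{\partial Y_{ik}}{\partial X_{ij}}=W_{jk}$
+
+Hence $gradx_{ij}=\sum_k W_{jk} *grady_{ik}$
+
+So $[gradx] = [grady] \times [W^T]$
+
+Similarly, $[gradW]=[X^T]\times [grady]$
+
+The shapes are consistent with the rules of matrix multiplication: with X of shape (N, d), W of shape (d, d') and grady of shape (N, d'), gradx has shape (N, d) and gradW has shape (d, d').
+
+### Backward pass of the Relu operator
+
+The relu function is defined as follows:
+
+$relu(x) = \begin{cases}0 & x < 0 \\\\ x & otherwise \end{cases}$
+
+Differentiating gives (the derivative at $x = 0$ is undefined; the code in numpy_fnn.py uses 0 there)
+
+$relu^{'}(x) = \begin{cases}0 & x \le 0 \\\\ 1 & otherwise \end{cases}$
+
+Hence
+
+$[relugrad]=[grady]* [relu^{'}]$
+
+### Backward pass of the Log operator
+
+$log(x) = \ln x$
+
+This gives
+
+$log^{'}(x)=\frac 1 x$
+
+Hence
+
+$[loggrad]=[grady]* [log^{'}]$
+
+### Backward pass of the softmax operator
+
+$softmax(x_i) = \frac {e^{x_i}}{\sum_j e^{x_j}}$
+
+In the implementation each row represents one data point, so softmax is applied to each row as a whole, which is what classifies each data point.
+
+Pairing the softmax operator with the cross-entropy loss gives a combined gradient of a simple form, but the network implemented here requires the backward passes of the two operators to be separate, so that shortcut is not available and the computation has to be done step by step.
+
+For convenience, let $a_i = softmax(x_i)$
+
+Now consider the derivative of $a_i$ with respect to $x_j$.
+
+$a_i = \frac{e^{x_i}}{\sum_k e^{x_k}}$
+
+$\frac {\partial a_i}{\partial x_j}=\frac{\partial}{\partial x_j}(\frac{e^{x_i}}{\sum_k e^{x_k}})$
+
+Next, split into cases according to whether i and j are equal.
+
+If i == j, then $\frac{\partial}{\partial x_j}(\frac{e^{x_i}}{\sum_k e^{x_k}})=\frac{e^{x_i}(\sum_k e^{x_k})-e^{x_i}e^{x_i}}{(\sum_k e^{x_k})^2}=a_i(1-a_i)$
+
+If i != j, then $\frac{\partial}{\partial x_j}(\frac{e^{x_i}}{\sum_k e^{x_k}})=-\frac{e^{x_i}e^{x_j}}{(\sum_k e^{x_k})^2}=-a_ia_j$
+
+Combining this with grady gives the following, where every $x$ is taken from row i of the input
+
+$gradx_{ij}=\sum_k \frac{\partial}{\partial x_j}(\frac{e^{x_k}}{\sum_w e^{x_w}}) grady_{ik}$
+
+Because computing this gradient requires a case analysis, I could not think of a way to compute it directly with numpy functions, so the code first builds a list and then converts it to an ndarray to return.
+
+### Model forward propagation
+
+The output of each layer is the input of the next. The final output is the softmax result passed through Log, which makes the cross-entropy loss easy to compute. As the analysis in "Model backward propagation" shows, this design also keeps the input to the backward pass very simple.
+
+### Model backward propagation
+
+The backward pass takes as input a matrix in which every row is a one-hot vector giving the class of one data point. The starter code divided every element by the size of the matrix, but experiments showed that the elements must be divided by the number of samples in the batch for the result to be correct. ~~Also, although the tests passed, the output of the softmax layer differed from the torch result, while the outputs of the later layers were correct. I concluded that my understanding of the softmax layer differs somewhat from torch's implementation.~~
+
+After the test code was changed, the output is close to that of the torch layer and can be considered correct.
+
+Next, derive the input to the Log layer in the backward pass.
+
+The cross-entropy loss has the form
+
+$Loss = -\sum_k t_k*\ln a_k$
+
+where $t_k$ indicates whether the sample belongs to class k, $a_k$ is the output of the softmax layer, and the output of the Log layer is $\ln a_k$, so $\frac{\partial Loss}{\partial \ln a_k}=-t_k$
+
+Therefore the matrix T passed to the backward pass is negated and used as the backward input of the Log layer, and each result is then passed back in turn as the input of the preceding layer.
+
+## Model Training
+
+The accuracy after each training epoch is as follows
+
+learning_rate = 0.1 mini_batch = 128
+
+> [0] Accuracy: 0.9403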
+> [1] Accuracy: 0.9641
+> [2] Accuracy: 0.9716
+> [3] Accuracy: 0.9751
+> [4] Accuracy: 0.9772
+> [5] Accuracy: 0.9782
+> [6] Accuracy: 0.9745
+> [7] Accuracy: 0.9807
+> [8] Accuracy: 0.9790
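+> [9] Accuracy: 0.9811
+
+The figure below shows how the loss changes over the course of training
+
+[Figure: loss curve]
+
+Accuracy rises steadily with training. After 6 epochs the numbers are essentially stable, with only slight fluctuations up and down.
+
+learning_rate = 0.1 mini_batch = 32
+
+> [0] Accuracy: 0.9646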
+> [1] Accuracy: 0.9726
+> [2] Accuracy: 0.9768
+> [3] Accuracy: 0.9788
+> [4] Accuracy: 0.9792
+> [5] Accuracy: 0.9770
+> [6] Accuracy: 0.9820
+> [7] Accuracy: 0.9808
+> [8] Accuracy: 0.9822
+> [9] Accuracy: 0.9835
+
+[Figure: loss curve]
+
+With mini_batch reduced from 128 to 32, the loss fluctuates noticeably more over the course of training.
+
+learning_rate = 0.2 mini_batch = 128
+
+> [0] Accuracy: 0.9295
+> [1] Accuracy: 0.9688
+> [2] Accuracy: 0.9753
+> [3] Accuracy: 0.9734
+> [4] Accuracy: 0.9793
+> [5] Accuracy: 0.9777
+> [6] Accuracy: 0.9792
+> [7] Accuracy: 0.9807
+> [8] Accuracy: 0.9821
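+> [9] Accuracy: 0.9815
+
+[Figure: loss curve]
+
+Although the learning rate was raised, the loss did not fluctuate much more as a result, and the model still performs very well.
+
+learning_rate = 0.05 mini_batch = 128
+
+> [0] Accuracy: 0.9310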
+> [1] Accuracy: 0.9504
+> [2] Accuracy: 0.9601
+> [3] Accuracy: 0.9661
+> [4] Accuracy: 0.9691
+> [5] Accuracy: 0.9728
+> [6] Accuracy: 0.9749
+> [7] Accuracy: 0.9761
+> [8] Accuracy: 0.9768
+> [9] Accuracy: 0.9752
+
+[Figure: loss curve]
+
+With the lower learning rate, accuracy grows more slowly, but after several epochs the result is close to what the higher learning rates reach.
+
+Overall, the final accuracy depends mainly on the learning capacity of the model itself; changing the learning rate and mini_batch within a certain range has little effect on the result. Training with mini-batches helps reduce fluctuations of the loss during training, and the larger mini_batch (128) gave a smoother loss than 32.
\ No newline at end of file
diff --git a/assignment-2/submission/18307130104/img/result-1.png b/assignment-2/submission/18307130104/img/result-1.png
new file mode 100644
index 0000000000000000000000000000000000000000..11c6fba6be9d6f58a463830a5d8c006ad64af963
Binary files /dev/null and b/assignment-2/submission/18307130104/img/result-1.png differ
diff --git a/assignment-2/submission/18307130104/img/result-2.png b/assignment-2/submission/18307130104/img/result-2.png
new file mode 100644
index 0000000000000000000000000000000000000000..3f9aa1a2ed643f738f7d9ff59ea1923891048166
Binary files /dev/null and b/assignment-2/submission/18307130104/img/result-2.png differ
diff --git a/assignment-2/submission/18307130104/img/result-3.png b/assignment-2/submission/18307130104/img/result-3.png
new file mode 100644
index 0000000000000000000000000000000000000000..1e7d29f9f43741b83d6ac43ecf4b6c448c8c1141
Binary files /dev/null and b/assignment-2/submission/18307130104/img/result-3.png differ
diff --git a/assignment-2/submission/18307130104/img/result-4.png b/assignment-2/submission/18307130104/img/result-4.png
new file mode 100644
index 0000000000000000000000000000000000000000..2a1f550db001bdcc1d3a3b9501dba56a13028e8e
Binary files /dev/null and b/assignment-2/submission/18307130104/img/result-4.png differ
diff --git a/assignment-2/submission/18307130104/img/result-5.png b/assignment-2/submission/18307130104/img/result-5.png
new file mode 100644
index 0000000000000000000000000000000000000000..7ee7df630e01d83559e9f316a937df107e98248d
Binary files /dev/null and b/assignment-2/submission/18307130104/img/result-5.png differ
diff --git a/assignment-2/submission/18307130104/img/result.png b/assignment-2/submission/18307130104/img/result.png
new file mode 100644
index 0000000000000000000000000000000000000000..0039ef8029c07eeb75caa2efd42c13aeba61ce5a
Binary files /dev/null and b/assignment-2/submission/18307130104/img/result.png differ
diff --git a/assignment-2/submission/18307130104/numpy_fnn.py b/assignment-2/submission/18307130104/numpy_fnn.py
new file mode 100644
index 0000000000000000000000000000000000000000..ba780e9edb71ec687ddf7d295973be810848ce79
--- /dev/null
+++ b/assignment-2/submission/18307130104/numpy_fnn.py
@@ -0,0 +1,214 @@
+import numpy as np
+
+
+class NumpyOp:
+
+    def __init__(self):
+        self.memory = {}
+        self.epsilon = 1e-12
+
+
+class Matmul(NumpyOp):
+
+    def forward(self, x, W):
+        """
+        x: shape(N, d)
+        w: shape(d, d')
+        """
+        self.memory['x'] = x
+        self.memory['W'] = W
+        h = np.matmul(x, W)
+        return h
+
+    def backward(self, grad_y):
+        """
+        grad_y: shape(N, d')
+        """
+
+        ####################
+        #      code 1      #
+        ####################
+        grad_x = np.matmul(grad_y, self.memory['W'].T)
+        grad_W = np.matmul(self.memory['x'].T, grad_y)
+
+        return grad_x, grad_W
+
+
+class Relu(NumpyOp):
+
+    def forward(self, x):
+        self.memory['x'] = x
+        return np.where(x > 0, x, np.zeros_like(x))
+
+    def backward(self, grad_y):
+        """
+        grad_y: same shape as x
+        """
+
+        ####################
+        #      code 2      #
+        ####################
+        grad_x = grad_y * np.where(self.memory['x'] > 0, np.ones_like(self.memory['x']), np.zeros_like(self.memory['x']))
+
+        return grad_x
+
+
+class Log(NumpyOp):
+
+    def forward(self, x):
+        """
+        x: shape(N, c)
+        """
+
+        out = np.log(x + self.epsilon)
+        self.memory['x'] = x
+
+        return out
+
+    def backward(self, grad_y):
+        """
+        grad_y: same shape as x
+        """
+
+        ####################
+        #      code 3      #
+        ####################
+        grad_x = grad_y * np.reciprocal(self.memory['x'] + self.epsilon)
+
+        return grad_x
+
+
+class Softmax(NumpyOp):
+    """
+    softmax over last dimension
+    """
+
+    def forward(self, x):
+        """
+        x: shape(N, c)
+        """
+
+        ####################
+        #      code 4      #
+        ####################
+        self.memory['x'] = x
+        expx = np.exp(x)
+        sumx = np.sum(expx, axis = 1, keepdims = True)
+        return (expx / sumx)
+
+    def backward(self, grad_y):
+        """
+        grad_y: same shape as x
+        """
+
+        ####################
+        #      code 5      #
+        ####################
+
+        x = self.memory['x']
+        softx = self.forward(x)
+        # print(sumx.shape)
+        [n, m] = x.shape
+        out = []
+        # print(grad_y)
+        for i in range(n):
+            out.append([])
+            for j in range(m):
+                out[i].append(0)
+                for k in range(m):
+                    if j == k:
+                        # print(softx[i][k], grad_y[i][k])
+                        out[i][j] += (1 - softx[i][k]) * softx[i][k] * grad_y[i][k]
+                    else:
+                        out[i][j] += -softx[i][j] * softx[i][k] * grad_y[i][k]
+        grad_x = np.array(out)
+
+        return grad_x
+
+
+class NumpyLoss:
+
+    def __init__(self):
+        self.target = None
+
+    def get_loss(self, pred, target):
+        self.target = target
+        return (-pred * target).sum(axis=1).mean()
+
+    def backward(self):
+        return -self.target / self.target.shape[0]
+
+
+class NumpyModel:
+    def __init__(self):
+        self.W1 = np.random.normal(size=(28 * 28, 256))
+        self.W2 = np.random.normal(size=(256, 64))
+        self.W3 = np.random.normal(size=(64, 10))
+
+        # The operators below are used in forward and backward
+        self.matmul_1 = Matmul()
+        self.relu_1 = Relu()
+        self.matmul_2 = Matmul()
+        self.relu_2 = Relu()
+        self.matmul_3 = Matmul()
+        self.softmax = Softmax()
+        self.log = Log()
+
+        # The variables below must be updated in backward. softmax_grad, log_grad, etc. are the operators' backward gradients (partial derivatives of the loss with respect to each operator's input)
+        self.x1_grad, self.W1_grad = None, None
+        self.relu_1_grad = None
+        self.x2_grad, self.W2_grad = None, None
+        self.relu_2_grad = None
+        self.x3_grad, self.W3_grad = None, None
+        self.softmax_grad = None
+        self.log_grad = None
+
+    def forward(self, x):
+        x = x.reshape(-1, 28 * 28)
+
+        ####################
+        #      code 6      #
+        ####################
+        x = self.matmul_1.forward(x, self.W1)
+        x = self.relu_1.forward(x)
+        x = self.matmul_2.forward(x, self.W2)
+        x = self.relu_2.forward(x)
+        x = self.matmul_3.forward(x, self.W3)
+        x = self.softmax.forward(x)
+        # print(x)
+        x = self.log.forward(x)
+
+        return x
+
+    def backward(self, y):
+
+        ####################
+        #      code 7      #
+        ####################
+
+        y = self.log.backward(y)
+        self.log_grad = y
+
+        y = self.softmax.backward(y)
+        self.softmax_grad = y
+
+        y, self.W3_grad = self.matmul_3.backward(y)
+        self.x3_grad = y
+
+        y = self.relu_2.backward(y)
+        self.relu_2_grad = y
+
+        y, self.W2_grad = self.matmul_2.backward(y)
+        self.x2_grad = y
+
+        y = self.relu_1.backward(y)
+        self.relu_1_grad = y
+
+        y, self.W1_grad = self.matmul_1.backward(y)
+        self.x1_grad = y
+        return y
+
+    def optimize(self, learning_rate):
+        self.W1 -= learning_rate * self.W1_grad
+        self.W2 -= learning_rate * self.W2_grad
+        self.W3 -= learning_rate * self.W3_grad
diff --git a/assignment-2/submission/18307130104/numpy_mnist.py b/assignment-2/submission/18307130104/numpy_mnist.py
new file mode 100644
index 0000000000000000000000000000000000000000..5f7aaadd84d701b578d384df3d4976f5c76a5dfa
--- /dev/null
+++ b/assignment-2/submission/18307130104/numpy_mnist.py
@@ -0,0 +1,38 @@
+import numpy as np
+from numpy_fnn import NumpyModel, NumpyLoss
+from utils import download_mnist, batch, mini_batch, get_torch_initialization, plot_curve, one_hot
+
+
+def numpy_run():
+    train_dataset, test_dataset = download_mnist()
+
+    model = NumpyModel()
+    numpy_loss = NumpyLoss()
+    model.W1, model.W2, model.W3 = get_torch_initialization()
+
+    train_loss = []
+
+    epoch_number = 3
+    learning_rate = 0.1
+
+    for epoch in range(epoch_number):
+        for x, y in mini_batch(train_dataset, 128, True):
+            y = one_hot(y)
+
+            y_pred = model.forward(x)
+            loss = numpy_loss.get_loss(y_pred, y)
+
+            model.backward(numpy_loss.backward())
+            model.optimize(learning_rate)
+
+            train_loss.append(loss.item())
+
+        x, y = batch(test_dataset)[0]
+        accuracy = np.mean((model.forward(x).argmax(axis=1) == y))
+        print('[{}] Accuracy: {:.4f}'.format(epoch, accuracy))
+
+    plot_curve(train_loss)
+
+
+if __name__ == "__main__":
+    numpy_run()
diff --git a/assignment-2/submission/18307130104/tester_demo.py b/assignment-2/submission/18307130104/tester_demo.py
new file mode 100644
index 0000000000000000000000000000000000000000..df4bb27bc0d8b9f28f5abd09faff7635d8347792
--- /dev/null
+++ b/assignment-2/submission/18307130104/tester_demo.py
@@ -0,0 +1,183 @@
+import numpy as np
+import torch
+from torch import matmul as torch_matmul, relu as torch_relu, softmax as torch_softmax, log as torch_log
+
+from numpy_fnn import Matmul, Relu, Softmax, Log, NumpyModel, NumpyLoss
+from torch_mnist import TorchModel
+from utils import get_torch_initialization, one_hot
+
+err_epsilon = 1e-6
+err_p = 0.4
+
+
+def check_result(numpy_result, torch_result=None):
+    if isinstance(numpy_result, list) and torch_result is None:
+        flag = True
+        for (n, t) in numpy_result:
+            flag = flag and check_result(n, t)
+        return flag
+    # print((torch.from_numpy(numpy_result) - torch_result).abs().mean().item())
+    T = (torch_result * torch.from_numpy(numpy_result) < 0).sum().item()
+    direction = T / torch_result.numel() < err_p
+
+    return direction and ((torch.from_numpy(numpy_result) - torch_result).abs().mean() < err_epsilon).item()
+
+
+def case_1():
+    x = np.random.normal(size=[5, 6])
+    W = np.random.normal(size=[6, 4])
+
+    numpy_matmul = Matmul()
+    numpy_out = numpy_matmul.forward(x, W)
+    numpy_x_grad, numpy_W_grad = numpy_matmul.backward(np.ones_like(numpy_out))
+
+    torch_x = torch.from_numpy(x).clone().requires_grad_()
+    torch_W = torch.from_numpy(W).clone().requires_grad_()
+
+    torch_out = torch_matmul(torch_x, torch_W)
+    torch_out.sum().backward()
+
+    return check_result([
+        (numpy_out, torch_out),
+        (numpy_x_grad, torch_x.grad),
+        (numpy_W_grad, torch_W.grad)
+    ])
+
+
+def case_2():
+    x = np.random.normal(size=[5, 6])
+
+    numpy_relu = Relu()
+    numpy_out = numpy_relu.forward(x)
+    numpy_x_grad = numpy_relu.backward(np.ones_like(numpy_out))
+
+    torch_x = torch.from_numpy(x).clone().requires_grad_()
+
+    torch_out = torch_relu(torch_x)
+    torch_out.sum().backward()
+
+    return check_result([
+        (numpy_out, torch_out),
+        (numpy_x_grad, torch_x.grad),
+    ])
+
+
+def case_3():
+    x = np.random.uniform(low=0.0, high=1.0, size=[3, 4])
+
+    numpy_log = Log()
+    numpy_out = numpy_log.forward(x)
+    numpy_x_grad = numpy_log.backward(np.ones_like(numpy_out))
+
+    torch_x = torch.from_numpy(x).clone().requires_grad_()
+
+    torch_out = torch_log(torch_x)
+    torch_out.sum().backward()
+
+    return check_result([
+        (numpy_out, torch_out),
+
+        (numpy_x_grad, torch_x.grad),
+    ])
+
+
+def case_4():
+    x = np.random.normal(size=[4, 5])
+
+    numpy_softmax = Softmax()
+    numpy_out = numpy_softmax.forward(x)
+
+    torch_x = torch.from_numpy(x).clone().requires_grad_()
+
+    torch_out = torch_softmax(torch_x, 1)
+
+    return check_result(numpy_out, torch_out)
+
+
+def case_5():
+    x = np.random.normal(size=[20, 25])
+
+    numpy_softmax = Softmax()
+    numpy_out = numpy_softmax.forward(x)
+    numpy_x_grad = numpy_softmax.backward(np.ones_like(numpy_out))
+
+    torch_x = torch.from_numpy(x).clone().requires_grad_()
+
+    torch_out = torch_softmax(torch_x, 1)
+    torch_out.sum().backward()
+
+    return check_result([
+        (numpy_out, torch_out),
+        (numpy_x_grad, torch_x.grad),
+    ])
+
+
+def test_model():
+    try:
+        numpy_loss = NumpyLoss()
+        numpy_model = NumpyModel()
+        torch_model = TorchModel()
+        torch_model.W1.data, torch_model.W2.data, torch_model.W3.data = get_torch_initialization(numpy=False)
+        numpy_model.W1 = torch_model.W1.detach().clone().numpy()
+        numpy_model.W2 = torch_model.W2.detach().clone().numpy()
+        numpy_model.W3 = torch_model.W3.detach().clone().numpy()
+
+        x = torch.randn((10000, 28, 28))
+        y = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8, 9, 0] * 1000)
+
+        y = one_hot(y, numpy=False)
+        x2 = x.numpy()
+        y_pred = torch_model.forward(x)
+        loss = (-y_pred * y).sum(dim=1).mean()
+        loss.backward()
+
+        y_pred_numpy = numpy_model.forward(x2)
+        numpy_loss.get_loss(y_pred_numpy, y.numpy())
+
+        check_flag_1 = check_result(y_pred_numpy, y_pred)
+        print("+ {:12} {}/{}".format("forward", 10 * check_flag_1, 10))
+    except:
+        print("[Runtime Error in forward]")
+        print("+ {:12} {}/{}".format("forward", 0, 10))
+        return 0
+
+    try:
+
+        numpy_model.backward(numpy_loss.backward())
+
+        check_flag_2 = [
+            check_result(numpy_model.log_grad, torch_model.log_input.grad),
+            check_result(numpy_model.softmax_grad, torch_model.softmax_input.grad),
+            check_result(numpy_model.W3_grad, torch_model.W3.grad),
+            check_result(numpy_model.W2_grad, torch_model.W2.grad),
+            check_result(numpy_model.W1_grad, torch_model.W1.grad)
+        ]
+        check_flag_2 = sum(check_flag_2) >= 4
+        print("+ {:12} {}/{}".format("backward", 20 * check_flag_2, 20))
+    except:
+        print("[Runtime Error in backward]")
+        print("+ {:12} {}/{}".format("backward", 0, 20))
+        check_flag_2 = False
+
+    return 10 * check_flag_1 + 20 * check_flag_2
+
+
+if __name__ == "__main__":
+    testcases = [
+        ["matmul", case_1, 5],
+        ["relu", case_2, 5],
+        ["log", case_3, 5],
+        ["softmax_1", case_4, 5],
+        ["softmax_2", case_5, 10],
+    ]
+    score = 0
+    for case in testcases:
+        try:
+            res = case[2] if case[1]() else 0
+        except:
+            print("[Runtime Error in {}]".format(case[0]))
+            res = 0
+        score += res
+        print("+ {:12} {}/{}".format(case[0], res, case[2]))
+    score += test_model()
+    print("{:14} {}/60".format("FINAL SCORE", score))
diff --git a/assignment-2/submission/18307130104/torch_mnist.py b/assignment-2/submission/18307130104/torch_mnist.py
new file mode 100644
index 0000000000000000000000000000000000000000..6d3e214c7606e3d43dac4b94554f942508afffb3
--- /dev/null
+++ b/assignment-2/submission/18307130104/torch_mnist.py
@@ -0,0 +1,73 @@
+import torch
+from utils import mini_batch, batch, download_mnist, get_torch_initialization, one_hot, plot_curve
+
+
+class TorchModel:
+
+    def __init__(self):
+        self.W1 = torch.randn((28 * 28, 256), requires_grad=True)
+        self.W2 = torch.randn((256, 64), requires_grad=True)
+        self.W3 = torch.randn((64, 10), requires_grad=True)
+        self.softmax_input = None
+        self.log_input = None
+
+    def forward(self, x):
+        x = x.reshape(-1, 28 * 28)
+        x = torch.relu(torch.matmul(x, self.W1))
+        x = torch.relu(torch.matmul(x, self.W2))
+        x = torch.matmul(x, self.W3)
+
+        self.softmax_input = x
+        self.softmax_input.retain_grad()
+
+        x = torch.softmax(x, 1)
+
+        self.log_input = x
+        self.log_input.retain_grad()
+
+        x = torch.log(x)
+
+        return x
+
+    def optimize(self, learning_rate):
+        with torch.no_grad():
+            self.W1 -= learning_rate * self.W1.grad
+            self.W2 -= learning_rate * self.W2.grad
+            self.W3 -= learning_rate * self.W3.grad
+
+            self.W1.grad = None
+            self.W2.grad = None
+            self.W3.grad = None
+
+
+def torch_run():
+    train_dataset, test_dataset = download_mnist()
+
+    model = TorchModel()
+    model.W1.data, model.W2.data, model.W3.data = get_torch_initialization(numpy=False)
+
+    train_loss = []
+
+    epoch_number = 3
+    learning_rate = 0.1
+
+    for epoch in range(epoch_number):
+        for x, y in mini_batch(train_dataset, numpy=False):
+            y = one_hot(y, numpy=False)
+
+            y_pred = model.forward(x)
+            loss = (-y_pred * y).sum(dim=1).mean()
+            loss.backward()
+            model.optimize(learning_rate)
+
+            train_loss.append(loss.item())
+
+        x, y = batch(test_dataset, numpy=False)[0]
+        accuracy = model.forward(x).argmax(dim=1).eq(y).float().mean().item()
+        print('[{}] Accuracy: {:.4f}'.format(epoch, accuracy))
+
+    plot_curve(train_loss)
+
+
+if __name__ == "__main__":
+    torch_run()
diff --git a/assignment-2/submission/18307130104/utils.py b/assignment-2/submission/18307130104/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..274566a51dc9718158d63b6aa59546381d939223
--- /dev/null
+++ b/assignment-2/submission/18307130104/utils.py
@@ -0,0 +1,83 @@
+import torch
+import numpy as np
+from matplotlib import pyplot as plt
+
+def plot_curve(data):
+    plt.plot(range(len(data)), data, color='blue')
+    plt.legend(['loss_value'], loc='upper right')
+    plt.xlabel('step')
+    plt.ylabel('value')
+    plt.xlim(-100,5000)
+    plt.savefig('./img/result.png')
+    plt.close()
+    plt.show()
+
+
+def download_mnist():
+    from torchvision import datasets, transforms
+
+    transform = transforms.Compose([
+        transforms.ToTensor(),
+        transforms.Normalize(mean=(0.1307,), std=(0.3081,))
+    ])
+
+    train_dataset = datasets.MNIST(root="./data/", transform=transform, train=True, download=True)
+    test_dataset = datasets.MNIST(root="./data/", transform=transform, train=False, download=True)
+
+    return train_dataset, test_dataset
+
+
+def one_hot(y, numpy=True):
+    if numpy:
+        y_ = np.zeros((y.shape[0], 10))
+        y_[np.arange(y.shape[0], dtype=np.int32), y] = 1
+        return y_
+    else:
+        y_ = torch.zeros((y.shape[0], 10))
+        y_[torch.arange(y.shape[0], dtype=torch.long), y] = 1
+        return y_
+
+
+def batch(dataset, numpy=True):
+    data = []
+    label = []
+    for each in dataset:
+        data.append(each[0])
+        label.append(each[1])
+    data = torch.stack(data)
+    label = torch.LongTensor(label)
+    if numpy:
+        return [(data.numpy(), label.numpy())]
+    else:
+        return [(data, label)]
+
+
+def mini_batch(dataset, batch_size=128, numpy=False):
+    if numpy:
+        import random
+        datas = [(each[0].numpy(), each[1]) for each in dataset]
+        random.shuffle(datas)
+        datat = [each[0] for each in datas]
+        labelt = [each[1] for each in datas]
+        data = [np.array(datat[i: i + batch_size]) for i in range(0, len(datat), batch_size)]
+        label = [np.array(labelt[i: i + batch_size]) for i in range(0, len(datat), batch_size)]
+        return zip(data, label)
+    else:
+        return torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)
+
+
+def get_torch_initialization(numpy=True):
+    fc1 = torch.nn.Linear(28 * 28, 256)
+    fc2 = torch.nn.Linear(256, 64)
+    fc3 = torch.nn.Linear(64, 10)
+
+    if numpy:
+        W1 = fc1.weight.T.detach().clone().numpy()
+        W2 = fc2.weight.T.detach().clone().numpy()
+        W3 = fc3.weight.T.detach().clone().numpy()
+    else:
+        W1 = fc1.weight.T.detach().clone().data
+        W2 = fc2.weight.T.detach().clone().data
+        W3 = fc3.weight.T.detach().clone().data
+
+    return W1, W2, W3
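
Note on the Softmax backward pass (code 5 in numpy_fnn.py): the README states that no direct NumPy formulation was found for this gradient, so the submission uses a triple Python loop. The two cases in the README's derivation can be written as one expression, $\frac{\partial a_k}{\partial x_j} = a_k(\delta_{kj} - a_j)$, which makes each row's gradient `a * (grad_y - sum(grad_y * a))`. The sketch below is one possible vectorized form and is not the submission's code. The function names are hypothetical, and subtracting the row maximum before `np.exp` is an added assumption for numerical stability that the submission does not make. The loop version is included only to check that both give the same result.

```python
import numpy as np


def softmax_forward(x):
    """Row-wise softmax for x of shape (N, c)."""
    # Assumption: shift by the row maximum for numerical stability.
    # This is not in the submission and does not change the result mathematically.
    shifted = x - np.max(x, axis=1, keepdims=True)
    expx = np.exp(shifted)
    return expx / np.sum(expx, axis=1, keepdims=True)


def softmax_backward_vectorized(x, grad_y):
    """Gradient of the loss w.r.t. x, given grad_y of the same shape as x."""
    a = softmax_forward(x)
    # For each row: sum_k grad_y[k] * a[k], kept as a column so it broadcasts.
    weighted = np.sum(grad_y * a, axis=1, keepdims=True)
    # grad_x[j] = a[j] * grad_y[j] - a[j] * sum_k grad_y[k] * a[k]
    return a * (grad_y - weighted)


def softmax_backward_loop(x, grad_y):
    """Reference triple loop that mirrors the case analysis in the README."""
    a = softmax_forward(x)
    n, m = x.shape
    out = np.zeros_like(x)
    for i in range(n):
        for j in range(m):
            for k in range(m):
                if j == k:
                    out[i, j] += a[i, k] * (1 - a[i, k]) * grad_y[i, k]
                else:
                    out[i, j] -= a[i, j] * a[i, k] * grad_y[i, k]
    return out


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5))
grad_y = rng.normal(size=(4, 5))
# Compare the vectorized gradient against the loop-based reference.
print(np.allclose(softmax_backward_vectorized(x, grad_y),
                  softmax_backward_loop(x, grad_y)))
```

The vectorized form does O(N·c) work per call instead of O(N·c²) Python-level iterations, which matters for case_5 in tester_demo.py (a 20 × 25 input) and even more for the 10000-row batch used in test_model.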