diff --git a/assignment-2/submission/18307130104/README.md b/assignment-2/submission/18307130104/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..d1d38cfc70c1a72658e9d0fa1cf8569687ab9e45
--- /dev/null
+++ b/assignment-2/submission/18307130104/README.md
@@ -0,0 +1,179 @@
+18307130104
+
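+# Course Report
+
+This is the course report for PRML assignment-2. My code is in the code 1 ~ code 7 sections of numpy_fnn.py, and in the numpy == True branch of the mini_batch function in utils.py.
+
+In assignment-2, I implemented the backward pass of each operator in numpy_fnn.py and built a simple feed-forward neural network (forward and backward passes). I also modified mini_batch so that, when numpy == True, the training set is shuffled and split into batches without using the DataLoader from torch.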
+
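+## Model Implementation
+
+To distinguish matrix multiplication (np.matmul) from element-wise multiplication (\*), $\times$ denotes matrix multiplication and \* denotes element-wise multiplication below.
+
+### Backward pass of the Matmul operator
+
+The Matmul operator takes an input X and a weight W, and outputs $$[Y] = [X] \times [W]$$
+
+For each element $$Y_{ij}$$ of Y, we have $$Y_{ij}=\sum_{k}X_{ik} * W_{kj}$$
+
+To compute grad_x given grad_y, the chain rule gives $gradx_{ij}=\sum_{k}\frac{\partial Y_{ik}}{\partial X_{ij}} * grady_{ik}$
+
+From the formula for $Y_{ij}$, we get $\frac{\partial Y_{ik}}{\partial X_{ij}}=W_{jk}$
+
+Hence $gradx_{ij}=\sum_k W_{jk} *grady_{ik}$
+
+So $[gradx] = [grady] \times [W^T]$
+
+Similarly, $[gradW]=[X^T]\times [grady]$
+
+The shapes are consistent with the rules of matrix multiplication: with X of shape (N, d), W of shape (d, d') and grady of shape (N, d'), gradx has shape (N, d) and gradW has shape (d, d').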
+
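+### Backward pass of the Relu operator
+
+The relu function is defined as follows:
+
+$relu(x) = \begin{cases}0 & x < 0 \\\\ x & otherwise \end{cases}$
+
+Differentiating (and taking the derivative at x = 0 to be 0, as the code does) gives
+
+$relu^{'}(x) = \begin{cases}0 & x \le 0 \\\\ 1 & otherwise \end{cases}$
+
+Hence
+
+$[relugrad]=[grady]* [relu^{'}]$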
+
+### Backward pass of the Log operator
+
+$log(x) = \ln x$
+
+This gives
+
+$log^{'}(x)=\frac 1 x$
+
+Hence
+
+$[loggrad]=[grady]* [log^{'}]$
+
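+### Backward pass of the softmax operator
+
+$softmax(x_i) = \frac {e^{x_i}}{\sum_j e^{x_j}}$
+
+In the implementation, each row represents one data point, so softmax is applied to the elements of each row as a whole, which yields a classification for each data point.
+
+Combining the softmax operator with the cross-entropy loss gives a loss with a simple form, but unfortunately this network requires the backward passes of the two operators to be separate, so there is no shortcut and the gradients have to be computed step by step.
+
+For convenience, let $a_i = softmax(x_i)$
+
+Now consider the derivative of $a_i$ with respect to $x_j$.
+
+$a_i = \frac{e^{x_i}}{\sum_k e^{x_k}}$
+
+$\frac {\partial a_i}{\partial x_j}=\frac{\partial}{\partial x_j}(\frac{e^{x_i}}{\sum_k e^{x_k}})$
+
+We now distinguish two cases according to whether i and j are equal.
+
+If i == j, then $\frac{\partial}{\partial x_j}(\frac{e^{x_i}}{\sum_k e^{x_k}})=\frac{e^{x_i}(\sum_k e^{x_k})-e^{x_i}e^{x_i}}{(\sum_k e^{x_k})^2}=a_i(1-a_i)$
+
+If i != j, then $\frac{\partial}{\partial x_j}(\frac{e^{x_i}}{\sum_k e^{x_k}})=-\frac{e^{x_i}e^{x_j}}{(\sum_k e^{x_k})^2}=-a_ia_j$
+
+Combining this with grady, where i indexes the row (data point), j and k index the columns, and $a_{ik}$ is the softmax output in row i and column k, we get
+
+$gradx_{ij}=\sum_k \frac{\partial a_{ik}}{\partial x_{ij}} grady_{ik}$
+
+Because this gradient requires a case analysis, I did not find a way to compute it directly with numpy functions, so the code first builds a list and then converts it to an ndarray before returning it.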
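+
+As a side note (a possible simplification, not what the submitted code does), the two cases can be merged. Writing $a_{ij}$ for the softmax output in row i and column j, and substituting $a_{ij}(1-a_{ij})$ for k = j and $-a_{ij}a_{ik}$ for k != j into the sum over k gives
+
+$gradx_{ij}=a_{ij}(1-a_{ij})grady_{ij}-\sum_{k \neq j}a_{ij}a_{ik}grady_{ik}=a_{ij}(grady_{ij}-\sum_k a_{ik}grady_{ik})$
+
+Under this reading, the triple loop could presumably be replaced by one broadcast expression such as `softx * (grad_y - np.sum(grad_y * softx, axis=1, keepdims=True))`; this is an untested sketch that reuses the names softx and grad_y from numpy_fnn.py.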
+
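+### Model forward pass
+
+The output of each layer is the input of the next, and the final output is the Log of the softmax result, which makes the cross-entropy loss easy to compute. As the analysis in "Model backward pass" shows, this design also makes the input to the backward pass very simple.
+
+### Model backward pass
+
+During the backward pass the model receives a matrix whose rows are one-hot vectors giving the class of each data point. The initial code divided every element of this matrix by the size of the matrix, but experiments showed that the elements must be divided by the number of training samples in the batch for the result to be correct. ~~Also, although the tests passed, the output of the softmax layer differed from the torch result, while the outputs of the later layers were correct. I concluded that my understanding of the softmax layer differs somewhat from the torch implementation.~~
+
+After changing the test code, the outputs are close to those of the torch layers, so the implementation can be considered correct.
+
+Next, we derive the input to the Log layer in the backward pass.
+
+The cross-entropy loss has the form
+
+$Loss = -\sum_k t_k*\ln a_k$
+
+where $t_k$ indicates whether the sample belongs to class k and $a_k$ is the output of the softmax layer. The output of the Log layer is $\ln a_k$, so $\frac{\partial Loss}{\partial \ln a_k}=-t_k$
+
+Therefore, the negated matrix T passed to the backward pass is used as the backward input of the Log layer, and the result is then propagated backward through each preceding layer in turn.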
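+
+As a consistency check (a sketch that ignores the small epsilon used in the Log operator and assumes one-hot targets with $\sum_k t_k = 1$), the Log and softmax backward passes can be composed by hand. With N the number of samples in the batch, the Log layer receives $-t_k/N$ and passes $g_k=-\frac{t_k}{N a_k}$ on to the softmax layer. Using the two cases $a_i(1-a_i)$ and $-a_ia_j$ from the softmax section, the gradient with respect to the softmax input $x_j$ is
+
+$\sum_k \frac{\partial a_k}{\partial x_j} g_k = a_j(g_j-\sum_k a_k g_k)=a_j(-\frac{t_j}{N a_j}+\frac 1 N)=\frac{a_j-t_j}{N}$
+
+This is the familiar softmax plus cross-entropy gradient, so the step-by-step result should agree with the fused form.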
+
+## Model Training
+
+The accuracy on the test set after each training epoch is shown below
+
+learning_rate = 0.1    mini_batch = 128
+
+> [0] Accuracy: 0.9403<br>
+> [1] Accuracy: 0.9641<br>
+> [2] Accuracy: 0.9716<br>
+> [3] Accuracy: 0.9751<br>
+> [4] Accuracy: 0.9772<br>
+> [5] Accuracy: 0.9782<br>
+> [6] Accuracy: 0.9745<br>
+> [7] Accuracy: 0.9807<br>
+> [8] Accuracy: 0.9790<br>
+> [9] Accuracy: 0.9811
+
+The loss over the course of training is shown in the figure below
+
+<img src="./img/result-1.png" alt="loss" style="zoom:50%;" />
+
+The accuracy rises steadily with training; after epoch 6 it is essentially stable, with only slight fluctuations.
+
+learning_rate = 0.1    mini_batch = 32
+
+> [0] Accuracy: 0.9646<br>
+> [1] Accuracy: 0.9726<br>
+> [2] Accuracy: 0.9768<br>
+> [3] Accuracy: 0.9788<br>
+> [4] Accuracy: 0.9792<br>
+> [5] Accuracy: 0.9770<br>
+> [6] Accuracy: 0.9820<br>
+> [7] Accuracy: 0.9808<br>
+> [8] Accuracy: 0.9822<br>
+> [9] Accuracy: 0.9835
+
+<img src="./img/result-2.png" alt="loss" style="zoom:50%;" />
+
+Because mini_batch is reduced from 128 to 32, the loss fluctuates considerably more during training.
+
+learning_rate = 0.2    mini_batch = 128
+
+> [0] Accuracy: 0.9295<br>
+> [1] Accuracy: 0.9688<br>
+> [2] Accuracy: 0.9753<br>
+> [3] Accuracy: 0.9734<br>
+> [4] Accuracy: 0.9793<br>
+> [5] Accuracy: 0.9777<br>
+> [6] Accuracy: 0.9792<br>
+> [7] Accuracy: 0.9807<br>
+> [8] Accuracy: 0.9821<br>
+> [9] Accuracy: 0.9815
+
+<img src="./img/result-3.png" alt="loss" style="zoom:50%;" />
+
+Although the learning rate was raised, the loss did not fluctuate much more as a result, and the results are still very good.
+
+learning_rate = 0.05 mini_batch = 128
+
+> [0] Accuracy: 0.9310<br>
+> [1] Accuracy: 0.9504<br>
+> [2] Accuracy: 0.9601<br>
+> [3] Accuracy: 0.9661<br>
+> [4] Accuracy: 0.9691<br>
+> [5] Accuracy: 0.9728<br>
+> [6] Accuracy: 0.9749<br>
+> [7] Accuracy: 0.9761<br>
+> [8] Accuracy: 0.9768<br>
+> [9] Accuracy: 0.9752
+
+<img src="./img/result-5.png" alt="loss" style="zoom:50%;" />
+
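+With a lower learning rate, the accuracy increases more slowly, but after several epochs the result is close to that obtained with the higher learning rates.
+
+Overall, the final accuracy depends mainly on the learning capacity of the model itself; changing the learning rate and mini_batch within a reasonable range has little effect on the result. Training with mini-batches helps reduce the fluctuation of the loss during training.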
\ No newline at end of file
diff --git a/assignment-2/submission/18307130104/img/result-1.png b/assignment-2/submission/18307130104/img/result-1.png
new file mode 100644
index 0000000000000000000000000000000000000000..11c6fba6be9d6f58a463830a5d8c006ad64af963
Binary files /dev/null and b/assignment-2/submission/18307130104/img/result-1.png differ
diff --git a/assignment-2/submission/18307130104/img/result-2.png b/assignment-2/submission/18307130104/img/result-2.png
new file mode 100644
index 0000000000000000000000000000000000000000..3f9aa1a2ed643f738f7d9ff59ea1923891048166
Binary files /dev/null and b/assignment-2/submission/18307130104/img/result-2.png differ
diff --git a/assignment-2/submission/18307130104/img/result-3.png b/assignment-2/submission/18307130104/img/result-3.png
new file mode 100644
index 0000000000000000000000000000000000000000..1e7d29f9f43741b83d6ac43ecf4b6c448c8c1141
Binary files /dev/null and b/assignment-2/submission/18307130104/img/result-3.png differ
diff --git a/assignment-2/submission/18307130104/img/result-4.png b/assignment-2/submission/18307130104/img/result-4.png
new file mode 100644
index 0000000000000000000000000000000000000000..2a1f550db001bdcc1d3a3b9501dba56a13028e8e
Binary files /dev/null and b/assignment-2/submission/18307130104/img/result-4.png differ
diff --git a/assignment-2/submission/18307130104/img/result-5.png b/assignment-2/submission/18307130104/img/result-5.png
new file mode 100644
index 0000000000000000000000000000000000000000..7ee7df630e01d83559e9f316a937df107e98248d
Binary files /dev/null and b/assignment-2/submission/18307130104/img/result-5.png differ
diff --git a/assignment-2/submission/18307130104/img/result.png b/assignment-2/submission/18307130104/img/result.png
new file mode 100644
index 0000000000000000000000000000000000000000..0039ef8029c07eeb75caa2efd42c13aeba61ce5a
Binary files /dev/null and b/assignment-2/submission/18307130104/img/result.png differ
diff --git a/assignment-2/submission/18307130104/numpy_fnn.py b/assignment-2/submission/18307130104/numpy_fnn.py
new file mode 100644
index 0000000000000000000000000000000000000000..ba780e9edb71ec687ddf7d295973be810848ce79
--- /dev/null
+++ b/assignment-2/submission/18307130104/numpy_fnn.py
@@ -0,0 +1,214 @@
+import numpy as np
+
+
+class NumpyOp:
+    
+    def __init__(self):
+        self.memory = {}
+        self.epsilon = 1e-12
+
+
+class Matmul(NumpyOp):
+    
+    def forward(self, x, W):
+        """
+        x: shape(N, d)
+        W: shape(d, d')
+        """
+        self.memory['x'] = x
+        self.memory['W'] = W
+        h = np.matmul(x, W)
+        return h
+    
+    def backward(self, grad_y):
+        """
+        grad_y: shape(N, d')
+        """
+        
+        ####################
+        #      code 1      #
+        ####################
+        grad_x = np.matmul(grad_y, self.memory['W'].T)
+        grad_W = np.matmul(self.memory['x'].T, grad_y)
+        
+        return grad_x, grad_W
+
+
+class Relu(NumpyOp):
+    
+    def forward(self, x):
+        self.memory['x'] = x
+        return np.where(x > 0, x, np.zeros_like(x))
+    
+    def backward(self, grad_y):
+        """
+        grad_y: same shape as x
+        """
+        
+        ####################
+        #      code 2      #
+        ####################
+        grad_x = grad_y * np.where(self.memory['x'] > 0, np.ones_like(self.memory['x']), np.zeros_like(self.memory['x']))
+        
+        return grad_x
+
+
+class Log(NumpyOp):
+    
+    def forward(self, x):
+        """
+        x: shape(N, c)
+        """
+        
+        out = np.log(x + self.epsilon)
+        self.memory['x'] = x
+        
+        return out
+    
+    def backward(self, grad_y):
+        """
+        grad_y: same shape as x
+        """
+        
+        ####################
+        #      code 3      #
+        ####################
+        grad_x = grad_y * np.reciprocal(self.memory['x'] + self.epsilon)
+        
+        return grad_x
+
+
+class Softmax(NumpyOp):
+    """
+    softmax over last dimension
+    """
+    
+    def forward(self, x):
+        """
+        x: shape(N, c)
+        """
+        
+        ####################
+        #      code 4      #
+        ####################
+        self.memory['x'] = x
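+        expx = np.exp(x - np.max(x, axis=1, keepdims=True))  # subtract the row max to avoid overflow; the softmax value is unchanged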
+        sumx = np.sum(expx, axis = 1, keepdims = True)
+        return (expx / sumx)
+    
+    def backward(self, grad_y):
+        """
+        grad_y: same shape as x
+        """
+        
+        ####################
+        #      code 5      #
+        ####################
+        
+        x = self.memory['x']
+        softx = self.forward(x)
+        # print(sumx.shape)
+        [n, m] = x.shape
+        out = []
+        # print(grad_y)
+        for i in range(n):
+            out.append([])
+            for j in range(m):
+                out[i].append(0)
+                for k in range(m):
+                    if j == k:
+                        # print(softx[i][k], grad_y[i][k])
+                        out[i][j] += (1 - softx[i][k]) * softx[i][k] * grad_y[i][k]
+                    else:
+                        out[i][j] += -softx[i][j] * softx[i][k] * grad_y[i][k]
+        grad_x = np.array(out)
+
+        return grad_x
+
+
+class NumpyLoss:
+    
+    def __init__(self):
+        self.target = None
+    
+    def get_loss(self, pred, target):
+        self.target = target
+        return (-pred * target).sum(axis=1).mean()
+    
+    def backward(self):
+        return -self.target / self.target.shape[0]
+
+
+class NumpyModel:
+    def __init__(self):
+        self.W1 = np.random.normal(size=(28 * 28, 256))
+        self.W2 = np.random.normal(size=(256, 64))
+        self.W3 = np.random.normal(size=(64, 10))
+        
+        # The following operators are used in forward and backward
+        self.matmul_1 = Matmul()
+        self.relu_1 = Relu()
+        self.matmul_2 = Matmul()
+        self.relu_2 = Relu()
+        self.matmul_3 = Matmul()
+        self.softmax = Softmax()
+        self.log = Log()
+        
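+        # The following variables must be updated in backward. softmax_grad, log_grad, etc. are the gradients propagated back by each operator (partial derivatives of the loss with respect to the operator's input)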
+        self.x1_grad, self.W1_grad = None, None
+        self.relu_1_grad = None
+        self.x2_grad, self.W2_grad = None, None
+        self.relu_2_grad = None
+        self.x3_grad, self.W3_grad = None, None
+        self.softmax_grad = None
+        self.log_grad = None
+    
+    def forward(self, x):
+        x = x.reshape(-1, 28 * 28)
+        
+        ####################
+        #      code 6      #
+        ####################
+        x = self.matmul_1.forward(x, self.W1)
+        x = self.relu_1.forward(x)
+        x = self.matmul_2.forward(x, self.W2)
+        x = self.relu_2.forward(x)
+        x = self.matmul_3.forward(x, self.W3)
+        x = self.softmax.forward(x)
+        # print(x)
+        x = self.log.forward(x)
+        
+        return x
+    
+    def backward(self, y):
+        
+        ####################
+        #      code 7      #
+        ####################
+
+        y = self.log.backward(y)
+        self.log_grad = y
+
+        y = self.softmax.backward(y)
+        self.softmax_grad = y
+
+        y, self.W3_grad = self.matmul_3.backward(y)
+        self.x3_grad = y
+
+        y = self.relu_2.backward(y)
+        self.relu_2_grad = y
+
+        y, self.W2_grad = self.matmul_2.backward(y)
+        self.x2_grad = y
+
+        y = self.relu_1.backward(y)
+        self.relu_1_grad = y
+
+        y, self.W1_grad = self.matmul_1.backward(y)
+        self.x1_grad = y
+        return y
+    
+    def optimize(self, learning_rate):
+        self.W1 -= learning_rate * self.W1_grad
+        self.W2 -= learning_rate * self.W2_grad
+        self.W3 -= learning_rate * self.W3_grad
diff --git a/assignment-2/submission/18307130104/numpy_mnist.py b/assignment-2/submission/18307130104/numpy_mnist.py
new file mode 100644
index 0000000000000000000000000000000000000000..5f7aaadd84d701b578d384df3d4976f5c76a5dfa
--- /dev/null
+++ b/assignment-2/submission/18307130104/numpy_mnist.py
@@ -0,0 +1,38 @@
+import numpy as np
+from numpy_fnn import NumpyModel, NumpyLoss
+from utils import download_mnist, batch, mini_batch, get_torch_initialization, plot_curve, one_hot
+
+
+def numpy_run():
+    train_dataset, test_dataset = download_mnist()
+    
+    model = NumpyModel()
+    numpy_loss = NumpyLoss()
+    model.W1, model.W2, model.W3 = get_torch_initialization()
+    
+    train_loss = []
+    
+    epoch_number = 3
+    learning_rate = 0.1
+    
+    for epoch in range(epoch_number):
+        for x, y in mini_batch(train_dataset, 128, True):
+            y = one_hot(y)
+            
+            y_pred = model.forward(x)
+            loss = numpy_loss.get_loss(y_pred, y)
+
+            model.backward(numpy_loss.backward())
+            model.optimize(learning_rate)
+            
+            train_loss.append(loss.item())
+        
+        x, y = batch(test_dataset)[0]
+        accuracy = np.mean((model.forward(x).argmax(axis=1) == y))
+        print('[{}] Accuracy: {:.4f}'.format(epoch, accuracy))
+    
+    plot_curve(train_loss)
+
+
+if __name__ == "__main__":
+    numpy_run()
diff --git a/assignment-2/submission/18307130104/tester_demo.py b/assignment-2/submission/18307130104/tester_demo.py
new file mode 100644
index 0000000000000000000000000000000000000000..df4bb27bc0d8b9f28f5abd09faff7635d8347792
--- /dev/null
+++ b/assignment-2/submission/18307130104/tester_demo.py
@@ -0,0 +1,183 @@
+import numpy as np
+import torch
+from torch import matmul as torch_matmul, relu as torch_relu, softmax as torch_softmax, log as torch_log
+
+from numpy_fnn import Matmul, Relu, Softmax, Log, NumpyModel, NumpyLoss
+from torch_mnist import TorchModel
+from utils import get_torch_initialization, one_hot
+
+err_epsilon = 1e-6
+err_p = 0.4
+
+
+def check_result(numpy_result, torch_result=None):
+    if isinstance(numpy_result, list) and torch_result is None:
+        flag = True
+        for (n, t) in numpy_result:
+            flag = flag and check_result(n, t)
+        return flag
+    # print((torch.from_numpy(numpy_result) - torch_result).abs().mean().item())
+    T = (torch_result * torch.from_numpy(numpy_result) < 0).sum().item()
+    direction = T / torch_result.numel() < err_p
+
+    return direction and ((torch.from_numpy(numpy_result) - torch_result).abs().mean() < err_epsilon).item()
+
+
+def case_1():
+    x = np.random.normal(size=[5, 6])
+    W = np.random.normal(size=[6, 4])
+    
+    numpy_matmul = Matmul()
+    numpy_out = numpy_matmul.forward(x, W)
+    numpy_x_grad, numpy_W_grad = numpy_matmul.backward(np.ones_like(numpy_out))
+    
+    torch_x = torch.from_numpy(x).clone().requires_grad_()
+    torch_W = torch.from_numpy(W).clone().requires_grad_()
+    
+    torch_out = torch_matmul(torch_x, torch_W)
+    torch_out.sum().backward()
+    
+    return check_result([
+        (numpy_out, torch_out),
+        (numpy_x_grad, torch_x.grad),
+        (numpy_W_grad, torch_W.grad)
+    ])
+
+
+def case_2():
+    x = np.random.normal(size=[5, 6])
+    
+    numpy_relu = Relu()
+    numpy_out = numpy_relu.forward(x)
+    numpy_x_grad = numpy_relu.backward(np.ones_like(numpy_out))
+    
+    torch_x = torch.from_numpy(x).clone().requires_grad_()
+    
+    torch_out = torch_relu(torch_x)
+    torch_out.sum().backward()
+    
+    return check_result([
+        (numpy_out, torch_out),
+        (numpy_x_grad, torch_x.grad),
+    ])
+
+
+def case_3():
+    x = np.random.uniform(low=0.0, high=1.0, size=[3, 4])
+    
+    numpy_log = Log()
+    numpy_out = numpy_log.forward(x)
+    numpy_x_grad = numpy_log.backward(np.ones_like(numpy_out))
+    
+    torch_x = torch.from_numpy(x).clone().requires_grad_()
+    
+    torch_out = torch_log(torch_x)
+    torch_out.sum().backward()
+    
+    return check_result([
+        (numpy_out, torch_out),
+        
+        (numpy_x_grad, torch_x.grad),
+    ])
+
+
+def case_4():
+    x = np.random.normal(size=[4, 5])
+    
+    numpy_softmax = Softmax()
+    numpy_out = numpy_softmax.forward(x)
+    
+    torch_x = torch.from_numpy(x).clone().requires_grad_()
+    
+    torch_out = torch_softmax(torch_x, 1)
+    
+    return check_result(numpy_out, torch_out)
+
+
+def case_5():
+    x = np.random.normal(size=[20, 25])
+    
+    numpy_softmax = Softmax()
+    numpy_out = numpy_softmax.forward(x)
+    numpy_x_grad = numpy_softmax.backward(np.ones_like(numpy_out))
+    
+    torch_x = torch.from_numpy(x).clone().requires_grad_()
+    
+    torch_out = torch_softmax(torch_x, 1)
+    torch_out.sum().backward()
+    
+    return check_result([
+        (numpy_out, torch_out),
+        (numpy_x_grad, torch_x.grad),
+    ])
+
+
+def test_model():
+    try:
+        numpy_loss = NumpyLoss()
+        numpy_model = NumpyModel()
+        torch_model = TorchModel()
+        torch_model.W1.data, torch_model.W2.data, torch_model.W3.data = get_torch_initialization(numpy=False)
+        numpy_model.W1 = torch_model.W1.detach().clone().numpy()
+        numpy_model.W2 = torch_model.W2.detach().clone().numpy()
+        numpy_model.W3 = torch_model.W3.detach().clone().numpy()
+        
+        x = torch.randn((10000, 28, 28))
+        y = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8, 9, 0] * 1000)
+        
+        y = one_hot(y, numpy=False)
+        x2 = x.numpy()
+        y_pred = torch_model.forward(x)
+        loss = (-y_pred * y).sum(dim=1).mean()
+        loss.backward()
+        
+        y_pred_numpy = numpy_model.forward(x2)
+        numpy_loss.get_loss(y_pred_numpy, y.numpy())
+        
+        check_flag_1 = check_result(y_pred_numpy, y_pred)
+        print("+ {:12} {}/{}".format("forward", 10 * check_flag_1, 10))
+    except:
+        print("[Runtime Error in forward]")
+        print("+ {:12} {}/{}".format("forward", 0, 10))
+        return 0
+    
+    try:
+        
+        numpy_model.backward(numpy_loss.backward())
+        
+        check_flag_2 = [
+            check_result(numpy_model.log_grad, torch_model.log_input.grad),
+            check_result(numpy_model.softmax_grad, torch_model.softmax_input.grad),
+            check_result(numpy_model.W3_grad, torch_model.W3.grad),
+            check_result(numpy_model.W2_grad, torch_model.W2.grad),
+            check_result(numpy_model.W1_grad, torch_model.W1.grad)
+        ]
+        check_flag_2 = sum(check_flag_2) >= 4
+        print("+ {:12} {}/{}".format("backward", 20 * check_flag_2, 20))
+    except:
+        print("[Runtime Error in backward]")
+        print("+ {:12} {}/{}".format("backward", 0, 20))
+        check_flag_2 = False
+    
+    return 10 * check_flag_1 + 20 * check_flag_2
+
+
+if __name__ == "__main__":
+    testcases = [
+        ["matmul", case_1, 5],
+        ["relu", case_2, 5],
+        ["log", case_3, 5],
+        ["softmax_1", case_4, 5],
+        ["softmax_2", case_5, 10],
+    ]
+    score = 0
+    for case in testcases:
+        try:
+            res = case[2] if case[1]() else 0
+        except:
+            print("[Runtime Error in {}]".format(case[0]))
+            res = 0
+        score += res
+        print("+ {:12} {}/{}".format(case[0], res, case[2]))
+    score += test_model()
+    print("{:14} {}/60".format("FINAL SCORE", score))
diff --git a/assignment-2/submission/18307130104/torch_mnist.py b/assignment-2/submission/18307130104/torch_mnist.py
new file mode 100644
index 0000000000000000000000000000000000000000..6d3e214c7606e3d43dac4b94554f942508afffb3
--- /dev/null
+++ b/assignment-2/submission/18307130104/torch_mnist.py
@@ -0,0 +1,73 @@
+import torch
+from utils import mini_batch, batch, download_mnist, get_torch_initialization, one_hot, plot_curve
+
+
+class TorchModel:
+    
+    def __init__(self):
+        self.W1 = torch.randn((28 * 28, 256), requires_grad=True)
+        self.W2 = torch.randn((256, 64), requires_grad=True)
+        self.W3 = torch.randn((64, 10), requires_grad=True)
+        self.softmax_input = None
+        self.log_input = None
+    
+    def forward(self, x):
+        x = x.reshape(-1, 28 * 28)
+        x = torch.relu(torch.matmul(x, self.W1))
+        x = torch.relu(torch.matmul(x, self.W2))
+        x = torch.matmul(x, self.W3)
+        
+        self.softmax_input = x
+        self.softmax_input.retain_grad()
+        
+        x = torch.softmax(x, 1)
+        
+        self.log_input = x
+        self.log_input.retain_grad()
+        
+        x = torch.log(x)
+        
+        return x
+    
+    def optimize(self, learning_rate):
+        with torch.no_grad():
+            self.W1 -= learning_rate * self.W1.grad
+            self.W2 -= learning_rate * self.W2.grad
+            self.W3 -= learning_rate * self.W3.grad
+            
+            self.W1.grad = None
+            self.W2.grad = None
+            self.W3.grad = None
+
+
+def torch_run():
+    train_dataset, test_dataset = download_mnist()
+    
+    model = TorchModel()
+    model.W1.data, model.W2.data, model.W3.data = get_torch_initialization(numpy=False)
+    
+    train_loss = []
+    
+    epoch_number = 3
+    learning_rate = 0.1
+    
+    for epoch in range(epoch_number):
+        for x, y in mini_batch(train_dataset, numpy=False):
+            y = one_hot(y, numpy=False)
+            
+            y_pred = model.forward(x)
+            loss = (-y_pred * y).sum(dim=1).mean()
+            loss.backward()
+            model.optimize(learning_rate)
+            
+            train_loss.append(loss.item())
+        
+        x, y = batch(test_dataset, numpy=False)[0]
+        accuracy = model.forward(x).argmax(dim=1).eq(y).float().mean().item()
+        print('[{}] Accuracy: {:.4f}'.format(epoch, accuracy))
+    
+    plot_curve(train_loss)
+
+
+if __name__ == "__main__":
+    torch_run()
diff --git a/assignment-2/submission/18307130104/utils.py b/assignment-2/submission/18307130104/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..274566a51dc9718158d63b6aa59546381d939223
--- /dev/null
+++ b/assignment-2/submission/18307130104/utils.py
@@ -0,0 +1,83 @@
+import torch
+import numpy as np
+from matplotlib import pyplot as plt
+
+def plot_curve(data):
+    plt.plot(range(len(data)), data, color='blue')
+    plt.legend(['loss_value'], loc='upper right')
+    plt.xlabel('step')
+    plt.ylabel('value')
+    plt.xlim(-100,5000)
+    plt.savefig('./img/result.png')
+    plt.close()
+    plt.show()
+
+
+def download_mnist():
+    from torchvision import datasets, transforms
+    
+    transform = transforms.Compose([
+        transforms.ToTensor(),
+        transforms.Normalize(mean=(0.1307,), std=(0.3081,))
+    ])
+    
+    train_dataset = datasets.MNIST(root="./data/", transform=transform, train=True, download=True)
+    test_dataset = datasets.MNIST(root="./data/", transform=transform, train=False, download=True)
+    
+    return train_dataset, test_dataset
+
+
+def one_hot(y, numpy=True):
+    if numpy:
+        y_ = np.zeros((y.shape[0], 10))
+        y_[np.arange(y.shape[0], dtype=np.int32), y] = 1
+        return y_
+    else:
+        y_ = torch.zeros((y.shape[0], 10))
+        y_[torch.arange(y.shape[0], dtype=torch.long), y] = 1
+    return y_
+
+
+def batch(dataset, numpy=True):
+    data = []
+    label = []
+    for each in dataset:
+        data.append(each[0])
+        label.append(each[1])
+    data = torch.stack(data)
+    label = torch.LongTensor(label)
+    if numpy:
+        return [(data.numpy(), label.numpy())]
+    else:
+        return [(data, label)]
+
+
+def mini_batch(dataset, batch_size=128, numpy=False):
+    if numpy:
+        import random
+        datas = [(each[0].numpy(), each[1]) for each in dataset]
+        random.shuffle(datas)
+        datat = [each[0] for each in datas]
+        labelt = [each[1] for each in datas]
+        data = [np.array(datat[i: i + batch_size]) for i in range(0, len(datat), batch_size)]
+        label = [np.array(labelt[i: i + batch_size]) for i in range(0, len(datat), batch_size)]
+        return zip(data, label)
+    else:
+        return torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)
+
+
+def get_torch_initialization(numpy=True):
+    fc1 = torch.nn.Linear(28 * 28, 256)
+    fc2 = torch.nn.Linear(256, 64)
+    fc3 = torch.nn.Linear(64, 10)
+    
+    if numpy:
+        W1 = fc1.weight.T.detach().clone().numpy()
+        W2 = fc2.weight.T.detach().clone().numpy()
+        W3 = fc3.weight.T.detach().clone().numpy()
+    else:
+        W1 = fc1.weight.T.detach().clone().data
+        W2 = fc2.weight.T.detach().clone().data
+        W3 = fc3.weight.T.detach().clone().data
+    
+    return W1, W2, W3