diff --git a/assignment-2/submission/17307130243/README.md b/assignment-2/submission/17307130243/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..a0f392231337f491e92e7b3e730d9f93652612a5
--- /dev/null
+++ b/assignment-2/submission/17307130243/README.md
@@ -0,0 +1,337 @@
+# Assignment 2: Feedforward Neural Network
+
+------
+
+## Backward-Pass Derivation and Implementation of the Operators
+
+------
+
+Throughout, let $L$ denote the loss, let $Y$ denote the forward output of each operator, and let $\frac{\partial L}{\partial Y}$ denote the gradient of $L$ with respect to $Y$.
+
+### Matmul
+For each row $x$ of the input $X$, let $y$ be the corresponding row of $Y$, so that $xW=y$.
+The Jacobian of this vector-valued map is $\frac{\partial y}{\partial x}=W^T$, so the chain rule gives $\frac{\partial L}{\partial x}=\frac{\partial L}{\partial y}\frac{\partial y}{\partial x}=\frac{\partial L}{\partial y}W^T$, and hence $\frac{\partial L}{\partial X}=\frac{\partial L}{\partial Y}W^T$.
+Similarly, $\frac{\partial L}{\partial W}=X^T\frac{\partial L}{\partial Y}$.
+
+The code is as follows:
+
+
+```{python}
+    def backward(self, grad_y):
+        """
+        grad_y: shape(N, d')
+        """
+        grad_x = np.matmul(grad_y, self.memory['W'].T)
+        grad_W = np.matmul(self.memory['x'].T, grad_y)
+
+        return grad_x, grad_W
+```
+
+### ReLU
+
+$\frac{\partial y}{\partial x}=\begin{cases}0 \quad x \leq 0 \\\\ 1 \quad x > 0\end{cases}$ (the subgradient at $x=0$ is taken to be $0$, matching the implementation)
+Hence, by the chain rule, element-wise for each row, $\frac{\partial L}{\partial x}=\frac{\partial L}{\partial y}\odot\frac{\partial y}{\partial x}=\begin{cases}0 \quad x \leq 0 \\\\ \frac{\partial L}{\partial y}\quad x > 0\end{cases}$
+
+```{python}
+    def backward(self, grad_y):
+        """
+        grad_y: same shape as x
+        """
+        # pass the gradient through only where x > 0
+        x = self.memory['x']
+        grad_x = np.where(x > 0, grad_y, np.zeros_like(grad_y))
+
+        return grad_x
+```
+
+### Log
+
+$\frac{\partial y}{\partial x}=\frac{1}{x+\epsilon}$
+
+Hence, by the chain rule, element-wise for each row, $\frac{\partial L}{\partial x}=\frac{\partial L}{\partial y}\odot \frac{1}{x+\epsilon}$
+```{python}
+
+    def backward(self, grad_y):
+        """
+        grad_y: same shape as x
+        """
+        x = self.memory['x']
+        grad_x = (1 / (x + self.epsilon)) * grad_y
+
+        return grad_x
+```
+
+### Softmax
+
+- forward
+In the implementation, to avoid numerical overflow, the row-wise maximum is subtracted from each row of the input before exponentiation.
+
+```{python}
+    def forward(self, x):
+        """
+        x: shape(N, c)
+        """
+        self.memory['x'] = x
+        exp = np.exp(x - np.max(x, axis=1, keepdims=True))
+        out = exp / np.sum(exp, axis=1, keepdims=True)
+        self.memory['out'] = out
+
+        return out
+```
+
+- backward
+Following the textbook appendix, for each row:
+
+![softmax](./img/softmax.png)
+
+The implementation is as follows:
+
+```{python}
+
+    def backward(self, grad_y):
+        """
+        grad_y: same shape as x
+        """
+        out = self.memory['out']
+        J = np.array([np.diag(i) - np.outer(i, i) for i in out])
+
+        grad_y = grad_y[:, np.newaxis, :]
+        grad_x = np.matmul(grad_y, J).squeeze(axis=1)
+
+        return grad_x
+
+```
+(The submitted `numpy_fnn.py` computes the same product in a vectorised form, without materialising the per-row Jacobians.)
+
+## Experiments
+
+------
+The experiments mainly modify the `numpy_run` function in `numpy_mnist.py` so that it accepts five arguments (`learning_rate`, `epoch_number`, `batch_size`, `optimizer`, and `max_iter`), trains the network defined in `numpy_fnn.py`, and visualises the results. The later experiments pass different arguments to `numpy_run` to analyse and compare how each parameter affects the model's learning behaviour.
+
+### Building the Network
+
+Reading the `TorchModel` class in `torch_mnist.py` shows that it defines the feedforward network structure illustrated below:
+> Network architecture
+![network architecture](./img/net_structure.png)
+
+> Computation graph
+![computation graph](./img/net_structure-1.png)
+
+The same network is then built with the `numpy` operators implemented above, mirroring the `TorchModel` code:
+
+- forward
+
+```{python}
+    x = self.matmul_1.forward(x, self.W1)
+    x = self.relu_1.forward(x)
+
+    x = self.matmul_2.forward(x, self.W2)
+    x = self.relu_2.forward(x)
+
+    x = self.matmul_3.forward(x, self.W3)
+    x = self.softmax.forward(x)
+
+    x = self.log.forward(x)
+```
+
+- backward
+
+
+```{python}
+    self.log_grad = self.log.backward(y)
+
+    self.softmax_grad = self.softmax.backward(self.log_grad)
+    self.x3_grad, self.W3_grad = self.matmul_3.backward(self.softmax_grad)
+
+    self.relu_2_grad = self.relu_2.backward(self.x3_grad)
+    self.x2_grad, self.W2_grad = self.matmul_2.backward(self.relu_2_grad)
+
+    self.relu_1_grad = self.relu_1.backward(self.x2_grad)
+    self.x1_grad, self.W1_grad = self.matmul_1.backward(self.relu_1_grad)
+```
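+
+As a quick sanity check on the hand-derived backward passes above, the analytic gradients can be compared against numerical (central-difference) gradients on small random inputs. The sketch below is for illustration only and is not part of the submitted code; it assumes the operator classes in `numpy_fnn.py` can be imported, and checks the `Matmul` operator (the other operators can be checked in the same way).
+
+```{python}
+import numpy as np
+
+from numpy_fnn import Matmul
+
+
+def numerical_grad(f, x, eps=1e-6):
+    """Central-difference gradient of the scalar-valued f with respect to x."""
+    grad = np.zeros_like(x)
+    it = np.nditer(x, flags=['multi_index'])
+    while not it.finished:
+        idx = it.multi_index
+        old = x[idx]
+        x[idx] = old + eps
+        f_plus = f(x)
+        x[idx] = old - eps
+        f_minus = f(x)
+        x[idx] = old
+        grad[idx] = (f_plus - f_minus) / (2 * eps)
+        it.iternext()
+    return grad
+
+
+np.random.seed(0)
+x, W = np.random.randn(4, 3), np.random.randn(3, 5)
+op = Matmul()
+
+op.forward(x, W)                               # populate op.memory
+grad_x, grad_W = op.backward(np.ones((4, 5)))  # analytic gradients of sum(Y)
+
+num_grad_x = numerical_grad(lambda x_: op.forward(x_, W).sum(), x)
+num_grad_W = numerical_grad(lambda W_: op.forward(x, W_).sum(), W)
+print(np.allclose(grad_x, num_grad_x), np.allclose(grad_W, num_grad_W))  # True True
+```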
+
+### A numpy Implementation of mini-batch
+
+The `numpy` mini-batch implementation is the `mini_batch` function in `numpy_mnist.py`: it shuffles the dataset and splits it into batches of the requested size (batch size), and each gradient-descent step then uses one batch in turn to compute the gradient. Below is a visual comparison between the original framework's mini-batching (built on PyTorch's `DataLoader`) and this `numpy` implementation. Both runs use the MNIST dataset and the network built above, trained with learning rate 0.01, batch size 128, and 5 epochs. The two loss curves are similar and the accuracies (measured on the test set, as in `numpy_run`) are close, indicating that the `numpy` mini-batch implementation is essentially correct.
+
+![mini-batch comparison](./img/mini_batch.png)
+
+> Left: results with the original mini-batch implementation
+
+> Right: results with the self-implemented mini-batch
+>
+|Accuracy|epoch 1|epoch 2|epoch 3|epoch 4|epoch 5|
+|--|--|--|--|--|--|
+|torch mini-batch|0.9510|0.9640|0.9729|0.9773|0.9773|
+|numpy mini-batch|0.9337|0.9636|0.9716|0.9748|0.9756|
+
+
+### Learning-Rate Experiment
+
+This experiment trains the model with several different learning rates and analyses how the learning rate affects model performance.
+
+Learning rates used:
+
+|learning_rate|0.0001|0.001|0.01|0.1|1|2|5|
+|--|--|--|--|--|--|--|--|
+
+Other parameters (held fixed):
+
+|epoch number|batch size|optimizer|
+|--|--|--|
+|15|128|SGD|
+
+#### Results
+
+> Accuracy vs. epoch for different learning rates
+![lr](./img/learning_rate.png)
+
+The figure shows that when the learning rate is too large (here, 1 or greater), accuracy stays flat at around 0.1 and the model fails to learn. Since the network uses ReLU activations, a plausible explanation is that the overly large updates push units into the dead-ReLU regime, after which their gradients vanish. For the remaining candidate values, accuracy improves over training, and the larger the learning rate, the faster the improvement. With a learning rate of 0.1 the model already reaches high accuracy after the first epoch and stays ahead of the other settings throughout training. With 0.0001 the accuracy does keep rising, but it starts low and improves slowly, so this is not a good choice.
+
+> Training loss (sampled every 50 iterations) under different learning rates:
+>
+|learning rate = 0.0001|learning rate = 0.001|
+|--|--|
+|![lr=0.0001](./img/lr_1.png)|![lr=0.001](./img/lr_2.png)|
+|**learning rate = 0.01**|**learning rate = 0.1**|
+|![lr=0.01](./img/lr_3.png)|![lr=0.1](./img/lr_4.png)|
+
+Except for the 0.1 case, the training loss under the other settings decreases at different speeds but still fluctuates considerably and is not close to convergence.
+
+|Accuracy|epoch 1|epoch 3|epoch 5|epoch 7|epoch 9|epoch 11|epoch 13|epoch 15|
+|--|--|--|--|--|--|--|--|--|
+|lr = 0.0001|0.1043|0.1578|0.2507|0.3498|0.4308|0.5035|0.5736|0.6268|
+|lr = 0.001|0.4540|0.7319|0.8226|0.8503|0.8675|0.8798|0.8883|0.8944|
+|lr = 0.01|0.8692|0.9148|0.9287|0.9379|0.9447|0.9509|0.9566|0.9602|
+|lr = 0.1|0.9417|0.9637|0.9746|0.9774|0.9787|0.9705|0.9781|0.9801|
+
+**Summary**: For both training speed and final performance, the learning rate should not be too small; but an overly large learning rate can cause numerical problems, dead units, or unstable training, so it must be avoided as well.
+
+### Batch-Size Experiment
+
+This experiment trains the model with different batch sizes and analyses their effect on performance.
+
+Batch sizes used:
+|batch-size|1|16|32|64|128|256|512|1024|
+|--|--|--|--|--|--|--|--|--|
+
+Other parameters:
+|learning rate|epoch number|optimizer|max_iter|
+|--|--|--|--|
+|0.01|5|SGD|5000|
+
+Because the amount of data is fixed, the batch size determines how many gradient-descent iterations are performed. To separate the effect of batch size from the benefit of simply taking more update steps, the code caps the iteration count with `max_iter` set to 5000: training stops once 5000 iterations or 5 epochs are reached.
+
+#### Results
+
+![batch size](./img/batch_size.png)
+|batch size=1|batch size=16|batch size=32|batch size=64|
+|--|--|--|--|
+|![b=1](./img/b1_1.png)|![b=16](./img/b16_1.png)|![b=32](./img/b32_1.png)|![b=64](./img/b64_1.png)|
+|**batch size=128**|**batch size=256**|**batch size=512**|**batch size=1024**|
+|![b=128](./img/b128_1.png)|![b=256](./img/b256_1.png)|![b=512](./img/b512_1.png)|![b=1024](./img/b1024_1.png)|
+
+As the batch size grows, each batch resembles the full dataset more closely and training becomes more stable; however, the loss falls more slowly and training takes longer, so the batch size should also be chosen moderately.
+
+
+### Optimizer Experiment
+
+------
+This part follows [CS231n](https://cs231n.github.io/neural-networks-3/#sgd).
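+
+Before going through each update rule, the following self-contained toy example (not part of the submission) applies all five updates to the simple quadratic $f(w)=\frac{1}{2}\|w\|^2$, whose gradient is just $w$. The helper functions below are purely illustrative; how the rules are actually implemented in `NumpyModel` is described afterwards.
+
+```{python}
+import numpy as np
+
+
+def run(update, steps=100, lr=0.1):
+    """Minimise f(w) = 0.5 * ||w||^2 with a given update rule."""
+    w, state = np.array([5.0, -3.0]), {}
+    for t in range(1, steps + 1):
+        g = w                                # gradient of 0.5 * ||w||^2
+        w = update(w, g, state, lr, t)
+    return w
+
+
+def sgd(w, g, s, lr, t):
+    return w - lr * g
+
+
+def momentum(w, g, s, lr, t, mu=0.9):
+    s['v'] = mu * s.get('v', 0) - lr * g     # integrate velocity
+    return w + s['v']
+
+
+def adagrad(w, g, s, lr, t, eps=1e-7):
+    s['c'] = s.get('c', 0) + g ** 2          # accumulated squared gradients
+    return w - lr * g / (np.sqrt(s['c']) + eps)
+
+
+def rmsprop(w, g, s, lr, t, decay=0.9, eps=1e-7):
+    s['c'] = decay * s.get('c', 0) + (1 - decay) * g ** 2
+    return w - lr * g / (np.sqrt(s['c']) + eps)
+
+
+def adam(w, g, s, lr, t, b1=0.9, b2=0.999, eps=1e-8):
+    s['m'] = b1 * s.get('m', 0) + (1 - b1) * g
+    s['v'] = b2 * s.get('v', 0) + (1 - b2) * g ** 2
+    m_hat, v_hat = s['m'] / (1 - b1 ** t), s['v'] / (1 - b2 ** t)
+    return w - lr * m_hat / (np.sqrt(v_hat) + eps)
+
+
+for name, rule in [('SGD', sgd), ('momentum', momentum), ('AdaGrad', adagrad),
+                   ('RMSProp', rmsprop), ('Adam', adam)]:
+    print(name, run(rule))                   # w moves toward the minimum [0, 0] under every rule
+```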
+
+#### SGD
+
+SGD is the optimizer used in the experiments above: at every iteration it computes the gradient on one batch and updates the parameters by the product of a fixed learning rate and that gradient. As those experiments showed, choosing a suitable learning rate for SGD is not easy (too small converges slowly, too large can cause numerical trouble), and keeping the learning rate fixed throughout training is itself questionable. The following variants improve on plain gradient descent in different ways.
+
+#### SGD with momentum
+The momentum method borrows the notion of momentum from physics: it keeps a decaying running record of past gradients and uses it to adjust the current update direction, which makes training more stable.
+
+Illustrative code:
+```{python}
+# Momentum update
+v = mu * v - learning_rate * dx # integrate velocity
+x += v # integrate position
+```
+
+#### AdaGrad
+AdaGrad automatically adapts the learning rate to the gradient magnitudes during training, so that parameters with large gradients have their effective learning rate decay faster. Illustrative code:
+
+```{python}
+# Assume the gradient dx and parameter vector x
+cache += dx**2
+x += - learning_rate * dx / (np.sqrt(cache) + eps)
+```
+
+#### RMSprop
+
+A major drawback of AdaGrad is that the effective learning rate decreases monotonically, and often rather quickly. RMSProp tries to avoid this by scaling the learning rate with a moving average of the squared gradients instead of their cumulative sum.
+
+Illustrative code:
+```{python}
+cache = decay_rate * cache + (1 - decay_rate) * dx**2
+x += - learning_rate * dx / (np.sqrt(cache) + eps)
+```
+Here decay_rate is typically 0.9, 0.99 or 0.999.
+
+#### Adam
+Adam combines the advantages of several of the methods above. Illustrative code:
+
+```{python}
+# t is your iteration counter going from 1 to infinity
+m = beta1*m + (1-beta1)*dx
+mt = m / (1-beta1**t)
+v = beta2*v + (1-beta2)*(dx**2)
+vt = v / (1-beta2**t)
+x += - learning_rate * mt / (np.sqrt(vt) + eps)
+```
+Typically beta1 = 0.9, beta2 = 0.999, and eps = 1e-8.
+
+
+#### Experiment
+
+All of the above update rules are implemented with `numpy` as methods of the `NumpyModel` class in `numpy_fnn.py`.
+
+The model is trained with each optimizer in turn; the other parameters are:
+|learning rate|epoch number|batch size|
+|--|--|--|
+|0.002|10|128|
+
+
+#### Results
+
+![opt1](./img/opt_1.png)
+
+The figure shows that Adam makes the model converge quickly and reach high accuracy, a clear advantage. RMSProp runs into numerical problems, possibly because its hyperparameters do not suit this setting. Momentum and AdaGrad are both somewhat more stable than SGD, but perform similarly to each other on this model.
+
+
+### Weight Initialization
+
+In neural-network training, parameter initialization has a large influence on whether the iterations converge and on the quality of the solution they converge to. In the feedforward network used here, poorly initialized weights can lead to vanishing or exploding gradients. This part of the experiment examines how the original code initializes the weights.
+
+First, the `get_torch_initialization` function in `utils.py` shows that the weights are initialized by borrowing the initial weights of `torch.nn.Linear` layers of the same sizes.
+
+Next, the PyTorch documentation shows that `torch.nn.Linear` initializes its weight with `init.kaiming_uniform_(self.weight, a=math.sqrt(5))`, i.e. Kaiming uniform initialization: the weight entries are drawn uniformly from $(-bound, bound)$ with $bound=\sqrt{\frac{6}{(1+a^2)d}}$, where $d$ is the input (fan-in) size and $a$ is the negative-half-axis slope of the nonlinearity (leaky-ReLU). Since this experiment uses plain ReLU, $a$ is taken to be 0 in the `numpy` re-implementation, giving $bound=\sqrt{6/d}$.
+
+The weight initialization is then implemented with `numpy` in `numpy_mnist.py` as follows:
+
+```{python}
+def get_torch_initialization_numpy():
+    bound1 = np.sqrt(6 / (28 * 28))
+    bound2 = np.sqrt(6 / 256)
+    bound3 = np.sqrt(6 / 64)
+
+    W1 = np.random.uniform(-bound1, bound1, (28 * 28, 256))
+    W2 = np.random.uniform(-bound2, bound2, (256, 64))
+    W3 = np.random.uniform(-bound3, bound3, (64, 10))
+
+    return W1, W2, W3
+```
+
+#### Experiment
+Running the default `numpy_run()` gives the following results:
+![init](./img/init.png)
+```
+[0] Accuracy: 0.9553
+[1] Accuracy: 0.9595
+[2] Accuracy: 0.9694
+```
+The results are close to those obtained with the original initialization.
\ No newline at end of file
diff --git a/assignment-2/submission/17307130243/img/.keep b/assignment-2/submission/17307130243/img/.keep new file mode 100644 index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 diff --git a/assignment-2/submission/17307130243/img/b1024_1.png b/assignment-2/submission/17307130243/img/b1024_1.png new file mode 100644 index 0000000000000000000000000000000000000000..b0ea36943a9311a5d806e52fa7e859d282c16c62 Binary files /dev/null and b/assignment-2/submission/17307130243/img/b1024_1.png differ diff --git a/assignment-2/submission/17307130243/img/b128_1.png b/assignment-2/submission/17307130243/img/b128_1.png new file mode 100644 index 0000000000000000000000000000000000000000..25166b8f1baacacfe69b47b93ed32e447e0f0bcd Binary files /dev/null and b/assignment-2/submission/17307130243/img/b128_1.png differ diff --git a/assignment-2/submission/17307130243/img/b16_1.png b/assignment-2/submission/17307130243/img/b16_1.png new file mode 
100644 index 0000000000000000000000000000000000000000..8b8e96367ed6ef28ad73eee03716d8f64e6e9f6f Binary files /dev/null and b/assignment-2/submission/17307130243/img/b16_1.png differ diff --git a/assignment-2/submission/17307130243/img/b1_1.png b/assignment-2/submission/17307130243/img/b1_1.png new file mode 100644 index 0000000000000000000000000000000000000000..2a6c10150b2c74608e89d5ebb7cfb1a05deaa26f Binary files /dev/null and b/assignment-2/submission/17307130243/img/b1_1.png differ diff --git a/assignment-2/submission/17307130243/img/b256_1.png b/assignment-2/submission/17307130243/img/b256_1.png new file mode 100644 index 0000000000000000000000000000000000000000..d830aeb3df9b99563c6259e151ba593c54c06cf6 Binary files /dev/null and b/assignment-2/submission/17307130243/img/b256_1.png differ diff --git a/assignment-2/submission/17307130243/img/b32_1.png b/assignment-2/submission/17307130243/img/b32_1.png new file mode 100644 index 0000000000000000000000000000000000000000..bc3d4b8915bc3d4e7186f29f3d241f9a5ba084cc Binary files /dev/null and b/assignment-2/submission/17307130243/img/b32_1.png differ diff --git a/assignment-2/submission/17307130243/img/b512_1.png b/assignment-2/submission/17307130243/img/b512_1.png new file mode 100644 index 0000000000000000000000000000000000000000..74169fcbf6e897d2ee78be6444187edd2d88331f Binary files /dev/null and b/assignment-2/submission/17307130243/img/b512_1.png differ diff --git a/assignment-2/submission/17307130243/img/b64_1.png b/assignment-2/submission/17307130243/img/b64_1.png new file mode 100644 index 0000000000000000000000000000000000000000..7863486a4371529af9f256b0c46e3849c6215a90 Binary files /dev/null and b/assignment-2/submission/17307130243/img/b64_1.png differ diff --git a/assignment-2/submission/17307130243/img/batch_size.png b/assignment-2/submission/17307130243/img/batch_size.png new file mode 100644 index 0000000000000000000000000000000000000000..ffc6954e0378aa70440fda924aea5116a9687287 Binary files /dev/null and b/assignment-2/submission/17307130243/img/batch_size.png differ diff --git a/assignment-2/submission/17307130243/img/init.png b/assignment-2/submission/17307130243/img/init.png new file mode 100644 index 0000000000000000000000000000000000000000..7d6feca8b26759afc49cc2c638b1d4d2a5c84888 Binary files /dev/null and b/assignment-2/submission/17307130243/img/init.png differ diff --git a/assignment-2/submission/17307130243/img/learning_rate.png b/assignment-2/submission/17307130243/img/learning_rate.png new file mode 100644 index 0000000000000000000000000000000000000000..120811f9c7b1e08106c7813ad1505620700496b8 Binary files /dev/null and b/assignment-2/submission/17307130243/img/learning_rate.png differ diff --git a/assignment-2/submission/17307130243/img/lr_1.png b/assignment-2/submission/17307130243/img/lr_1.png new file mode 100644 index 0000000000000000000000000000000000000000..4cfd56e69baff52ec48c30659a2606dedf5480ec Binary files /dev/null and b/assignment-2/submission/17307130243/img/lr_1.png differ diff --git a/assignment-2/submission/17307130243/img/lr_2.png b/assignment-2/submission/17307130243/img/lr_2.png new file mode 100644 index 0000000000000000000000000000000000000000..cdd23eec5b965b0ef9d9332f4fa6700538ad11d4 Binary files /dev/null and b/assignment-2/submission/17307130243/img/lr_2.png differ diff --git a/assignment-2/submission/17307130243/img/lr_3.png b/assignment-2/submission/17307130243/img/lr_3.png new file mode 100644 index 0000000000000000000000000000000000000000..54fb2768f6e9ae91ab2d8806fe13f2587595f732 Binary 
files /dev/null and b/assignment-2/submission/17307130243/img/lr_3.png differ diff --git a/assignment-2/submission/17307130243/img/lr_4.png b/assignment-2/submission/17307130243/img/lr_4.png new file mode 100644 index 0000000000000000000000000000000000000000..f751ecc7fceb471e8dc52867f3c2da0e44b5a312 Binary files /dev/null and b/assignment-2/submission/17307130243/img/lr_4.png differ diff --git a/assignment-2/submission/17307130243/img/mini_batch.png b/assignment-2/submission/17307130243/img/mini_batch.png new file mode 100644 index 0000000000000000000000000000000000000000..ac47c7991574a2e9729cf30bd9a18481e551ed06 Binary files /dev/null and b/assignment-2/submission/17307130243/img/mini_batch.png differ diff --git a/assignment-2/submission/17307130243/img/net_structure-1.png b/assignment-2/submission/17307130243/img/net_structure-1.png new file mode 100644 index 0000000000000000000000000000000000000000..3f96cf02e720739b574d103b2a2d467f798acac3 Binary files /dev/null and b/assignment-2/submission/17307130243/img/net_structure-1.png differ diff --git a/assignment-2/submission/17307130243/img/net_structure.png b/assignment-2/submission/17307130243/img/net_structure.png new file mode 100644 index 0000000000000000000000000000000000000000..3997bba001bf9d6175183c5f2eed24ed2930e2d6 Binary files /dev/null and b/assignment-2/submission/17307130243/img/net_structure.png differ diff --git a/assignment-2/submission/17307130243/img/opt_1.png b/assignment-2/submission/17307130243/img/opt_1.png new file mode 100644 index 0000000000000000000000000000000000000000..5305066801bd4ea98d0c6b171e2e7569f0cbfeb4 Binary files /dev/null and b/assignment-2/submission/17307130243/img/opt_1.png differ diff --git a/assignment-2/submission/17307130243/img/softmax.png b/assignment-2/submission/17307130243/img/softmax.png new file mode 100644 index 0000000000000000000000000000000000000000..71fa0c41f4df0c1f969353666a9430c57e7c252d Binary files /dev/null and b/assignment-2/submission/17307130243/img/softmax.png differ diff --git a/assignment-2/submission/17307130243/numpy_fnn.py b/assignment-2/submission/17307130243/numpy_fnn.py new file mode 100644 index 0000000000000000000000000000000000000000..e3829e4d8d24ce459ffbddcb8e705ab8029aa2da --- /dev/null +++ b/assignment-2/submission/17307130243/numpy_fnn.py @@ -0,0 +1,267 @@ +import numpy as np + + +class NumpyOp: + + def __init__(self): + self.memory = {} + self.epsilon = 1e-12 + + +class Matmul(NumpyOp): + + def forward(self, x, W): + """ + x: shape(N, d) + w: shape(d, d') + """ + self.memory['x'] = x + self.memory['W'] = W + h = np.matmul(x, W) + return h + + def backward(self, grad_y): + """ + grad_y: shape(N, d') + """ + + #################### + # code 1 # + #################### + + grad_x = np.matmul(grad_y, self.memory['W'].T) + grad_W = np.matmul(self.memory['x'].T, grad_y) + + return grad_x, grad_W + + +class Relu(NumpyOp): + + def forward(self, x): + self.memory['x'] = x + return np.where(x > 0, x, np.zeros_like(x)) + + def backward(self, grad_y): + """ + grad_y: same shape as x + """ + + #################### + # code 2 # + #################### + + x = self.memory['x'] + grad_x = np.where(x > 0, grad_y, np.zeros_like(grad_y)) + + return grad_x + + +class Log(NumpyOp): + + def forward(self, x): + """ + x: shape(N, c) + """ + + out = np.log(x + self.epsilon) + self.memory['x'] = x + + return out + + def backward(self, grad_y): + """ + grad_y: same shape as x + """ + + #################### + # code 3 # + #################### + + x = self.memory['x'] + grad_x = (1 / (x + 
self.epsilon)) * grad_y + + return grad_x + + +class Softmax(NumpyOp): + """ + softmax over last dimension + """ + + def forward(self, x): + """ + x: shape(N, c) + """ + + #################### + # code 4 # + #################### + + self.memory['x'] = x + + exp = np.exp(x - np.max(x, axis=1, keepdims=True)) + out = exp / np.sum(exp, axis=1, keepdims=True) + self.memory['out'] = out + + return out + + def backward(self, grad_y): + """ + grad_y: same shape as x + """ + + #################### + # code 5 # + #################### + + y = self.memory['out'] + temp = np.matmul(grad_y[:, np.newaxis], np.matmul(y[:, :, np.newaxis], y[:, np.newaxis, :])).squeeze(1) + grad_x = -temp + grad_y * y + + return grad_x + + +class NumpyLoss: + + def __init__(self): + self.target = None + + def get_loss(self, pred, target): + self.target = target + return (-pred * target).sum(axis=1).mean() + + def backward(self): + return -self.target / self.target.shape[0] + + +class NumpyModel: + def __init__(self): + self.W1 = np.random.normal(size=(28 * 28, 256)) + self.W2 = np.random.normal(size=(256, 64)) + self.W3 = np.random.normal(size=(64, 10)) + + # 以下算子会在 forward 和 backward 中使用 + self.matmul_1 = Matmul() + self.relu_1 = Relu() + self.matmul_2 = Matmul() + self.relu_2 = Relu() + self.matmul_3 = Matmul() + self.softmax = Softmax() + self.log = Log() + + # 以下变量需要在 backward 中更新。 softmax_grad, log_grad 等为算子反向传播的梯度( loss 关于算子输入的偏导) + self.x1_grad, self.W1_grad = None, None + self.relu_1_grad = None + self.x2_grad, self.W2_grad = None, None + self.relu_2_grad = None + self.x3_grad, self.W3_grad = None, None + self.softmax_grad = None + self.log_grad = None + + # 以下变量在 momentum 中使用 + self.v1 = np.zeros_like(self.W1) + self.v2 = np.zeros_like(self.W2) + self.v3 = np.zeros_like(self.W3) + + # 以下变量在 adam 中使用 + self.m1 = np.zeros_like(self.W1) + self.m2 = np.zeros_like(self.W2) + self.m3 = np.zeros_like(self.W3) + + # 迭代次数 + self.t = 0 + + def forward(self, x): + x = x.reshape(-1, 28 * 28) + + #################### + # code 6 # + #################### + + x = self.matmul_1.forward(x, self.W1) + x = self.relu_1.forward(x) + + x = self.matmul_2.forward(x, self.W2) + x = self.relu_2.forward(x) + + x = self.matmul_3.forward(x, self.W3) + x = self.softmax.forward(x) + + x = self.log.forward(x) + + return x + + def backward(self, y): + #################### + # code 7 # + #################### + + self.log_grad = self.log.backward(y) + + self.softmax_grad = self.softmax.backward(self.log_grad) + self.x3_grad, self.W3_grad = self.matmul_3.backward(self.softmax_grad) + + self.relu_2_grad = self.relu_2.backward(self.x3_grad) + self.x2_grad, self.W2_grad = self.matmul_2.backward(self.relu_2_grad) + + self.relu_1_grad = self.relu_1.backward(self.x2_grad) + self.x1_grad, self.W1_grad = self.matmul_1.backward(self.relu_1_grad) + + def optimize(self, learning_rate): + self.W1 -= learning_rate * self.W1_grad + self.W2 -= learning_rate * self.W2_grad + self.W3 -= learning_rate * self.W3_grad + + def momentum(self, learning_rate, gamma=0.95): + self.v1 = gamma * self.v1 + (1 - gamma) * self.W1_grad + self.v2 = gamma * self.v2 + (1 - gamma) * self.W2_grad + self.v3 = gamma * self.v3 + (1 - gamma) * self.W3_grad + + self.W1 -= learning_rate * self.v1 + self.W2 -= learning_rate * self.v2 + self.W3 -= learning_rate * self.v3 + + def AdaGrad(self, learning_rate): + eps = 1e-7 + + self.v1 += self.W1_grad ** 2 + self.v2 += self.W2_grad ** 2 + self.v3 += self.W3_grad ** 2 + + self.W1 -= learning_rate * self.W1_grad / (self.v1 ** 0.5 + eps) + self.W2 -= 
learning_rate * self.W2_grad / (self.v2 ** 0.5 + eps) + self.W3 -= learning_rate * self.W3_grad / (self.v3 ** 0.5 + eps) + + def RMSProp(self, learning_rate, decay_rate=0.999): + eps = 1e-7 + self.v1 = decay_rate * self.v1 + (1 - decay_rate) * np.square(self.W1_grad) + self.v2 = decay_rate * self.v2 + (1 - decay_rate) * np.square(self.W2_grad) + self.v3 = decay_rate * self.v3 + (1 - decay_rate) * np.square(self.W3_grad) + + self.W1 -= learning_rate * self.W1_grad / (np.sqrt(self.v1) + eps) + self.W2 -= learning_rate * self.W2_grad / (np.sqrt(self.v2) + eps) + self.W3 -= learning_rate * self.W3_grad / (np.sqrt(self.v3) + eps) + + def Adam(self, learning_rate, beta1=0.9, beta2=0.999): + self.t += 1 + eps = 1e-8 + self.m1 = beta1 * self.m1 + (1 - beta1) * self.W1_grad + self.m2 = beta1 * self.m2 + (1 - beta1) * self.W2_grad + self.m3 = beta1 * self.m3 + (1 - beta1) * self.W3_grad + + self.v1 = beta2 * self.v1 + (1 - beta2) * self.W1_grad ** 2 + self.v2 = beta2 * self.v2 + (1 - beta2) * self.W2_grad ** 2 + self.v3 = beta2 * self.v3 + (1 - beta2) * self.W3_grad ** 2 + + # 修正 + m1 = self.m1 / (1 - beta1 ** self.t) + m2 = self.m2 / (1 - beta1 ** self.t) + m3 = self.m3 / (1 - beta1 ** self.t) + + v1 = self.v1 / (1 - beta2 ** self.t) + v2 = self.v2 / (1 - beta2 ** self.t) + v3 = self.v3 / (1 - beta2 ** self.t) + + self.W1 -= learning_rate * m1 / (np.sqrt(v1) + eps) + self.W2 -= learning_rate * m2 / (np.sqrt(v2) + eps) + self.W3 -= learning_rate * m3 / (np.sqrt(v3) + eps) diff --git a/assignment-2/submission/17307130243/numpy_mnist.py b/assignment-2/submission/17307130243/numpy_mnist.py new file mode 100644 index 0000000000000000000000000000000000000000..334eb7fca8d126c4d0d4a468004b57f943f9b163 --- /dev/null +++ b/assignment-2/submission/17307130243/numpy_mnist.py @@ -0,0 +1,251 @@ +import matplotlib.pyplot as plt +import numpy as np +from numpy_fnn import NumpyModel, NumpyLoss +from utils import download_mnist, batch, get_torch_initialization, plot_curve, one_hot +import time + +np.random.seed(16) + +colors = ['lightskyblue', 'sandybrown', 'mediumpurple', 'olivedrab', + 'gold', 'hotpink'] + + +def normalization(dataset): + """ + 对数据进行标准化 + :param dataset: shape (N, d) + :return: (dataset - mean) / std + """ + # 避免溢出 + eps = 1e-8 + temp = dataset - dataset.mean(axis=0) + return temp / (dataset.var(axis=0) + eps) ** 0.5 + + +def mini_batch(dataset, batch_size=128, shuffle=True, normal=False): + data = np.array([datum[0].numpy() for datum in dataset]) + label = np.array([datum[1] for datum in dataset]) + + num = data.shape[0] + idx = np.arange(num) + + # shuffle + if shuffle: + np.random.shuffle(idx) + + batches = [] + for i in range(0, num, batch_size): + batch_data = data[idx[i: i + batch_size]] + batch_label = label[idx[i: i + batch_size]] + + # batch normalization + if normal: + batch_data = normalization(batch_data) + + batches.append((batch_data, batch_label)) + + # return [(data[idx[i: i + batch_size]], label[idx[i: i + batch_size]]) for i in range(0, num, batch_size)] + return batches + + +def get_torch_initialization_numpy(): + bound1 = np.sqrt(6 / (28 * 28)) + bound2 = np.sqrt(6 / 256) + bound3 = np.sqrt(6 / 64) + + W1 = np.random.uniform(-bound1, bound1, (28 * 28, 256)) + W2 = np.random.uniform(-bound2, bound2, (256, 64)) + W3 = np.random.uniform(-bound3, bound3, (64, 10)) + + return W1, W2, W3 + + +def numpy_run(learning_rate=0.1, epoch_number=3, batch_size=128, optimizer='SGD', max_iter=None): + train_dataset, test_dataset = download_mnist() + + model = NumpyModel() + numpy_loss = 
NumpyLoss()
+    model.W1, model.W2, model.W3 = get_torch_initialization_numpy()
+
+    train_loss = []
+
+    epoch_number = epoch_number
+    learning_rate = learning_rate
+
+    begin = time.time()  # start time
+
+    acc = []
+
+    iter = 0
+    flag = False
+
+    for epoch in range(epoch_number):
+        for x, y in mini_batch(train_dataset, batch_size=batch_size):
+            y = one_hot(y)
+
+            y_pred = model.forward(x)
+            loss = numpy_loss.get_loss(y_pred, y)
+
+            model.backward(numpy_loss.backward())
+            if optimizer == 'SGD':
+                model.optimize(learning_rate)
+            elif optimizer == 'momentum':
+                model.momentum(learning_rate)
+            elif optimizer == 'AdaGrad':
+                model.AdaGrad(learning_rate)
+            elif optimizer == 'RMSProp':
+                model.RMSProp(learning_rate)
+            elif optimizer == 'Adam':
+                model.Adam(learning_rate)
+
+            train_loss.append(loss.item())
+            iter += 1
+            if max_iter and iter > max_iter:
+                flag = True
+                break
+
+        x, y = batch(test_dataset)[0]
+        accuracy = np.mean((model.forward(x).argmax(axis=1) == y))
+        acc.append(accuracy)
+        print('[{}] Accuracy: {:.4f}'.format(epoch, accuracy))
+        if flag:
+            break
+
+    # end time
+    end = time.time()
+    print('learning rate:', learning_rate)
+    print("time:{:.4f}".format(end - begin))
+
+    plot_curve(train_loss)
+
+    # keep every 50th loss value for a smoother curve
+    simple_train_loss = []
+    for i in range(len(train_loss)):
+        if i % 50 == 0:
+            simple_train_loss.append(train_loss[i])
+    plot_curve(simple_train_loss)
+    return acc, simple_train_loss
+
+
+""" Effect of the learning rate on model performance """
+alpha = [0.0001, 0.001, 0.01, 0.1, 1, 2, 5]
+alpha2 = [0.0001, 0.001, 0.01, 0.1]
+
+
+def learning_rate_expr():
+    x = np.arange(1, 16)
+    plt.grid()
+    accs = []
+    for i in range(len(alpha)):
+        acc, _ = numpy_run(learning_rate=alpha[i], epoch_number=15)
+        accs.append(acc)
+
+    # figure
+    for i in range(len(alpha)):
+        plt.plot(x, accs[i], label='alpha={}'.format(alpha[i]))
+
+    plt.legend()
+    plt.xlabel('epoch')
+    plt.ylabel('accuracy')
+
+    plt.savefig('./img/learning_rate.png', dpi=300, bbox_inches='tight')
+
+
+def learning_rate_expr2():
+    x = np.arange(1, 51)
+    plt.grid()
+    accs = []
+    for i in range(len(alpha)):
+        acc, _ = numpy_run(learning_rate=alpha[i], epoch_number=50)
+        accs.append(acc)
+
+    # figure
+    for i in range(len(alpha2)):
+        plt.plot(x, accs[i], label='alpha={}'.format(alpha[i]))
+
+    plt.legend()
+    plt.xlabel('epoch')
+    plt.ylabel('accuracy')
+
+    plt.savefig('./img/learning_rate2.png', dpi=300, bbox_inches='tight')
+    plt.show()
+
+
+"""Effect of the batch size on model performance"""
+
+batch_sizes = [1, 16, 32, 64, 128, 256, 512, 1024]
+
+
+def batch_size_expr():
+    plt.grid()
+    accs = []
+    for size in batch_sizes:
+        acc, _ = numpy_run(learning_rate=0.01, epoch_number=5, batch_size=size, optimizer='SGD', max_iter=5000)
+        accs.append(acc)
+
+    # figure
+    for i in range(len(batch_sizes)):
+        plt.plot(accs[i], label='size={}'.format(batch_sizes[i]))
+
+    plt.legend()
+    plt.xlabel('epoch')
+    plt.ylabel('accuracy')
+    plt.savefig('./img/batch_size.png', dpi=300, bbox_inches='tight')
+
+
+""" Effect of the optimizer on model performance """
+opts = ['SGD', 'momentum', 'AdaGrad', 'RMSProp', 'Adam']
+
+
+def opt_expr():
+    x = np.arange(1, 6)
+    accs = []
+    losses = []
+    for opt in opts:
+        acc, loss = 
numpy_run(learning_rate=0.002, epoch_number=5, batch_size=128, optimizer=opt) + accs.append(acc) + losses.append(loss) + + # figure accuracy + for i in range(5): + plt.plot(accs[i], label=opts[i]) + + plt.grid() + plt.legend() + plt.xlabel("step/50") + plt.ylabel("Accuracy") + plt.savefig("./img/opt_1.png", dpi=300, bbox_inches='tight') + plt.show() + + # figure loss + for i in range(5): + plt.plot(losses[i], label=opts[i]) + + plt.grid() + plt.legend() + plt.xlabel("epoch") + plt.ylabel("Loss Value") + plt.savefig("./img/opt_2.png", dpi=300, bbox_inches='tight') + plt.show() + + +if __name__ == "__main__": + # numpy_run() + # learning_rate_expr() + # learning_rate_expr2() + # batch_size_expr() + # opt_expr() + numpy_run()