diff --git a/assignment-1/submission/18340986009/README.md b/assignment-1/submission/18340986009/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..fa66f00c0a8bb1084d3920b47714534b3a660dfe
--- /dev/null
+++ b/assignment-1/submission/18340986009/README.md
@@ -0,0 +1,159 @@
+# KNN Classification
+
+This report includes two parts:
+1. Find a KNN model that maximizes accuracy on a given dataset. (Each class is Gaussian-distributed, with distribution parameters chosen at random.)
+2. Assess how the distribution parameters affect model accuracy, using the model built in part 1.
+
+
+## 1. Model Generation
+
+### 1.1 Overview of Mock Data
+
+Generate 3 classes from two-dimensional Gaussian distributions.
+
+$
+    N_0 = 150 \hspace{1cm}
+    C_0 \sim \mathcal{N}(\mu = \begin{bmatrix}50\\\\50\end{bmatrix},\sigma^{2} = \begin{bmatrix}60 & -50\\\\-50 & 140\end{bmatrix})
+$
+
+$
+    N_1 = 250 \hspace{1cm}
+    C_1 \sim \mathcal{N}(\mu = \begin{bmatrix}60\\\\20\end{bmatrix},\sigma^{2} = \begin{bmatrix}130 & 10\\\\10 & 100\end{bmatrix})
+$
+
+$
+    N_2 = 100 \hspace{1cm}
+    C_2 \sim \mathcal{N}(\mu = \begin{bmatrix}20\\\\60\end{bmatrix},\sigma^{2} = \begin{bmatrix}120 & 20\\\\20 & 90\end{bmatrix})
+$
+
+Mock Data 1 Overview:
+
+
+
+The 500 points are then split randomly into a training set (80%) and a testing set (20%).
+
+### 1.2 Model Accuracy with Different K and Distance Methods
+
+Since a rule of thumb is to let $K = \sqrt{N}$, where $N$ is the size of the training set ($N = 0.8 \times (N_0 + N_1 + N_2) = 400$), we first try some Ks around $\sqrt{400} = 20$ using both Euclidean and Manhattan distance.
+
+| Accuracy (%) | K = 10 | K = 15 | K = 20 | K = 25 | K = 30 |
+| ------------ |:------:|:------:|:------:|:------:|:------:|
+| **Euclidean** |83.0|82.0|83.0|81.0|80.0|
+| **Manhattan** |83.0|82.0|81.0|81.0|81.0|
+
+The KNN model with $K = 10$ gives the best prediction result of 83% for both distance methods, so we choose $K_{0} = 10$ as the starting point for model optimization. Below is a scatter plot showing the prediction result of the chosen model ($K = 10$, Euclidean distance). Each red dot represents a misclassification.
+
+*Note that model accuracy differs little between the two distance methods on this dataset.
+
+
+
+### 1.3 Model Optimization
+
+General idea: $K_{i+1} = \lceil{K_{i} + Step_{i+1}}\rceil$
+
+Detailed steps (a code sketch of this search appears at the end of this section):
+
+ - For each $K_{i+1}$, calculate its accuracy rate $R_{i+1}$.
+ - If $R_{i+1} > R_{0}$, a better model is found; end the optimization. Else:
+   - If $R_{i+1} > R_{i}$, let $Step_{i+1} = \frac{1}{C} Step_{i}$, where $C = (R_{i+1} - R_{i}) / R_{i}$.
+     That is, if model accuracy improves, continue in this direction, with a step size negatively related to the relative improvement.
+   - If $R_{i+1} \le R_{i}$, let $Step_{i+1} = - \frac{1}{2} Step_{i}$.
+     That is, if the new K does not improve model accuracy, try a smaller step in the reverse direction.
+
+The model from 1.2 gives K = 10 and Euclidean distance. Using this model as the starting point, define the first step $Step_{0} = -\frac{1}{100}N_{total} = -5$ (moving toward smaller K first).
+
+Optimization process:
+
+| \ | K = 10 | K = 5 | K = 8 |
+| ------------ |:------:|:------:|:------:|
+| **Accuracy rate (%)** |83.0|83.0|85.0|
+
+After three iterations, a higher accuracy rate of 85% is reached when K is adjusted to 8. Thus, our final KNN model uses K = 8 and Euclidean distance.
+
+Prediction result evaluation:
+
+
+
+Compared with the model before optimization, two points at the top are now classified correctly.
+
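+As referenced in 1.3, the search can be sketched as follows. This is a minimal illustration rather than the submitted code: `accuracy` stands for any function mapping K to an accuracy rate (e.g. a wrapper around `KNN.calc_accuracy`), and the iteration cap is an assumption added to guarantee termination.
+
+```python
+import math
+
+def optimize_k(accuracy, k0=10, step0=-5.0, max_iter=20):
+    r0 = accuracy(k0)                        # accuracy of the starting model
+    k, r, step = k0, r0, step0
+    for _ in range(max_iter):
+        k_new = max(1, math.ceil(k + step))  # K_{i+1} = ceil(K_i + Step_{i+1}), keep K >= 1
+        r_new = accuracy(k_new)
+        if r_new > r0:                       # better than the starting model: stop
+            return k_new, r_new
+        if r_new > r:                        # improved over the last K: keep direction
+            step = step / ((r_new - r) / r)
+        else:                                # no improvement: smaller reverse step
+            step = -step / 2
+        k, r = k_new, r_new
+    return k0, r0                            # fall back to the starting model
+```
+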
+## 2. Distribution Parameters & Model Accuracy
+
+Intuitively, we hypothesize that any change resulting in a more balanced mixture of the classes will make classification harder, thereby decreasing model accuracy. Below, we modify the parameters of the Gaussian distributions to test this hypothesis.
+
+### 2.1 Change of Variance and Covariance
+
+Keep the means the same. Modify the variance-covariance matrix of each class to increase the overlap between classes:
+
+$
+    N_0 = 150 \hspace{1cm}
+    C_0 \sim \mathcal{N}(\mu = \begin{bmatrix}50\\\\50\end{bmatrix},\sigma^{2} = \begin{bmatrix}300 & 0\\\\0 & 200\end{bmatrix})
+$
+
+$
+    N_1 = 250 \hspace{1cm}
+    C_1 \sim \mathcal{N}(\mu = \begin{bmatrix}60\\\\20\end{bmatrix},\sigma^{2} = \begin{bmatrix}250 & 0\\\\0 & 150\end{bmatrix})
+$
+
+$
+    N_2 = 100 \hspace{1cm}
+    C_2 \sim \mathcal{N}(\mu = \begin{bmatrix}20\\\\60\end{bmatrix},\sigma^{2} = \begin{bmatrix}150 & 0\\\\0 & 150\end{bmatrix})
+$
+
+Mock Data 2 Overview:
+
+
+
+Prediction result evaluation:
+
+
+
+The accuracy of our model drops from 85% to 79%, as expected.
+
+### 2.2 Change of Mean
+
+Keeping the other parameters the same, decrease the distance between the class means to increase the overlap:
+
+$
+    N_0 = 150 \hspace{1cm}
+    C_0 \sim \mathcal{N}(\mu = \begin{bmatrix}50\\\\50\end{bmatrix},\sigma^{2} = \begin{bmatrix}60 & -50\\\\-50 & 140\end{bmatrix})
+$
+
+$
+    N_1 = 250 \hspace{1cm}
+    C_1 \sim \mathcal{N}(\mu = \begin{bmatrix}50\\\\40\end{bmatrix},\sigma^{2} = \begin{bmatrix}130 & 10\\\\10 & 100\end{bmatrix})
+$
+
+$
+    N_2 = 100 \hspace{1cm}
+    C_2 \sim \mathcal{N}(\mu = \begin{bmatrix}40\\\\60\end{bmatrix},\sigma^{2} = \begin{bmatrix}120 & 20\\\\20 & 90\end{bmatrix})
+$
+
+Mock Data 3 Overview:
+
+
+
+Prediction result evaluation:
+
+
+
+The accuracy of our model drops from 85% to 73%, as expected.
+
+### 2.3 N & Model Accuracy
+
+In an attempt to increase model accuracy, we double every $N_i$ of Data 3 proportionally. With $N_{total} = 1000$, we expect some increase in model accuracy.
+
+Mock Data 4 Overview:
+
+
+
+Prediction result evaluation:
+
+
+
+Model accuracy decreases from 73% to 62.5% even though the data size doubled. This suggests that sample size contributes much less to model accuracy than the distribution parameters do. This makes sense: if the data labeled as different categories does indeed come from (nearly) the same distribution, increasing N only provides more evidence of the similarity between these categories.
+
+## Summary
+
+The main takeaways from this exercise:
+
+Model accuracy depends far more on the distribution parameters and the choice of K than on the other factors examined. The distance method has little influence on model accuracy, and whether an increase in N improves model accuracy depends on whether the true distributions of the categories are significantly different (the p-value from a statistical test could evaluate this; one possible check is sketched below).
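+
+As one possible realization of that statistical check (my own suggestion, not part of the submitted code): a two-sample Kolmogorov–Smirnov test on each feature, via `scipy.stats.ks_2samp`, can indicate whether two classes plausibly share a distribution before deciding whether more data will help. The sample parameters below reuse the Data 3 settings from 2.2.
+
+```python
+import numpy as np
+from scipy.stats import ks_2samp
+
+rng = np.random.default_rng(0)
+c0 = rng.multivariate_normal([50, 50], [[60, -50], [-50, 140]], 150)
+c1 = rng.multivariate_normal([50, 40], [[130, 10], [10, 100]], 250)
+
+# Test each feature separately; small p-values suggest the marginal
+# distributions of the two classes differ significantly.
+for dim in range(2):
+    stat, p = ks_2samp(c0[:, dim], c1[:, dim])
+    print(f"feature {dim}: KS statistic = {stat:.3f}, p-value = {p:.3g}")
+```
+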
diff --git a/assignment-1/submission/18340986009/img/Figure 1.png b/assignment-1/submission/18340986009/img/Figure 1.png new file mode 100644 index 0000000000000000000000000000000000000000..32d5ded9c9d662bf7eacaede5e9316ba1d545335 Binary files /dev/null and b/assignment-1/submission/18340986009/img/Figure 1.png differ diff --git a/assignment-1/submission/18340986009/img/Figure 2.png b/assignment-1/submission/18340986009/img/Figure 2.png new file mode 100644 index 0000000000000000000000000000000000000000..c7e7752721f808ea5ca19a56a7e642badb1617fd Binary files /dev/null and b/assignment-1/submission/18340986009/img/Figure 2.png differ diff --git a/assignment-1/submission/18340986009/img/Figure 3.png b/assignment-1/submission/18340986009/img/Figure 3.png new file mode 100644 index 0000000000000000000000000000000000000000..5a3fd62c0681f995d32c1ea794258095239261ee Binary files /dev/null and b/assignment-1/submission/18340986009/img/Figure 3.png differ diff --git a/assignment-1/submission/18340986009/img/Figure 4.png b/assignment-1/submission/18340986009/img/Figure 4.png new file mode 100644 index 0000000000000000000000000000000000000000..9c1e05f712b290be595b12c812476c72e0f0002d Binary files /dev/null and b/assignment-1/submission/18340986009/img/Figure 4.png differ diff --git a/assignment-1/submission/18340986009/img/Figure 5.png b/assignment-1/submission/18340986009/img/Figure 5.png new file mode 100644 index 0000000000000000000000000000000000000000..e49ec9595ac9c813a2e6044375c534bb669b3a7c Binary files /dev/null and b/assignment-1/submission/18340986009/img/Figure 5.png differ diff --git a/assignment-1/submission/18340986009/img/Figure 6.png b/assignment-1/submission/18340986009/img/Figure 6.png new file mode 100644 index 0000000000000000000000000000000000000000..11a84369882f65a2a3e46237e51fe479d4f14b88 Binary files /dev/null and b/assignment-1/submission/18340986009/img/Figure 6.png differ diff --git a/assignment-1/submission/18340986009/img/Figure 7.png b/assignment-1/submission/18340986009/img/Figure 7.png new file mode 100644 index 0000000000000000000000000000000000000000..ee33c60766eb907d5b8992c24ca3806c297d9fc8 Binary files /dev/null and b/assignment-1/submission/18340986009/img/Figure 7.png differ diff --git a/assignment-1/submission/18340986009/img/Figure 8.png b/assignment-1/submission/18340986009/img/Figure 8.png new file mode 100644 index 0000000000000000000000000000000000000000..a3f42ac859f2ef35448cb16f0412df387ba8e7a8 Binary files /dev/null and b/assignment-1/submission/18340986009/img/Figure 8.png differ diff --git a/assignment-1/submission/18340986009/img/Figure 9.png b/assignment-1/submission/18340986009/img/Figure 9.png new file mode 100644 index 0000000000000000000000000000000000000000..0de5d1f658bdd5860681cfee20432e8074f39a1d Binary files /dev/null and b/assignment-1/submission/18340986009/img/Figure 9.png differ diff --git a/assignment-1/submission/18340986009/source.py b/assignment-1/submission/18340986009/source.py new file mode 100644 index 0000000000000000000000000000000000000000..410b588394c97d15227671e94f6e24e6cbb46882 --- /dev/null +++ b/assignment-1/submission/18340986009/source.py @@ -0,0 +1,249 @@ +#!/usr/bin/env python +# coding: utf-8 + +# In[1]: + + +import sys +import numpy as np +import matplotlib.pyplot as plt + + +# ## Define Global Functions + +# In[139]: + + +# Generate Training and Testing Sets +def generate(Ns, Means, Covs, train_frac): + + # Generate 2-D data of N class + data = list() + label = list() + + for i in range(0,len(Ns)): + Ci = 
np.random.multivariate_normal(Means[i], Covs[i], Ns[i])
+        data.append(Ci)
+        label.append([i]*Ns[i])
+
+    data = np.array([v for subl in data for v in subl])
+    label = np.array([v for subl in label for v in subl])
+
+    # Shuffle data and labels together
+    idx = np.arange(sum(Ns))
+    np.random.shuffle(idx)
+
+    data = data[idx]
+    label = label[idx]
+
+    # Split into training and testing set
+    split_point = int(label.size * train_frac)
+    train_data, test_data = data[:split_point,], data[split_point:,]
+    train_label, test_label = label[:split_point,], label[split_point:,]
+
+    np.save("data.npy",((train_data, train_label),
+                        (test_data, test_label)))
+
+    return train_data, train_label, test_data, test_label
+
+
+# Read in saved data
+def read():
+    (train_data, train_label), (test_data, test_label) = np.load(
+        "data.npy", allow_pickle = True)
+    return (train_data, train_label), (test_data, test_label)
+
+
+# Create scatter plot of different categories
+def display(data, colorby, name, title):
+    colors = ['red','grey','blue']
+    datas =[[],[],[]]
+
+    for i in range(len(data)):
+        datas[colorby[i]].append(data[i])
+
+    for i in range(len(datas)):
+        each = np.array(datas[i])
+        if len(each) == 0:
+            continue
+        plt.scatter(each[:, 0], each[:, 1],
+                    marker = 'o',
+                    color = colors[i],
+                    alpha = 0.7)
+
+    plt.xlabel("X1")
+    plt.ylabel("X2")
+    plt.title(title)
+    plt.savefig(f'img/{name}')
+    plt.show()
+
+
+# ## Define Class KNN
+
+# In[140]:
+
+
+class KNN:
+
+    def __init__(self):
+
+        self.K = None
+        self.Dist = None
+        self.data = None
+        self.label = None
+
+
+    # Calculate distance between two given points
+    def get_distance(self, x, y, dist_type = "Euclidean"):
+        dist = 0.0
+        if "Euclidean" == dist_type:
+            distance = 0.0
+            for i in range(len(x)):
+                distance += (x[i] - y[i])**2
+            dist = np.sqrt(distance)
+
+        if "Manhattan" == dist_type:
+            distance = 0.0
+            for i in range(len(x)):
+                distance += np.abs(x[i] - y[i])
+            dist = distance
+
+        return dist
+
+
+    # Make a prediction for one point
+    def predict_for_one(self, K, Dist, target, train_data, train_label):
+        # Calculate distances between target point and other points
+        dists = []
+        neighbors = []
+
+        for i in range(len(train_data)):
+            dist = self.get_distance(target, train_data[i], Dist)
+            dists.append((train_data[i], train_label[i], dist))
+
+        # Get the K nearest neighbors.
+        # Index 0 is skipped: when predicting on the training set itself,
+        # the closest point is the target (distance 0).
+        dists.sort(key = lambda e: e[-1])
+        neighbors = dists[1:K+1]
+
+        # Predict by majority vote among the neighbors
+        neighbors_class = [e[-2] for e in neighbors]
+        prediction = max(neighbors_class, key = neighbors_class.count)
+
+        return prediction
+
+
+    # Calculate model accuracy
+    def calc_accuracy(self, K, Dist, train_data, train_label):
+        predictions = []
+        # Make predictions for the training data
+        for i in range(len(train_label)):
+            target = train_data[i]
+            prediction = self.predict_for_one(
+                K, Dist, target, train_data, train_label
+            )
+            predictions.append(prediction)
+
+        correct = 0
+        for i in range(len(predictions)):
+            if train_label[i] == predictions[i]:
+                correct += 1
+        accuracy = correct / len(predictions) * 100
+
+        return accuracy
+
+
+    # Find the Optimal K & Distance combination
+    def fit(self, K_list, Dist_list, train_data, train_label):
+
+        # Loop through the given options for K and distance methods
+        accuracy_list = []
+        for i in range(len(Dist_list)):
+            Dist = Dist_list[i]
+            dum_list = []
+            for j in range(len(K_list)):
+                K = K_list[j]
+                accuracy = self.calc_accuracy(
+                    K, Dist, train_data, train_label
+                )
+                dum_list.append(accuracy)
+            accuracy_list.append(dum_list)
+
+        # Find the K & Distance method that gives the highest accuracy
+        ac_array = np.array(accuracy_list)
+        global_max = max([max(subl) for subl in accuracy_list])
+        params = np.where(ac_array == global_max)
+
+        # Assign the optimal parameters to the KNN object.
+        # Randomly choose one combination if more than one attains the highest accuracy.
+        Dist_idx = np.random.choice(np.array(params[0]))
+        K_idx = np.random.choice(np.array(params[1]))
+
+        self.Dist = Dist_list[Dist_idx]
+        self.K = K_list[K_idx]
+        self.data = train_data
+        self.label = train_label
+
+        return ac_array
+
+
+    def predict(self, test_data):
+        # Predict a label for every point in the test data,
+        # using the fitted K, distance method, and training set
+        predictions = []
+        # For every point (target) in the test data
+        for i in range(len(test_data)):
+            target = test_data[i]
+            prediction = self.predict_for_one(
+                self.K, self.Dist,
+                target,
+                self.data,
+                self.label)
+            predictions.append(prediction)
+
+        return np.array(predictions)
+
+
+# ## Start of Program
+
+# In[143]:
+
+
+if __name__ == '__main__':
+
+    if len(sys.argv) > 1 and sys.argv[1] == "g":
+        generate(
+            Ns = [150, 250, 100],
+
+            Means = [[50,50],
+                     [60,20],
+                     [20,60]],
+
+            Covs = [[[60,-50],[-50,140]],
+                    [[130,10],[10,100]],
+                    [[120,20],[20,90]]],
+
+            train_frac = 0.8
+        )
+
+    elif len(sys.argv) > 1 and sys.argv[1] == "d":
+        (train_data, train_label), (test_data, test_label) = read()
+
+        display(train_data, train_label,
+                'train', 'Scatter Plot of Training Data')
+        display(test_data, test_label,
+                'test', 'Scatter Plot of Testing Data')
+    else:
+        (train_data, train_label), (test_data, test_label) = read()
+
+        model = KNN()
+
+        model.fit(
+            K_list = [15, 20, 25],
+            Dist_list = ["Euclidean", "Manhattan"],
+            train_data = train_data,
+            train_label = train_label)
+
+        res = model.predict(test_data)
+
+        print("acc =",np.mean(np.equal(res, test_label)))
+
+
diff --git a/assignment-2/.keep b/assignment-2/.keep
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/assignment-2/submission/18340986009/README.md b/assignment-2/submission/18340986009/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..2c6df32c234dc0eb883c99c677c0c8d2158a1156
--- /dev/null
+++ b/assignment-2/submission/18340986009/README.md
@@ -0,0 +1,260 @@
+# FNN
+
+## 1. Activation Function
+
+
+
+### 1.1 Matmul
+
+
+
+
+
+```python
+    def backward(self, grad_y):
+        """
+        grad_y: shape(N, d')
+        """
+        x = self.memory['x']
+        W = self.memory['W']
+
+        grad_x = np.matmul(grad_y, W.T)   # dL/dx = dL/dy · W^T
+        grad_W = np.matmul(x.T, grad_y)   # dL/dW = x^T · dL/dy
+        return grad_x, grad_W
+```
+
+### 1.2 Relu
+
+
+
+```python
+    def backward(self, grad_y):
+        """
+        grad_y: same shape as x
+        """
+        x = self.memory['x']
+        # the gradient passes through only where the input was positive
+        grad_x = np.where(x <= 0, 0, grad_y)
+
+        return grad_x
+```
+
+### 1.3 Log
+
+
+
+```python
+    def backward(self, grad_y):
+        """
+        grad_y: same shape as x
+        """
+        x = self.memory['x']
+        # d log(x + eps) / dx = 1 / (x + eps)
+        grad_x = (x + self.epsilon)**(-1) * grad_y
+
+        return grad_x
+```
+
+### 1.4 Softmax
+
+
+
+```python
+    def forward(self, x):
+        """
+        x: shape(N, c)
+        """
+        # subtract the row-wise max for numerical stability;
+        # softmax is invariant to shifting its input by a constant
+        exp = np.exp(x - x.max(axis=1, keepdims=True))
+        out = exp/exp.sum(axis=1, keepdims=True)
+        self.memory['x'] = x
+        self.memory['s'] = out
+
+        return out
+
+    def backward(self, grad_y):
+        """
+        grad_y: same shape as x
+        """
+        s = self.memory['s']
+        n = s.shape[1]
+
+        grad_x = np.zeros(s.shape)
+        # calculate Jacobian for ith row
+        for i in range(s.shape[0]):
+            jacob = np.zeros((n, n))
+
+            for j in range(n):
+                for k in range(n):
+                    if j == k:
+                        jacob[k, j] = s[i, k] * (1-s[i, j])
+                    else:
+                        jacob[k, j] = -1 * s[i, k] * s[i, j]
+            # apply Jacobian to the ith row of grad_y
+            grad_x[i, :] = np.matmul(jacob, grad_y[i, :])
+
+        return grad_x
+```
+
+## 2. Model Training
+
+### 2.1 Change mini_batch
+
+From the documentation, mini_batch is a function that shuffles the data and creates small batches of a user-specified batch size. It returns batches such that the loop **for x, y in mini_batch(train_dataset):** in **numpy_run()** can retrieve each batch's training data and labels. The shuffling method implemented by the code below uses ordered sampling without replacement. If the remaining data is insufficient to form a batch of the specified size, the remainder is dropped.
+
+```python
+def mini_batch(dataset, batch_size=128):
+
+    # get total number of observations to batch from
+    size = dataset.train_data.shape[0]
+
+    # fetch data and label
+    data = []
+    label = []
+    for one_obs in dataset:
+        data.append(np.array(one_obs[0]))
+        label.append(np.array(one_obs[1]))
+    data = np.array(data)
+    label = np.array(label)
+
+    # random shuffle
+    idx = np.arange(size)
+    np.random.shuffle(idx)
+    data = data[idx]
+    label = label[idx]
+
+    # split to batch
+    # discard the last batch if remaining size < batch_size
+    data_batches = []
+    label_batches = []
+    batched = 0
+    while batched+batch_size <= size:
+        data_batches.append(data[batched:batched + batch_size])
+        label_batches.append(label[batched:batched + batch_size])
+        batched += batch_size
+
+    data_batches = np.array(data_batches)
+    label_batches = np.array(label_batches)
+
+    return zip(data_batches, label_batches)
+```
+
+### 2.2 Training Result
+
+ - Hyper-parameters:
+   * epoch number = 3
+   * learning rate = 0.1
+
+
+ - Model structure:
+   * 3 layers
+   * dimension reduced in each layer using a weight matrix
+   * softmax as the final activation function
+   * loss calculated as $-y^{T}\log{\hat{y}}$
+
+```python
+def forward(self, x):
+    x = x.reshape(-1, 28 * 28)
+    a0 = x
+
+    #First Layer
+    z1 = self.matmul_1.forward(a0, self.W1)
+    a1 = self.relu_1.forward(z1)
+
+    #Second Layer
+    z2 = self.matmul_2.forward(a1, self.W2)
+    a2 = self.relu_2.forward(z2)
+
+    #Third Layer
+    z3 = self.matmul_3.forward(a2, self.W3)
+    a3 = self.softmax.forward(z3)
+    x = self.log.forward(a3)
+
+    return x
+```
+
+```python
+def backward(self, y):
+
+    #Third Layer
+    self.log_grad = self.log.backward(y)
+    self.softmax_grad = self.softmax.backward(self.log_grad)
+    self.x3_grad, self.W3_grad = self.matmul_3.backward(self.softmax_grad)
+
+    #Second Layer
+    self.relu_2_grad = self.relu_2.backward(self.x3_grad)
+    self.x2_grad, self.W2_grad = self.matmul_2.backward(self.relu_2_grad)
+
+    #First Layer
+    self.relu_1_grad = self.relu_1.backward(self.x2_grad)
+    self.x1_grad, self.W1_grad = self.matmul_1.backward(self.relu_1_grad)
+```
+
+Result:
+```python
+[0] Accuracy: 0.9482
+[1] Accuracy: 0.9643
+[2] Accuracy: 0.9735
+```
+
+
+
+## 3. Change of Parameters
+
+**Learning Rate [0.2, 0.4, 0.6]**
+
+
+
+
+
+**Batch Size [20, 40, 60]**
+
+
+
+
+
+## 4. Optimization
+
+**Momentum:**
+
+
+
+```python
+# Extends NumpyModel: remember the previous step for each weight matrix
+self.delta1 = 0
+self.delta2 = 0
+self.delta3 = 0
+
+# Extends the optimize function defined in numpy_fnn.py
+if method == "Momentum":
+    self.delta1 = momentum * self.delta1 - learning_rate * self.W1_grad
+    self.W1 += self.delta1
+
+    self.delta2 = momentum * self.delta2 - learning_rate * self.W2_grad
+    self.W2 += self.delta2
+
+    self.delta3 = momentum * self.delta3 - learning_rate * self.W3_grad
+    self.W3 += self.delta3
+```
+
+Result:
+```python
+[0] Accuracy: 0.9654
+[1] Accuracy: 0.9712
+[2] Accuracy: 0.9726
+```
+
+
+
+Discussion:
+
+The momentum method differs from gradient descent in that it adds a short-term memory of the step size. Momentum, a number between 0 and 1, is the weight given to the previous step. Momentum smooths the optimization path (visual demonstration below, [ref](https://dominikschmidt.xyz/nesterov-momentum/)). A greater momentum results in a more direct path with fewer changes of direction, or, from another perspective, it accelerates movement towards the optimum. If momentum = 0, the method reduces to gradient descent.
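+
+In symbols, the update implemented above is (with $\beta$ the momentum, $\alpha$ the learning rate, and $\Delta_{t}$ the stored step):
+
+$
+    \Delta_{t+1} = \beta \Delta_{t} - \alpha \nabla_{W} L(W_{t}) \hspace{1cm}
+    W_{t+1} = W_{t} + \Delta_{t+1}
+$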
+
+
+
+Momentum $\beta$ can be thought of as the weight given to the previous force, while the learning rate $\alpha$ can be thought of as the weight of the current force. Intuitively, an optimal training run needs a suitable combination of $\alpha$ and $\beta$; for a convex quadratic objective, the best combination depends on the eigenvalues of its Hessian, and inappropriate values can lead to divergence.
+
+
diff --git a/assignment-2/submission/18340986009/img/.keep b/assignment-2/submission/18340986009/img/.keep
new file mode 100644
index 0000000000000000000000000000000000000000..e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
diff --git a/assignment-2/submission/18340986009/img/bs1.png b/assignment-2/submission/18340986009/img/bs1.png
new file mode 100644
index 0000000000000000000000000000000000000000..31bcbc402c9fc53e61e481c993b8b5f6b23818ac
Binary files /dev/null and b/assignment-2/submission/18340986009/img/bs1.png differ
diff --git a/assignment-2/submission/18340986009/img/bs2.png b/assignment-2/submission/18340986009/img/bs2.png
new file mode 100644
index 0000000000000000000000000000000000000000..6ae0b59ef792b22db214d25cd171b322f2a8484e
Binary files /dev/null and b/assignment-2/submission/18340986009/img/bs2.png differ
diff --git a/assignment-2/submission/18340986009/img/bs3.png b/assignment-2/submission/18340986009/img/bs3.png
new file mode 100644
index 0000000000000000000000000000000000000000..e2960bdf2212931c64ec51e62b620dc78ecf7b8f
Binary files /dev/null and b/assignment-2/submission/18340986009/img/bs3.png differ
diff --git a/assignment-2/submission/18340986009/img/figure 1.png b/assignment-2/submission/18340986009/img/figure 1.png
new file mode 100644
index 0000000000000000000000000000000000000000..fc72d38e7552271f12ed7053ce7225592b4fd294
Binary files /dev/null and b/assignment-2/submission/18340986009/img/figure 1.png differ
diff --git a/assignment-2/submission/18340986009/img/figure 2.png b/assignment-2/submission/18340986009/img/figure 2.png
new file mode 100644
index 0000000000000000000000000000000000000000..2d786ff3ae631f111de4c1cd8cf7c37bfa132dd3
Binary files /dev/null and b/assignment-2/submission/18340986009/img/figure 2.png differ
diff --git a/assignment-2/submission/18340986009/img/gd1.png b/assignment-2/submission/18340986009/img/gd1.png
new file mode 100644
index 0000000000000000000000000000000000000000..758610daf5ff4027bc5af1a3188bcd8763ac1ff9
Binary files /dev/null and b/assignment-2/submission/18340986009/img/gd1.png differ
diff --git a/assignment-2/submission/18340986009/img/lr1.png b/assignment-2/submission/18340986009/img/lr1.png
new file mode 100644
index 0000000000000000000000000000000000000000..2c9be521b06b3fed74e4fdf3b91dd839ef6e8bd9
Binary files /dev/null and b/assignment-2/submission/18340986009/img/lr1.png differ
diff --git a/assignment-2/submission/18340986009/img/lr2.png b/assignment-2/submission/18340986009/img/lr2.png
new file mode 100644
index 0000000000000000000000000000000000000000..21554b6875726e8f09384ccb91543d80bfb30803
Binary files /dev/null and b/assignment-2/submission/18340986009/img/lr2.png differ
diff --git a/assignment-2/submission/18340986009/img/lr3.png b/assignment-2/submission/18340986009/img/lr3.png
new file mode 100644
index 0000000000000000000000000000000000000000..2c9be521b06b3fed74e4fdf3b91dd839ef6e8bd9
Binary files /dev/null and b/assignment-2/submission/18340986009/img/lr3.png differ
diff --git a/assignment-2/submission/18340986009/img/m1.png b/assignment-2/submission/18340986009/img/m1.png
new file mode 100644 index
0000000000000000000000000000000000000000..9a6ad8016f88624cef391c94a980e317c872b5ec Binary files /dev/null and b/assignment-2/submission/18340986009/img/m1.png differ diff --git a/assignment-2/submission/18340986009/img/md0.png b/assignment-2/submission/18340986009/img/md0.png new file mode 100644 index 0000000000000000000000000000000000000000..21513b1a3ced81649b595e1a879352b845ee1452 Binary files /dev/null and b/assignment-2/submission/18340986009/img/md0.png differ diff --git a/assignment-2/submission/18340986009/img/md1.png b/assignment-2/submission/18340986009/img/md1.png new file mode 100644 index 0000000000000000000000000000000000000000..588bf1e6170af88b9cff220cca85b6cfa62cc53d Binary files /dev/null and b/assignment-2/submission/18340986009/img/md1.png differ diff --git a/assignment-2/submission/18340986009/img/md2.png b/assignment-2/submission/18340986009/img/md2.png new file mode 100644 index 0000000000000000000000000000000000000000..cb4014256f4f1ef8de960b879ec109df76f4b3bf Binary files /dev/null and b/assignment-2/submission/18340986009/img/md2.png differ diff --git a/assignment-2/submission/18340986009/img/md3.png b/assignment-2/submission/18340986009/img/md3.png new file mode 100644 index 0000000000000000000000000000000000000000..e61cc4aa396ee9ae0424f487a40e7d65adf73d93 Binary files /dev/null and b/assignment-2/submission/18340986009/img/md3.png differ diff --git a/assignment-2/submission/18340986009/img/md4.png b/assignment-2/submission/18340986009/img/md4.png new file mode 100644 index 0000000000000000000000000000000000000000..cd354c72566ba0f763b9192130c9c8304728b5f2 Binary files /dev/null and b/assignment-2/submission/18340986009/img/md4.png differ diff --git a/assignment-2/submission/18340986009/img/md5.png b/assignment-2/submission/18340986009/img/md5.png new file mode 100644 index 0000000000000000000000000000000000000000..eaaa605aacceaa9dfb934186014eff9ef2eb343e Binary files /dev/null and b/assignment-2/submission/18340986009/img/md5.png differ diff --git a/assignment-2/submission/18340986009/img/md6.png b/assignment-2/submission/18340986009/img/md6.png new file mode 100644 index 0000000000000000000000000000000000000000..0b814967e2386d8c6feb7e88a4c14e9d7cd64bbd Binary files /dev/null and b/assignment-2/submission/18340986009/img/md6.png differ diff --git a/assignment-2/submission/18340986009/numpy_fnn.py b/assignment-2/submission/18340986009/numpy_fnn.py new file mode 100644 index 0000000000000000000000000000000000000000..7d8bac7a0627a5e3235c8850e186a09a2eba72d2 --- /dev/null +++ b/assignment-2/submission/18340986009/numpy_fnn.py @@ -0,0 +1,198 @@ +import numpy as np + + +class NumpyOp: + + def __init__(self): + self.memory = {} + self.epsilon = 1e-12 + + +class Matmul(NumpyOp): + + def forward(self, x, W): + """ + x: shape(N, d) + w: shape(d, d') + """ + self.memory['x'] = x + self.memory['W'] = W + h = np.matmul(x, W) + + return h + + def backward(self, grad_y): + """ + grad_y: shape(N, d') + """ + x = self.memory['x'] + W = self.memory['W'] + + grad_x = np.matmul(grad_y, W.T) + grad_W = np.matmul(x.T, grad_y) + return grad_x, grad_W + + +class Relu(NumpyOp): + + def forward(self, x): + self.memory['x'] = x + return np.where(x > 0, x, np.zeros_like(x)) + + def backward(self, grad_y): + """ + grad_y: same shape as x + """ + x = self.memory['x'] + grad_x = np.where(x <= 0, 0, grad_y) + + return grad_x + + +class Log(NumpyOp): + + def forward(self, x): + """ + x: shape(N, c) + """ + out = np.log(x + self.epsilon) + self.memory['x'] = x + + return out + + def backward(self, 
grad_y):
+        """
+        grad_y: same shape as x
+        """
+        x = self.memory['x']
+        # d log(x + eps) / dx = 1 / (x + eps)
+        grad_x = (x + self.epsilon)**(-1) * grad_y
+
+        return grad_x
+
+
+class Softmax(NumpyOp):
+    """
+    softmax over last dimension
+    """
+
+    def forward(self, x):
+        """
+        x: shape(N, c)
+        """
+        # subtract the row-wise max for numerical stability;
+        # softmax is invariant to shifting its input by a constant
+        exp = np.exp(x - x.max(axis=1, keepdims=True))
+        out = exp/exp.sum(axis=1, keepdims=True)
+        self.memory['x'] = x
+        self.memory['s'] = out
+
+        return out
+
+    def backward(self, grad_y):
+        """
+        grad_y: same shape as x
+        """
+        s = self.memory['s']
+        n = s.shape[1]
+
+        grad_x = np.zeros(s.shape)
+        # calculate Jacobian for ith row
+        for i in range(s.shape[0]):
+            jacob = np.zeros((n, n))
+
+            for j in range(n):
+                for k in range(n):
+                    if j == k:
+                        jacob[k, j] = s[i, k] * (1-s[i, j])
+                    else:
+                        jacob[k, j] = -1 * s[i, k] * s[i, j]
+            # apply Jacobian to the ith row of grad_y
+            grad_x[i, :] = np.matmul(jacob, grad_y[i, :])
+
+        return grad_x
+
+
+class NumpyLoss:
+
+    def __init__(self):
+        self.target = None
+
+    def get_loss(self, pred, target):
+        self.target = target
+        return (-1 * pred * target).sum(axis=1).mean()
+
+    def backward(self):
+        return -1 * self.target / self.target.shape[0]
+
+
+class NumpyModel:
+    def __init__(self):
+        self.epsilon = 1e-12
+
+        self.W1 = np.random.normal(size=(28 * 28, 256))
+        self.W2 = np.random.normal(size=(256, 64))
+        self.W3 = np.random.normal(size=(64, 10))
+
+        # The following operators are used in forward and backward
+        self.matmul_1 = Matmul()
+        self.relu_1 = Relu()
+        self.matmul_2 = Matmul()
+        self.relu_2 = Relu()
+        self.matmul_3 = Matmul()
+        self.softmax = Softmax()
+        self.log = Log()
+
+        # The following variables are updated in backward; softmax_grad, log_grad,
+        # etc. hold each operator's backpropagated gradient (the partial derivative
+        # of the loss w.r.t. that operator's input)
+        self.x1_grad, self.W1_grad = None, None
+        self.relu_1_grad = None
+        self.x2_grad, self.W2_grad = None, None
+        self.relu_2_grad = None
+        self.x3_grad, self.W3_grad = None, None
+        self.softmax_grad = None
+        self.log_grad = None
+
+        # If needed, the following variables are updated in optimize
+        self.delta1 = 0
+        self.delta2 = 0
+        self.delta3 = 0
+
+    def forward(self, x):
+        x = x.reshape(-1, 28 * 28)
+        a0 = x
+        z1 = self.matmul_1.forward(a0, self.W1)
+        a1 = self.relu_1.forward(z1)
+        z2 = self.matmul_2.forward(a1, self.W2)
+        a2 = self.relu_2.forward(z2)
+        z3 = self.matmul_3.forward(a2, self.W3)
+        a3 = self.softmax.forward(z3)
+        x = self.log.forward(a3)
+
+        return x
+
+    def backward(self, y):
+        self.log_grad = self.log.backward(y)
+        self.softmax_grad = self.softmax.backward(self.log_grad)
+        self.x3_grad, self.W3_grad = self.matmul_3.backward(self.softmax_grad)
+        self.relu_2_grad = self.relu_2.backward(self.x3_grad)
+        self.x2_grad, self.W2_grad = self.matmul_2.backward(self.relu_2_grad)
+        self.relu_1_grad = self.relu_1.backward(self.x2_grad)
+        self.x1_grad, self.W1_grad = self.matmul_1.backward(self.relu_1_grad)
+
+    def optimize(self, learning_rate, method="GD", momentum=0.9, gamma=0.9):
+        if method == "GD":
+            self.W1 -= learning_rate * self.W1_grad
+            self.W2 -= learning_rate * self.W2_grad
+            self.W3 -= learning_rate * self.W3_grad
+
+        # Momentum: keep a decaying memory of past steps, so the effective
+        # step size differs across directions
+        if method == "Momentum":
+            self.delta1 = momentum * self.delta1 - learning_rate * self.W1_grad
+            self.W1 += self.delta1
+
+            self.delta2 = momentum * self.delta2 - learning_rate * self.W2_grad
+            self.W2 += self.delta2
+
+            self.delta3 = momentum * self.delta3 - learning_rate * self.W3_grad
+            self.W3 += self.delta3
+
+
diff --git a/assignment-2/submission/18340986009/numpy_mnist.py b/assignment-2/submission/18340986009/numpy_mnist.py
new file mode 100644 index
0000000000000000000000000000000000000000..6d2ae6d4817e12e13a11f7b3a38929feb10f52ae --- /dev/null +++ b/assignment-2/submission/18340986009/numpy_mnist.py @@ -0,0 +1,144 @@ +import numpy as np +import torch +from numpy_fnn import NumpyModel, NumpyLoss +from matplotlib import pyplot as plt + + +def plot_curve(data): + plt.plot(range(len(data)), data, color='blue') + plt.legend(['loss_value'], loc='upper right') + plt.xlabel('step') + plt.ylabel('value') + plt.show() + + +def download_mnist(): + from torchvision import datasets, transforms + + transform = transforms.Compose([ + transforms.ToTensor(), + transforms.Normalize(mean=(0.1307,), std=(0.3081,)) + ]) + + train_dataset = datasets.MNIST(root="./data/", transform=transform, train=True, download=True) + test_dataset = datasets.MNIST(root="./data/", transform=transform, train=False, download=True) + + return train_dataset, test_dataset + + +def one_hot(y, numpy=True): + if numpy: + y_ = np.zeros((y.shape[0], 10)) + y_[np.arange(y.shape[0], dtype=np.int32), y] = 1 + return y_ + else: + y_ = torch.zeros((y.shape[0], 10)) + y_[torch.arange(y.shape[0], dtype=torch.long), y] = 1 + return y_ + + +def batch(dataset, numpy=True): + data = [] + label = [] + for each in dataset: + data.append(each[0]) + label.append(each[1]) + data = torch.stack(data) + label = torch.LongTensor(label) + if numpy: + return [(data.numpy(), label.numpy())] + else: + return [(data, label)] + + +def mini_batch(dataset, batch_size=128): + size = dataset.train_data.shape[0] + + # fetch data and label from dataset MNIST + data = [] + label = [] + for one_obs in dataset: + data.append(np.array(one_obs[0])) + label.append(np.array(one_obs[1])) + + data = np.array(data) + label = np.array(label) + + # shuffle + idx = np.arange(size) + np.random.shuffle(idx) + data = data[idx] + label = label[idx] + + # split to batch, discard the last batch if remaining size < batch_size + data_batches = [] + label_batches = [] + batched = 0 + while batched+batch_size <= size: + data_batches.append(data[batched:batched + batch_size]) + label_batches.append(label[batched:batched + batch_size]) + batched += batch_size + + data_batches = np.array(data_batches) + label_batches = np.array(label_batches) + + return zip(data_batches, label_batches) + + +def get_torch_initialization(numpy=True): + fc1 = torch.nn.Linear(28 * 28, 256) + fc2 = torch.nn.Linear(256, 64) + fc3 = torch.nn.Linear(64, 10) + + if numpy: + W1 = fc1.weight.T.detach().clone().numpy() + W2 = fc2.weight.T.detach().clone().numpy() + W3 = fc3.weight.T.detach().clone().numpy() + else: + W1 = fc1.weight.T.detach().clone().data + W2 = fc2.weight.T.detach().clone().data + W3 = fc3.weight.T.detach().clone().data + + return W1, W2, W3 + + +def numpy_run(epoch_number=3, learning_rate=0.1, batch_size=128): + train_dataset, test_dataset = download_mnist() + + model = NumpyModel() + numpy_loss = NumpyLoss() + model.W1, model.W2, model.W3 = get_torch_initialization() + + train_loss = [] + + for epoch in range(epoch_number): + for x, y in mini_batch(train_dataset, batch_size): + y = one_hot(y) + + y_pred = model.forward(x) + loss = numpy_loss.get_loss(y_pred, y) + + model.backward(numpy_loss.backward()) + model.optimize(learning_rate) + + train_loss.append(loss.item()) + + x, y = batch(test_dataset)[0] + accuracy = np.mean((model.forward(x).argmax(axis=1) == y)) + print('[{}] Accuracy: {:.4f}'.format(epoch, accuracy)) + + plot_curve(train_loss) + + +if __name__ == "__main__": + numpy_run() + +# for lr in [2, 4, 6]: +# l_rate = lr/10 +# 
numpy_run(epoch_number=3, learning_rate=l_rate, batch_size=128)
+
+# for bs in [20, 40, 60]:
+#     numpy_run(epoch_number=3, learning_rate=0.1, batch_size=bs)
+
+
diff --git a/assignment-3/submission/18340986009/README.md b/assignment-3/submission/18340986009/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..1ce62e27aa51d6f09115eb9c15d02c4a3e82fc14
--- /dev/null
+++ b/assignment-3/submission/18340986009/README.md
@@ -0,0 +1,150 @@
+# Clustering Algorithm (K-Means & GMM)
+
+The following holds for all of the analysis below:
+ - Data used: 2-D Gaussian distributions with given means and variance-covariance matrices.
+ - The default number of iterations is set to 10.
+ - Centroids are initialized by randomly selecting k points from the training dataset.
+
+## 1. Algorithm Realization
+
+The main purpose of this section is to validate the models used for clustering. The dataset below is used for model training.
+
+$
+    N_0 = 300 \hspace{1cm}
+    C_0 \sim \mathcal{N}(\mu = \begin{bmatrix}20\\\\20\end{bmatrix}, \sigma^{2} = \begin{bmatrix}20 & 5\\\\5 & 35\end{bmatrix})
+$
+
+$
+    N_1 = 300 \hspace{1cm}
+    C_1 \sim \mathcal{N}(\mu = \begin{bmatrix}35\\\\25\end{bmatrix}, \sigma^{2} = \begin{bmatrix}45 & 0\\\\0 & 20\end{bmatrix})
+$
+
+$
+    N_2 = 300 \hspace{1cm}
+    C_2 \sim \mathcal{N}(\mu = \begin{bmatrix}25\\\\35\end{bmatrix}, \sigma^{2} = \begin{bmatrix}20 & 0\\\\0 & 15\end{bmatrix})
+$
+
+
+
+### 1.1 K-Means
+
+Steps:
+
+    1. Initialize centroids by random selection
+    For n_iter times:
+    2. Assign every point to its nearest centroid
+    3. Recalculate the means based on the assigned points
+
+Result:
+
+
+
+Red points indicate the centroids found by our model. They are very close to the actual means in this example.
+
+### 1.2 GMM
+
+Steps:
+
+    1. Initialize the distribution parameters by splitting the data into k sub-datasets and computing each sub-dataset's statistics
+    For n_iter times:
+    2. E-step: calculate the responsibility matrix
+    3. M-step: update the parameters (means, variance-covariance matrices, pis)
+
+
+
+Red points indicate the centroids found by our model. They are further away from the actual means than those of K-Means.
+
+## 2. K-Optimization Experiment
+
+Elbow method:
+Choose K by evaluating the cost of adding each new centroid, measured here as the sum of every point's distance to its assigned centroid (the classical elbow method uses squared distances, but the curve behaves similarly).
+
+### 2.1 Elbow Method Demonstration
+
+The dataset below is used in this section:
+
+$
+    N_0 = 400 \hspace{1cm}
+    C_0 \sim \mathcal{N}(\mu = \begin{bmatrix}20\\\\20\end{bmatrix}, \sigma^{2} = \begin{bmatrix}20 & 5\\\\5 & 35\end{bmatrix})
+$
+
+$
+    N_1 = 400 \hspace{1cm}
+    C_1 \sim \mathcal{N}(\mu = \begin{bmatrix}40\\\\20\end{bmatrix}, \sigma^{2} = \begin{bmatrix}45 & 0\\\\0 & 20\end{bmatrix})
+$
+
+Candidates of K = [1, 2, 3, 4, 5, 6].
+
+For each candidate, run a K-Means model, calculate the sum of distances to the assigned centroid over all points, and plot this score to help determine the optimum K.
+
+The optimum K is found automatically as the candidate with the maximum vertical distance between the score curve and the straight line passing through the curve's first and last points (see the code sketch at the end of this subsection).
+
+**Elbow Line Plot**
+
+
+
+**Result From Model**
+
+
+
+**Misclassified Points**
+
+
+
+*A limitation of the elbow method is that the optimum K can never be the first or the last candidate, so be generous when choosing the range of K.
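+
+As referenced above, the knee-finding rule can be sketched in a few lines of numpy. This is a minimal illustration (the submitted `ClusteringAlgorithm.fit` implements the same idea); `ks` and `scores` stand for the K candidates and their cost scores:
+
+```python
+import numpy as np
+
+def find_elbow(ks, scores):
+    ks, scores = np.asarray(ks, float), np.asarray(scores, float)
+    # Chord through the first and last (K, score) points
+    chord = scores[0] + (scores[-1] - scores[0]) / (ks[-1] - ks[0]) * (ks - ks[0])
+    # The elbow is the K where the curve falls farthest below the chord
+    return int(ks[np.argmax(chord - scores)])
+
+print(find_elbow([1, 2, 3, 4, 5, 6], [980, 520, 430, 390, 360, 340]))  # -> 2
+```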
+
+### 2.2 Effect of Data Balancing on the Clustering Algorithm
+
+**The effect of data balancing on the elbow method**
+
+Using the same means and variance-covariance matrices as in 2.1, we investigate whether changing the ratio of the number of points between the two clusters has any effect on our algorithm.
+
+Ratios tested = [1/1, 1/2, 1/3, 1/4, 1/5, 1/6, 1/7]
+
+**Metrics for model evaluation** (a code sketch computing both follows this subsection):
+
+Adjusted Rand Index:
+
+    Let A be a clustering result with clusters {a1, a2, ..., ar}
+    Let B be a clustering result with clusters {b1, b2, ..., bs}
+    Then, given an (r, s) contingency table whose cell in the i-th row and j-th column is nij,
+    the Adjusted Rand Index (ARI) is calculated as:
+
+
+
+Silhouette Coefficient:
+
+    Let a be the mean distance between a sample and all other points in the same class
+    Let b be the mean distance between a sample and all other points in the next nearest cluster
+    Then for every point, s = (b - a) / max(a, b). (The Silhouette Coefficient of the whole
+    dataset is the mean of s over all points.)
+
+
+
+**Result**:
+
+| \ | 1:1 | 1:2 | 1:3 | 1:4 | 1:5 | 1:6 | 1:7 |
+| ------------ |:---:|:---:|:---:|:---:|:---:|:---:|:---:|
+| opt K | 2 | 2 | 3 | 3 | 3 | 3 | 3 |
+| Adjusted Rand Index | .796 | .845 | .386 | .346 | .305 | .287 | .277 |
+| Silhouette Index | .558 | .557 | .390 | .400 | .385 | .356 | .358 |
+
+The optimum K found by the elbow method is 3 for the last five cases, which is not the actual number of clusters. Accordingly, for ratios below 1/2, ARI (a measure similar to accuracy) drops significantly: the clustering result becomes much less similar to the actual clustering pattern. The silhouette index is roughly equal for 1:1 and 1:2, and drops to a roughly constant lower level for the remaining cases. This drop can likewise be traced to the choice of K: when two centroids are assigned to a group of points that actually share a single centroid, the separation between those two clusters becomes small.
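+
+As mentioned above, both metrics can be computed directly with `sklearn.metrics`, as the submitted source.py does. A minimal sketch, with hypothetical toy inputs standing in for the outputs of `data_2d()` and the fitted model:
+
+```python
+import numpy as np
+import sklearn.metrics
+
+rng = np.random.default_rng(0)
+data = rng.normal(size=(200, 2))                               # toy (N, 2) points
+true_label = np.repeat([0, 1], 100)                            # ground-truth classes
+assigned_label = (data[:, 0] > np.median(data[:, 0])).astype(int)  # toy clustering result
+
+ari = sklearn.metrics.adjusted_rand_score(true_label, assigned_label)
+sil = sklearn.metrics.silhouette_score(data, assigned_label)
+print(f"ARI = {ari:.3f}, silhouette = {sil:.3f}")
+```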
+
diff --git a/assignment-3/submission/18340986009/img/Fig1.png b/assignment-3/submission/18340986009/img/Fig1.png
new file mode 100644
index 0000000000000000000000000000000000000000..cb23eba6114b3225a6991c276e9cd3a102c6b0cb
Binary files /dev/null and b/assignment-3/submission/18340986009/img/Fig1.png differ
diff --git a/assignment-3/submission/18340986009/img/Fig2.png b/assignment-3/submission/18340986009/img/Fig2.png
new file mode 100644
index 0000000000000000000000000000000000000000..a2d3c3a23bf65a44192e1904dcdad43d7891f11b
Binary files /dev/null and b/assignment-3/submission/18340986009/img/Fig2.png differ
diff --git a/assignment-3/submission/18340986009/img/Fig3.png b/assignment-3/submission/18340986009/img/Fig3.png
new file mode 100644
index 0000000000000000000000000000000000000000..7118b5191c1f9efda7cdef629908fc1475bddc47
Binary files /dev/null and b/assignment-3/submission/18340986009/img/Fig3.png differ
diff --git a/assignment-3/submission/18340986009/img/Fig4.png b/assignment-3/submission/18340986009/img/Fig4.png
new file mode 100644
index 0000000000000000000000000000000000000000..bab80a2fb31b0a6eb2652e30a6097e9db8525705
Binary files /dev/null and b/assignment-3/submission/18340986009/img/Fig4.png differ
diff --git a/assignment-3/submission/18340986009/img/Fig5.png b/assignment-3/submission/18340986009/img/Fig5.png
new file mode 100644
index 0000000000000000000000000000000000000000..edf4664ba3ff28f92699c9248ffc8e968de5db6b
Binary files /dev/null and b/assignment-3/submission/18340986009/img/Fig5.png differ
diff --git a/assignment-3/submission/18340986009/img/Fig6.png b/assignment-3/submission/18340986009/img/Fig6.png
new file mode 100644
index 0000000000000000000000000000000000000000..54af5e9aec8e7f9fddd7fe3b2a9622276693b27e
Binary files /dev/null and b/assignment-3/submission/18340986009/img/Fig6.png differ
diff --git a/assignment-3/submission/18340986009/img/Fig7.png b/assignment-3/submission/18340986009/img/Fig7.png
new file mode 100644
index 0000000000000000000000000000000000000000..8c9eb0bc633a158b5dfb56a4330915dc22b60506
Binary files /dev/null and b/assignment-3/submission/18340986009/img/Fig7.png differ
diff --git a/assignment-3/submission/18340986009/img/Fig8.png b/assignment-3/submission/18340986009/img/Fig8.png
new file mode 100644
index 0000000000000000000000000000000000000000..5ac03f92568dd335b741da2be1824d4b9473be30
Binary files /dev/null and b/assignment-3/submission/18340986009/img/Fig8.png differ
diff --git a/assignment-3/submission/18340986009/img/Fig9-1.png b/assignment-3/submission/18340986009/img/Fig9-1.png
new file mode 100644
index 0000000000000000000000000000000000000000..8137c6fa1bbb53dac83335476e98656e09955cd4
Binary files /dev/null and b/assignment-3/submission/18340986009/img/Fig9-1.png differ
diff --git a/assignment-3/submission/18340986009/img/Fig9-2.png b/assignment-3/submission/18340986009/img/Fig9-2.png
new file mode 100644
index 0000000000000000000000000000000000000000..b04d8a0fb605b8a6a4d3f661f7473e3945250bff
Binary files /dev/null and b/assignment-3/submission/18340986009/img/Fig9-2.png differ
diff --git a/assignment-3/submission/18340986009/img/Fig9-3.png b/assignment-3/submission/18340986009/img/Fig9-3.png
new file mode 100644
index 0000000000000000000000000000000000000000..9a3e07f1d98ff8bcd2fceaeb5c1c34f7085b9367
Binary files /dev/null and b/assignment-3/submission/18340986009/img/Fig9-3.png differ
diff --git a/assignment-3/submission/18340986009/img/Fig9-4.png
b/assignment-3/submission/18340986009/img/Fig9-4.png
new file mode 100644
index 0000000000000000000000000000000000000000..9b1206588c7a951f8545538873760e5525809211
Binary files /dev/null and b/assignment-3/submission/18340986009/img/Fig9-4.png differ
diff --git a/assignment-3/submission/18340986009/img/Fig9-5.png b/assignment-3/submission/18340986009/img/Fig9-5.png
new file mode 100644
index 0000000000000000000000000000000000000000..92cb91dd1d2b600706324b90b007217f4e5d97dd
Binary files /dev/null and b/assignment-3/submission/18340986009/img/Fig9-5.png differ
diff --git a/assignment-3/submission/18340986009/img/Fig9-6.png b/assignment-3/submission/18340986009/img/Fig9-6.png
new file mode 100644
index 0000000000000000000000000000000000000000..1b59df4bf3d80523740e6152a7e6cea8994d3e57
Binary files /dev/null and b/assignment-3/submission/18340986009/img/Fig9-6.png differ
diff --git a/assignment-3/submission/18340986009/img/Fig9-7.png b/assignment-3/submission/18340986009/img/Fig9-7.png
new file mode 100644
index 0000000000000000000000000000000000000000..1cc40ecc3063fc3d20ff81cb3bde700925ea6a1c
Binary files /dev/null and b/assignment-3/submission/18340986009/img/Fig9-7.png differ
diff --git a/assignment-3/submission/18340986009/source.py b/assignment-3/submission/18340986009/source.py
new file mode 100644
index 0000000000000000000000000000000000000000..32b86e889b21dc25d48be85bfac90abd394bb5cb
--- /dev/null
+++ b/assignment-3/submission/18340986009/source.py
@@ -0,0 +1,358 @@
+import numpy as np
+import matplotlib.pyplot as plt
+
+import sklearn.metrics
+
+
+def shuffle(*datas):
+    data = np.concatenate(datas)
+    label = np.concatenate([
+        np.ones((d.shape[0],), dtype=int) * i
+        for (i, d) in enumerate(datas)
+    ])
+    N = data.shape[0]
+    idx = np.arange(N)
+    np.random.shuffle(idx)
+    data = data[idx]
+    label = label[idx]
+    return data, label
+
+
+def data_2d(ns, means, covs, train_pct=.8):
+    # Generate data sets and labels
+    d_list = []
+    l_list = []
+    for i in range(len(ns)):
+        d = np.random.multivariate_normal(means[i], covs[i], (ns[i],))
+        d_list.append(d)
+        l_list.append(np.repeat(i, ns[i]))
+    data = np.concatenate(d_list)
+    label = np.concatenate(l_list)
+
+    # Shuffle
+    # Note: cut is computed but the train/test split is not applied below;
+    # the full shuffled dataset is returned as both train and test
+    N = np.sum(ns)
+    cut = int(np.ceil(N * train_pct))
+    idx = np.arange(N)
+    np.random.shuffle(idx)
+
+    data = data[idx]
+    label = label[idx]
+
+    return (data, data), len(ns), label
+
+
+def find_dist(x, y):
+    # Euclidean distance between two points
+    return np.sqrt(np.matmul(x - y, x - y))
+
+
+def plot_cluster(test_data, result, assigned_means, true_means):
+    # Plot model result
+    plt.scatter(test_data[:, 0], test_data[:, 1],
+                c=result, alpha=.6)
+    # Plot model assigned means
+    plt.scatter(assigned_means[:, 0], assigned_means[:, 1],
+                c='red', marker='^', s=70)
+    # Plot true means
+    plt.scatter(true_means[:, 0], true_means[:, 1],
+                c='black', marker='^', s=70)
+    plt.show()
+
+
+class KMeans:
+
+    def __init__(self, n_clusters, n_iter=10):
+        self.k = n_clusters
+        self.iter = n_iter
+        self.centroids = None
+
+    def fit(self, train_data):
+        # INITIALIZE centroids
+        idx = np.random.choice(len(train_data), self.k, replace=False)
+        self.centroids = train_data[idx]
+
+        # ITERATE for a specified number of times to update the centroids
+        centroids_record = [self.centroids]
+        for _ in range(self.iter):
+            # ASSIGN the nearest centroid to every point
+            groups = []
+            for point in train_data:
+                dists = []
+                for centroid in self.centroids:
+                    dist = find_dist(point, centroid)
+                    dists.append(dist)
+                group = np.argmin(dists)
+                groups.append(group)
+
+            # UPDATE centroids based on assigned
points + new_centroids = [] + for group in np.unique(groups): + idx = [v == group for v in groups] + new_centroid = np.mean(train_data[idx], axis=0) + new_centroids.append(new_centroid) + self.centroids = new_centroids + centroids_record.append(np.array(new_centroids)) + + def predict(self, test_data): + # ASSIGN test data points to their nearest centroid + groups = [] + for point in test_data: + dists = [] + for centroid in self.centroids: + dist = find_dist(point, centroid) + dists.append(dist) + group = np.argmin(dists) + groups.append(group) + + return np.array(groups) + + +class GaussianMixture: + + def __init__(self, n_clusters, n_iter=5): + self.k = n_clusters + self.centroids = None + self.covs = None + self.pis = None + self.iter = n_iter + self.epsilon = 10 ** (-5) + + def __Normal__(self, x, miu, cov): + # If data is 1-D + if cov.shape == (1, 1): + c = 1 / (np.sqrt(2 * np.pi) * np.sqrt(cov)) + md = ((x - miu) / np.sqrt(cov)) ** 2 / (-2) + + # If data is > 1-D + else: + c = 1 / np.sqrt((2 * np.pi) ** cov.shape[0] * np.linalg.det(cov)) + md = np.dot(np.dot((x - miu).T, np.linalg.inv(cov)), (x - miu)) / (-2) + + return np.max([c * np.exp(md), self.epsilon]) + + def fit(self, train_data): + # SPLIT train_data into k sub-data sets + N = len(train_data) + sub_ds = np.array_split(train_data, self.k) + + # Randomly add another point if there exists only one number in an array + for i in range(len(sub_ds)): + if len(sub_ds[i]) == 1: + idx = np.random.choice(N - 1, 1) + dum = sub_ds[i].tolist() + dum.append(train_data[idx].tolist()[0]) + sub_ds[i] = np.array(dum) + + # INITIALIZE means, covariances, and pis for sub-data sets + self.pis = [1 / self.k for _ in range(self.k)] + self.centroids = [np.mean(v, axis=0) for v in sub_ds] + self.covs = [] + for i in range(self.k): + # If data is 1-D + if sub_ds[i].shape[0] != 1 and sub_ds[i].shape[1] == 1: + cov = np.array([[np.var(sub_ds[i])]]) + # If data is > 1-D + else: + cov = np.cov(sub_ds[i].T) + self.covs.append(cov) + + # ITERATE for a specified number of times to update parameters + for _ in range(self.iter): + # E STEP + # CALCULATE r matrix + r_mat = np.zeros((N, self.k)) + for n in range(N): + for k in range(self.k): + num = self.pis[k] * self.__Normal__(train_data[n], self.centroids[k], self.covs[k]) + dum = [self.pis[k] * self.__Normal__(train_data[n], self.centroids[k], self.covs[k]) + for k in range(self.k)] + denom = np.sum(dum) + r_mat[n][k] = num / denom + Nks = np.sum(r_mat, axis=0) + + # M STEP + # UPDATE pis + self.pis = [Nks[k] / N for k in range(self.k)] + + # UPDATE mean parameter + self.centroids = np.zeros((self.k, len(train_data[0]))) + for k in range(self.k): + for n in range(N): + self.centroids[k] += r_mat[n][k] * train_data[n] + self.centroids = [self.centroids[k] / Nks[k] for k in range(self.k)] + + # UPDATE covariance parameter + self.covs = [np.zeros((len(train_data[0]), len(train_data[0]))) for _ in range(self.k)] + for k in range(self.k): + for n in range(N): + v = (train_data[n] - self.centroids[k]) + self.covs[k] += r_mat[n][k] * np.outer(v, v) + self.covs = [self.covs[k] / Nks[k] for k in range(self.k)] + + def predict(self, test_data): + groups = [] + # ASSIGN test data points to centroids with their maximum conditional probability + for n in range(len(test_data)): + ps = [self.__Normal__(test_data[n], self.centroids[k], self.covs[k]) for k in range(self.k)] + group = np.argmax(ps) + groups.append(group) + + return np.array(groups) + + +class ClusteringAlgorithm: + + def __init__(self, Ks, method='KMeans'): 
+        self.Ks = Ks                  # candidate numbers of clusters
+        self.method = method          # 'KMeans' or 'GMM'
+        self.optimum_K = None
+        self.centroids = None
+
+    def __plot__(self, K, S):
+        plt.figure(figsize=(16, 8))
+        plt.plot(K, S, 'bx-')
+        plt.axvline(x=self.optimum_K, linestyle='--', color='r')
+        plt.xlabel('K')
+        plt.title('Elbow Method')
+        plt.show()
+
+    def fit(self, train_data):
+        Scores_of_Ks = []
+
+        # ITERATE over all K candidates
+        for j in range(len(self.Ks)):
+            # Train Model & Obtain result
+            if self.method == 'KMeans':
+                model = KMeans(self.Ks[j])
+            elif self.method == 'GMM':
+                model = GaussianMixture(self.Ks[j])
+            model.fit(train_data)
+            result = model.predict(train_data)
+
+            # CALCULATE cumulative sum for each K
+            cumsum = 0
+            for i in range(len(train_data)):
+                centroid = model.centroids[result[i]]
+                cumsum += find_dist(centroid, train_data[i])
+            Scores_of_Ks.append(cumsum)
+
+        # FIND Optimum K: the candidate where the score curve falls farthest
+        # below the chord through its first and last points
+        d = np.zeros((len(self.Ks), 1))
+        for i in range(len(self.Ks)):
+            y = (Scores_of_Ks[-1] - Scores_of_Ks[0]) / (self.Ks[-1] - self.Ks[0]) * \
+                (self.Ks[i] - self.Ks[0]) + Scores_of_Ks[0]
+            d[i] = y - Scores_of_Ks[i]
+
+        self.optimum_K = self.Ks[np.argmax(d)]
+
+        # VISUALIZE
+        self.__plot__(self.Ks, Scores_of_Ks)
+
+    def predict(self, test_data):
+        # Note: a fresh model with the selected K is fitted on the given data;
+        # the centroids found during fit() are not reused
+        if self.method == 'KMeans':
+            model = KMeans(self.optimum_K)
+        elif self.method == 'GMM':
+            model = GaussianMixture(self.optimum_K)
+
+        model.fit(test_data)
+        self.centroids = model.centroids
+        result = model.predict(test_data)
+
+        return result
+
+    def eval(self, test_data, true_label, true_means, assigned_label):
+        N = len(self.centroids)
+
+        # FIND the corresponding groups
+        corresponds = dict.fromkeys(np.unique(true_label))
+        for g in range(N):
+            flt = true_label[assigned_label == g]
+            score = [len(flt[flt == m]) for m in range(N)]
+            corresponds[g] = np.argmax(score)
+        pred_label = np.array([corresponds[v] for v in assigned_label])
+
+        # Plot model result
+        wrong_pts = test_data[true_label != pred_label]
+        plt.scatter(test_data[:, 0], test_data[:, 1],
+                    c='grey', alpha=.6)
+        # Plot wrong points
+        plt.scatter(wrong_pts[:, 0], wrong_pts[:, 1],
+                    c='red', alpha=.8)
+        # Plot model assigned means
+        plt.scatter([v[0] for v in self.centroids], [v[1] for v in self.centroids],
+                    c='purple', marker='^', s=70)
+        # Plot true means
+        plt.scatter(true_means[:, 0], true_means[:, 1],
+                    c='black', marker='^', s=70)
+        plt.show()
+
+        return assigned_label
+
+
+if __name__ == '__main__':
+    # Basic experiment ====================================================
+    ns = np.array([300, 300, 300])
+
+    means = np.array([
+        [20, 20],
+        [35, 25],
+        [25, 35]])
+
+    covs = np.array([
+        [[20, 5], [5, 35]],
+        [[45, 0], [0, 20]],
+        [[20, 0], [0, 15]]])
+
+    (train_data, test_data), n_clusters, test_label = data_2d(ns, means, covs)
+
+    # K-Means
+    model = KMeans(n_clusters, n_iter=10)
+    model.fit(train_data)
+    res = model.predict(test_data)
+    plot_cluster(test_data, res,
+                 np.array([v.tolist() for v in model.centroids]),
+                 means)
+
+    # GMM
+    model = GaussianMixture(n_clusters, n_iter=10)
+    model.fit(train_data)
+    res = model.predict(test_data)
+    plot_cluster(test_data, res,
+                 np.array([v.tolist() for v in model.centroids]),
+                 means)
+
+    # Experiment: automatic selection of the number of clusters ===========
+    Ks = [1, 2, 3, 4, 5, 6, 7, 8]
+
+    table = np.zeros((7, 2))
+    # k = 1 .. 7 gives the ratios 1/1 .. 1/7 reported in the README;
+    # k = 0 would create a degenerate [800, 0] split
+    for k in range(1, 8):
+        print(k)
+        cut = int(800 / (1 + k))
+
+        ns = np.array([cut, 800 - cut])
+
+        means = np.array([
+            [20, 20],
+            [40, 20]])
+
+        covs = np.array([
+            [[20, 5], [5, 35]],
+            [[45, 0], [0, 20]]])
+
+        (train_data, test_data), n_clusters, test_label = data_2d(ns, means, covs)
+
+        
model = ClusteringAlgorithm(Ks, method='KMeans')
+        model.fit(train_data)
+        print('optimum K')
+        print(model.optimum_K)
+        res = model.predict(test_data)
+        # plot_cluster(test_data, res,
+        #              np.array([v.tolist() for v in model.centroids]),
+        #              means)
+
+        assigned_label = model.eval(test_data, test_label, means, res)
+        rand_s = sklearn.metrics.adjusted_rand_score(test_label, assigned_label)
+        silhouette_s = sklearn.metrics.silhouette_score(test_data, assigned_label)
+        table[k - 1, 0] = rand_s
+        table[k - 1, 1] = silhouette_s
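+
+    # Hypothetical closing step (not in the original file): print the metric
+    # table; its rows correspond to the ratios 1/1 .. 1/7 from the README
+    print(table)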