diff --git a/assignment-1/submission/18307130116/README.md b/assignment-1/submission/18307130116/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..142f441ff2c2994c3d62c3cbb861fd7c637837f8
--- /dev/null
+++ b/assignment-1/submission/18307130116/README.md
@@ -0,0 +1,235 @@
+# KNN Classifier
+
+[toc]
+
+## Dependencies
+
+`numpy`
+
+`matplotlib`
+
+## Function Reference
+
+### KNN
+
+**`fit(self, train_data, train_label)`**
+
+`train_data`: the training points
+
+`train_label`: the training labels
+
+**What it does:** `fit` holds out 10% of the training set to search for the K that maximizes held-out accuracy. If the training set has fewer than 10 points, it defaults to `K = 1`; otherwise it selects the K in 1 to 10 with the highest accuracy and uses it at prediction time.
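The K-selection procedure described above can be sketched as a stand-alone routine (a simplified sketch, not the submitted code; `knn_predict` and `choose_k` are hypothetical stand-ins for the neighbor search and selection loop in `source.py`):

```python
import numpy as np

def knn_predict(x, data, labels, k):
    # hypothetical stand-in for the neighbor search: majority label of the k nearest points
    idx = np.argsort(np.linalg.norm(data - x, axis=1))[:k]
    vals, counts = np.unique(labels[idx], return_counts=True)
    return vals[np.argmax(counts)]

def choose_k(train_data, train_label, max_k=10):
    # hold out the first 10% of the training set as a validation split
    dev_num = int(len(train_data) * 0.1)
    if dev_num == 0:
        return 1  # too few points: default to k = 1
    dev_x, dev_y = train_data[:dev_num], train_label[:dev_num]
    tr_x, tr_y = train_data[dev_num:], train_label[dev_num:]
    best_k, best_acc = 1, -1.0
    for k in range(1, max_k + 1):
        acc = np.mean([knn_predict(x, tr_x, tr_y, k) == y for x, y in zip(dev_x, dev_y)])
        if acc > best_acc:  # strict '>' keeps the smallest k on ties
            best_k, best_acc = k, acc
    return best_k
```

Because the comparison is strict, ties in held-out accuracy resolve to the smaller k, matching the report's preference for smaller K values.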
+
+---
+
+**`predict(self, test_data)`**
+
+**What it does:** predicts the label of each test point using the K learned by `fit`.
+
+### Experiment and helper functions
+
+**`distance(point1, point2, method="Euclid")`**
+
+`point1` and `point2`: the two points whose distance is computed
+
+`method`: the distance metric; the default `"Euclid"` is Euclidean distance, and `"Manhattan"` computes Manhattan distance
+
+**What it does:** first normalizes both inputs into [m, 1] column vectors, then computes the distance between the two points under the chosen metric.
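Functionally, the metric computation reduces to the following (a compact sketch of the same two metrics, without the reshaping bookkeeping of the submitted version):

```python
import numpy as np

def distance(point1, point2, method="Euclid"):
    # flatten so (m,), (m, 1) and (1, m) inputs all behave as m-dimensional points
    diff = np.asarray(point1).ravel() - np.asarray(point2).ravel()
    if method == "Euclid":
        return float(np.sqrt(np.sum(diff ** 2)))  # L2 norm
    if method == "Manhattan":
        return float(np.sum(np.abs(diff)))        # L1 norm
    raise ValueError("unknown method: " + method)
```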
+
+---
+
+**`dis(dis_label)`**
+
+**What it does:** the `key` function for `sort`; extracts the distance from a (distance, label) pair.
+
+---
+
+**`nearest_k_label_max(point, point_arr, label_arr, k)`**
+
+`point`: the query point whose k nearest neighbors are sought
+
+`point_arr`: the existing point set
+
+`label_arr`: the labels of the existing point set
+
+`k`: the number of nearest points to consider
+
+**What it does:** computes the distance from the query point to every point in the set, finds the K nearest points, and returns the label that appears most often among them.
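An equivalent vectorized sketch of the same search (Euclidean only; `nearest_k_label` is illustrative, whereas the submitted version sorts explicit (distance, label) pairs):

```python
import numpy as np

def nearest_k_label(point, point_arr, label_arr, k):
    # Euclidean distance from the query point to every known point
    dists = np.linalg.norm(np.asarray(point_arr) - np.asarray(point), axis=1)
    nearest = np.argsort(dists)[:k]  # indices of the k closest points
    labels, counts = np.unique(np.asarray(label_arr)[nearest], return_counts=True)
    return labels[np.argmax(counts)]  # most frequent label among them
```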
+
+---
+
+**`data_generate_and_save(class_num, mean_list, cov_list, num_list, save_path = "")`**
+
+`class_num`: the total number of classes
+
+`mean_list`: the mean of each class's Gaussian distribution
+
+`cov_list`: the covariance matrix of each class
+
+`num_list`: `num_list[i]` is the number of points in class i
+
+`save_path`: where to store the generated point set; defaults to `data.npy` in the current directory, and the path must end with a slash
+
+**What it does:** calls `numpy.random.multivariate_normal` to generate the requested number of points per class, shuffles them, splits 80% into training data and 20% into test data, and saves them as the tuple `((train_data, train_label), (test_data, test_label))`.
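The generate / shuffle / 80-20 split logic amounts to the following (a sketch; `make_dataset` and its in-memory return value are illustrative, whereas the submitted function saves the tuple to `data.npy`):

```python
import numpy as np

def make_dataset(mean_list, cov_list, num_list, train_ratio=0.8, seed=None):
    # draw each class from its own Gaussian, then shuffle and split 80/20
    rng = np.random.default_rng(seed)
    data = np.concatenate([
        rng.multivariate_normal(m, c, n) for m, c, n in zip(mean_list, cov_list, num_list)
    ])
    label = np.concatenate([np.full(n, i, dtype=int) for i, n in enumerate(num_list)])
    idx = rng.permutation(len(label))
    data, label = data[idx], label[idx]
    split = int(len(label) * train_ratio)
    return (data[:split], label[:split]), (data[split:], label[split:])
```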
+
+---
+
+**`data_load(path = "")`**
+
+`path`: the path the point set was stored at; defaults to `data.npy` in the current directory, and the path must end with a slash
+
+**What it does:** loads a point set saved as the tuple `((train_data, train_label), (test_data, test_label))`.
+
+---
+
+**`visualize(data, label, class_num = 1, test_data=[])`**
+
+*Visualization currently supports only two dimensions; for higher-dimensional point sets, only the first two dimensions are plotted.*
+
+`data`: the coordinates of the training points
+
+`label`: the labels of the training points
+
+`class_num`: the total number of classes, default 1
+
+`test_data`: the coordinates of the test points
+
+**What it does:** draws a scatter plot of the point set, automatically assigning each class a distinct color; test points are drawn with a "+" marker.
+
+## Experiments
+
+First, we generated three classes of points, 100 points per class.
+
+Their means and covariance matrices are listed in the table below.
+
+|         | Mean    | Covariance matrix |
+| ------- | ------- | ----------------- |
+| class 1 | (1, 2)  | [[10, 0], [0, 2]] |
+| class 2 | (4, 5)  | [[7, 3], [15, 1]] |
+| class 3 | (-2, 6) | [[0, 1], [1, 2]]  |
+
+We measured the accuracy for K = 1 through 10, shown in the figure below.
+
+
+
+Holding accuracy fixed, we chose the smaller value k = 5, which reaches a prediction accuracy of 83.3%; the corresponding data are visualized below.
+
+
+
+### Experiment 1: reducing overlap between classes
+
+The figure above shows that the three classes are largely separated but still partially overlap. We conjectured that the overlapping region degrades KNN performance, and verified this below by changing the means and covariances.
+
+First, we changed the covariance matrices to:
+
+|         | Mean    | Covariance matrix |
+| ------- | ------- | ----------------- |
+| class 1 | (1, 2)  | [[1, 0], [0, 1]]  |
+| class 2 | (4, 5)  | [[1, 0], [0, 1]]  |
+| class 3 | (-2, 6) | [[1, 0], [0, 1]]  |
+
+The accuracy-vs-K curve and the point distribution are shown below.
+
+
+
+Choosing K = 3, the KNN accuracy rises to 96.7%, as expected.
+
+Similarly, we changed the means so that the Gaussian distributions are separated as much as possible.
+
+|         | Mean      | Covariance matrix |
+| ------- | --------- | ----------------- |
+| class 1 | (-10, 2)  | [[10, 0], [0, 2]] |
+| class 2 | (4, 5)    | [[7, 3], [15, 1]] |
+| class 3 | (-2, -16) | [[0, 1], [1, 2]]  |
+
+The corresponding curve is shown below; the accuracy reaches its maximum of 1.0 already at K = 1.
+
+
+
+#### Conclusion
+
+This experiment shows clearly how the point distribution affects KNN accuracy: when the classes overlap little, accuracy improves markedly.
+
+### Experiment 2: choice of distance metric
+
+The experiments above used Euclidean distance; here we switch to Manhattan distance and examine its effect.
+
+When the classes are well separated, Manhattan and Euclidean distance differ little in accuracy (not shown here). When the classes overlap heavily, we generated several datasets from the following distribution:
+
+|         | Mean   | Covariance matrix |
+| ------- | ------ | ----------------- |
+| class 1 | (1, 4) | [[10, 0], [0, 2]] |
+| class 2 | (2, 5) | [[7, 3], [15, 1]] |
+| class 3 | (2, 6) | [[0, 1], [1, 2]]  |
+
+The chosen k and the corresponding accuracy (acc) are listed below.
+
+| Euclidean         | Manhattan          |
+| ----------------- | ------------------ |
+| k = 3, acc = 0.7  | k = 3, acc = 0.683 |
+| k = 1, acc = 0.53 | k = 1, acc = 0.483 |
+| k = 7, acc = 0.63 | k = 8, acc = 0.567 |
+
+Overall, when the classes overlap heavily, Euclidean distance outperforms Manhattan distance. We conjecture that for points drawn from Gaussians, Euclidean distance penalizes a large gap in a single dimension more heavily than Manhattan distance does; this better matches how the Gaussian density decays, so it fits the local probability density better and yields higher accuracy.
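A quick numeric illustration of this conjecture: of two displacement vectors with the same Manhattan length, Euclidean distance ranks the one concentrated in a single dimension as farther.

```python
import numpy as np

spread = np.array([2.0, 2.0])        # difference split across both dimensions
concentrated = np.array([4.0, 0.0])  # same total difference, one dimension only

# Manhattan treats them identically; Euclidean penalizes the concentrated gap more
manhattan = (np.abs(spread).sum(), np.abs(concentrated).sum())   # (4.0, 4.0)
euclid = (np.linalg.norm(spread), np.linalg.norm(concentrated))  # (~2.83, 4.0)
```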
+
+#### Conclusion
+
+When the classes are well separated, Manhattan and Euclidean distance perform similarly; when they overlap heavily, Euclidean distance outperforms Manhattan distance.
+
+### Experiment 3: number of points
+
+For the following distribution,
+
+|         | Mean    | Covariance matrix |
+| ------- | ------- | ----------------- |
+| class 1 | (1, 4)  | [[10, 0], [0, 2]] |
+| class 2 | (2, -3) | [[7, 3], [15, 1]] |
+| class 3 | (2, 5)  | [[0, 1], [1, 2]]  |
+
+we generated four settings of class sizes, [100, 100, 100], [100, 10, 100], [100, 50, 200], and [200, 200, 200], running each setting several times to reduce random error.
+
+The results are shown in the table below.
+
+|      | [100, 100, 100] | [100, 10, 100] | [100, 50, 200] | [200, 200, 200] |
+| ---- | --------------- | -------------- | -------------- | --------------- |
+| 1    | 0.867           | 0.809          | 0.886          | 0.875           |
+| 2    | 0.800           | 0.809          | 0.843          | 0.825           |
+| 3    | 0.867           | 0.809          | 0.857          | 0.9             |
+| 4    | 0.917           | 0.761          | 0.886          | 0.792           |
+| Mean | 0.862           | 0.797          | 0.868          | 0.848           |
+
+#### Conclusion
+
+As the total number of points grows, the overlapping area grows and accuracy drops accordingly. When one class has far fewer points than the others, accuracy suffers noticeably; with an extreme imbalance, the task partially degenerates into an (N-1)-class problem, which can actually raise accuracy.
+
+### Experiment 4: feature scales
+
+When the dimensions have mismatched scales, e.g. an (age, wealth) pair, Euclidean distance in the raw space is dominated by the large-scale dimension. To examine this effect, we generated the following data:
+
+|         | Mean     | Covariance matrix     |
+| ------- | -------- | --------------------- |
+| class 1 | (1, 400) | [[10, 0], [0, 20000]] |
+| class 2 | (2, 300) | [[7, 0], [0, 10000]]  |
+| class 3 | (2, 300) | [[1, 0], [0, 10000]]  |
+
+The accuracy-vs-k curve and the point distribution for one run are shown below; the accuracy averaged over several runs is 0.399.
+
+
+
+To compare, we rescaled the second dimension by a factor of 100:
+
+|         | Mean   | Covariance matrix |
+| ------- | ------ | ----------------- |
+| class 1 | (1, 4) | [[10, 0], [0, 2]] |
+| class 2 | (2, 3) | [[7, 0], [0, 1]]  |
+| class 3 | (2, 3) | [[1, 0], [0, 1]]  |
+
+The corresponding k curve and point visualization are shown below.
+
+
+
+The accuracy averaged over several runs is 0.539.
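Instead of rescaling by hand, a standard remedy is z-score standardization of each feature before computing distances; a minimal sketch (`standardize` is illustrative, not part of the submitted code):

```python
import numpy as np

def standardize(train_data, test_data):
    # z-score each feature using statistics from the training split only
    mean = train_data.mean(axis=0)
    std = train_data.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    return (train_data - mean) / std, (test_data - mean) / std
```

Fitting the statistics on the training split alone avoids leaking test information into the distance computation.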
+
+#### Conclusion
+
+Scale normalization strongly affects accuracy: rescaling the dimensions to comparable scales improved it substantially. However, given the earlier results on point distributions, we conjecture that normalization matters little when the classes themselves are already well separated.
\ No newline at end of file
diff --git a/assignment-1/submission/18307130116/img/Figure_1.png b/assignment-1/submission/18307130116/img/Figure_1.png
new file mode 100644
index 0000000000000000000000000000000000000000..b840aa5b2862be15a71968435433efc147086318
Binary files /dev/null and b/assignment-1/submission/18307130116/img/Figure_1.png differ
diff --git a/assignment-1/submission/18307130116/img/Figure_2_1.png b/assignment-1/submission/18307130116/img/Figure_2_1.png
new file mode 100644
index 0000000000000000000000000000000000000000..5e2b73e556a36aa5db294e9c2c42fc039728279d
Binary files /dev/null and b/assignment-1/submission/18307130116/img/Figure_2_1.png differ
diff --git a/assignment-1/submission/18307130116/img/Figure_2_2.png b/assignment-1/submission/18307130116/img/Figure_2_2.png
new file mode 100644
index 0000000000000000000000000000000000000000..3c6ec2fa9693474116ae15a76359f69b442d99b1
Binary files /dev/null and b/assignment-1/submission/18307130116/img/Figure_2_2.png differ
diff --git a/assignment-1/submission/18307130116/img/Figure_2_3.png b/assignment-1/submission/18307130116/img/Figure_2_3.png
new file mode 100644
index 0000000000000000000000000000000000000000..a893f35d277af8c818a69f49cee5e2bbe06c2367
Binary files /dev/null and b/assignment-1/submission/18307130116/img/Figure_2_3.png differ
diff --git a/assignment-1/submission/18307130116/img/Figure_2_4.png b/assignment-1/submission/18307130116/img/Figure_2_4.png
new file mode 100644
index 0000000000000000000000000000000000000000..34e3cb5e2c15ae4104a1f12fbd9ef62af24cb03e
Binary files /dev/null and b/assignment-1/submission/18307130116/img/Figure_2_4.png differ
diff --git a/assignment-1/submission/18307130116/img/Figure_5_1.png b/assignment-1/submission/18307130116/img/Figure_5_1.png
new file mode 100644
index 0000000000000000000000000000000000000000..09921dca1bbeebae81d5b0f71eafe9ab0f0ce75a
Binary files /dev/null and b/assignment-1/submission/18307130116/img/Figure_5_1.png differ
diff --git a/assignment-1/submission/18307130116/img/Figure_5_2.png b/assignment-1/submission/18307130116/img/Figure_5_2.png
new file mode 100644
index 0000000000000000000000000000000000000000..18ed90b7cd1ec5f2c91a863393b21b655b040eb6
Binary files /dev/null and b/assignment-1/submission/18307130116/img/Figure_5_2.png differ
diff --git a/assignment-1/submission/18307130116/img/Figure_6_1.png b/assignment-1/submission/18307130116/img/Figure_6_1.png
new file mode 100644
index 0000000000000000000000000000000000000000..6fdc07c00f7cfdcead4f8cf98880ce1cd76f9526
Binary files /dev/null and b/assignment-1/submission/18307130116/img/Figure_6_1.png differ
diff --git a/assignment-1/submission/18307130116/img/Figure_6_2.png b/assignment-1/submission/18307130116/img/Figure_6_2.png
new file mode 100644
index 0000000000000000000000000000000000000000..72685efbfd9bc42f675811e5f92bf88c6bbc3851
Binary files /dev/null and b/assignment-1/submission/18307130116/img/Figure_6_2.png differ
diff --git a/assignment-1/submission/18307130116/img/k1.png b/assignment-1/submission/18307130116/img/k1.png
new file mode 100644
index 0000000000000000000000000000000000000000..8a81a8e624428a86d14851ca1a9848cf11c61be0
Binary files /dev/null and b/assignment-1/submission/18307130116/img/k1.png differ
diff --git a/assignment-1/submission/18307130116/source.py b/assignment-1/submission/18307130116/source.py
new file mode 100644
index 0000000000000000000000000000000000000000..4daa13c95a45ed7371bb33f20bdd2f4d821894ae
--- /dev/null
+++ b/assignment-1/submission/18307130116/source.py
@@ -0,0 +1,154 @@
+import numpy as np
+import matplotlib.pyplot as plt
+import matplotlib.cm as cm
+
+def distance(point1, point2, method="Euclid"):
+    """
+    Distance between two points; inputs are normalized to (m, 1) column vectors.
+    """
+    if point1.ndim == 1:
+        point1 = np.expand_dims(point1, axis=1)
+    if point2.ndim == 1:
+        point2 = np.expand_dims(point2, axis=1)
+    if point1.shape[0] == 1:
+        point1 = point1.reshape(-1, 1)
+    if point2.shape[0] == 1:
+        point2 = point2.reshape(-1, 1)
+    if point1.shape != point2.shape or point1.shape[1] != 1:
+        print("error: expected column vectors of the same dimension")
+        return -1
+    diff = point1[:, 0] - point2[:, 0]
+    if method == "Euclid":
+        return float(np.sqrt(np.sum(diff ** 2)))
+    if method == "Manhattan":
+        return float(np.sum(np.abs(diff)))
+    print("error: unknown method", method)
+    return -1
+
+def dis(dis_label):
+    # sort key: the distance component of a (distance, label) pair
+    return dis_label[0]
+
+def nearest_k_label_max(point, point_arr, label_arr, k):
+    # pair every known point's distance to the query with its label
+    distance_arr = [(distance(point, point_arr[i]), label_arr[i]) for i in range(len(point_arr))]
+    distance_arr.sort(key=dis)
+    # majority label among the k nearest points
+    result = [label for _, label in distance_arr[:k]]
+    return max(result, key=result.count)
+
+class KNN:
+
+ def __init__(self):
+ pass
+
+    def fit(self, train_data, train_label):
+        num = train_data.shape[0]
+        self.train_data = train_data
+        self.train_label = train_label
+        # hold out 10% of the training set for selecting k
+        dev_num = int(num * 0.1)
+        dev_data = train_data[:dev_num]
+        dev_label = train_label[:dev_num]
+        train_data = train_data[dev_num:]
+        train_label = train_label[dev_num:]
+        correct_count_max = 0
+        k_max = 1
+        accu = []
+        if dev_num == 0:
+            print("points number too few, so we choose k = 1")
+            self.k = 1
+            return
+
+        for k_try in range(1, min(num - dev_num, 10) + 1):  # find the best k in 1..10
+            correct_count = 0
+            for j in range(len(dev_data)):
+                predict_label = nearest_k_label_max(dev_data[j], train_data, train_label, k_try)
+                if predict_label == dev_label[j]:
+                    correct_count += 1
+            if correct_count > correct_count_max:
+                correct_count_max = correct_count
+                k_max = k_try
+            accu.append(correct_count / dev_num)
+        # this part is only for the experiments; commented out for the auto test
+        # plt.plot(range(1, min(num - dev_num, 10) + 1), accu)
+        # plt.show()
+        self.k = k_max
+        print("choose k =", k_max)
+
+    def predict(self, test_data):
+        result = []
+        for i in range(len(test_data)):
+            result.append(nearest_k_label_max(test_data[i, :], self.train_data, self.train_label, self.k))
+        return np.array(result)
+
+#here we need some utils
+def data_generate_and_save(class_num, mean_list, cov_list, num_list, save_path = ""):
+ """
+ class_num: the number of class
+ mean_list: mean_list[i] stand for the mean of class[i]
+ cov_list: similar to mean_list, stand for the covariance
+ num_list: similar to mean_list, stand for the number of points in class[i]
+ save_path: the data storage path, end with slash.
+ """
+ data = np.random.multivariate_normal(mean_list[0], cov_list[0], (num_list[0],))
+ label = np.zeros((num_list[0],),dtype=int)
+ total = num_list[0]
+
+ for iter in range(1, class_num):
+ temp = np.random.multivariate_normal(mean_list[iter], cov_list[iter], (num_list[iter],))
+ label_temp = np.ones((num_list[iter],),dtype=int)*iter
+ data = np.concatenate([data, temp])
+ label = np.concatenate([label, label_temp])
+ total += num_list[iter]
+
+ idx = np.arange(total)
+ np.random.shuffle(idx)
+ data = data[idx]
+ label = label[idx]
+ train_num = int(total * 0.8)
+ train_data = data[:train_num, ]
+ test_data = data[train_num:, ]
+ train_label = label[:train_num, ]
+ test_label = label[train_num:, ]
+ # print(test_label.size)
+ np.save(save_path+"data.npy", ((train_data, train_label), (test_data, test_label)))
+
+def data_load(path = ""):
+ (train_data, train_label), (test_data, test_label) = np.load(path+"data.npy",allow_pickle=True)
+ return (train_data, train_label), (test_data, test_label)
+
+def visualize(data, label, class_num = 1, test_data=[]):
+ data_x = {}
+ data_y = {}
+ for iter in range(class_num):
+ data_x[iter] = []
+ data_y[iter] = []
+ for iter in range(len(label)):
+ data_x[label[iter]].append(data[iter, 0])
+ data_y[label[iter]].append(data[iter, 1])
+ colors = cm.rainbow(np.linspace(0, 1, class_num))
+
+ for class_idx, c in zip(range(class_num), colors):
+ plt.scatter(data_x[class_idx], data_y[class_idx], color=c)
+ if(len(test_data) != 0):
+ plt.scatter(test_data[:, 0], test_data[:, 1], marker='+')
+ plt.show()
+
+#experiment begin
+if __name__ == "__main__":
+ mean_list = [(1, 4), (2, 3), (2, 3)]
+ cov_list = [np.array([[10, 0], [0, 2]]), np.array([[7, 0], [0, 1]]), np.array([[1, 0], [0, 1]])]
+ num_list = [200, 200, 200]
+ save_path = ""
+ data_generate_and_save(3, mean_list, cov_list, num_list, save_path)
+ # (train_data, train_label), (test_data, test_label) = data_load()
+ # visualize(train_data, train_label, 3)
\ No newline at end of file