# food_recognizer

**Repository Path**: yuanyuan2000/food_recognizer

## Basic Information

- **Project Name**: food_recognizer
- **Description**: A three-layer convolutional network, based on the CIFAR-10 CNN, for recognizing images of vegetables and fruit
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2019-10-19
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# food_recognizer

# Dish Recognition (Food Recognizer) Demo

### 1. Background knowledge

1. Basic ideas behind web crawlers
2. Image processing with Python imaging libraries
3. Writing Python binary (pickle) files
4. Fundamentals of neural networks in deep learning
5. Building a CNN (convolutional neural network)

### 2. Building the dataset

#### 2-1. Crawling the images

Baidu Images is used as the image source. Because Baidu Images loads its results through dynamic JS actions, the pages are fetched with Python's `requests` library and the appropriate query parameters. The crawler code is as follows:

```python
import os
import re

import requests

# Pass the search term as the keyword parameter and fetch the requested number of result pages
def getIntPages(keyword, pages):
    params = []
    for i in range(30, 30 * pages + 30, 30):
        params.append({
            'tn': 'resultjson_com',
            'ipn': 'rj',
            'ct': '201326592',
            'is': '',
            'fp': 'result',
            'queryWord': keyword,
            'cl': '2',
            'lm': '-1',
            'ie': 'utf-8',
            'oe': 'utf-8',
            'st': '-1',
            'ic': '0',
            'word': keyword,
            'face': '0',
            'istype': '2',
            'nc': '1',
            'pn': i,
            'rn': '30'
        })
    url = 'https://image.baidu.com/search/acjson'  # request URL
    urls = []
    for i in params:
        content = requests.get(url, params=i).text
        img_urls = re.findall(r'"thumbURL":"(.*?)"', content)  # extract image links with a regex
        urls.append(img_urls)
        # urls.append(requests.get(url, params=i).json().get('data'))
        # print("%d times : " % x, img_urls)
    return urls

# Save the fetched images into a local folder
def fetch_img(path, dataList):
    if not os.path.exists(path):
        os.makedirs(path)  # makedirs, so nested paths like data/train/meat/ also work
    x = 0
    for list in dataList:
        for i in list:
            print("=====downloading %d/1200=====" % (x + 1))
            ir = requests.get(i)
            with open(path + '%d.jpg' % x, 'wb') as f:
                f.write(ir.content)
            x += 1

if __name__ == '__main__':
    url = 'https://image.baidu.com/search/acjson'
    dataList = getIntPages('猪肉', 40)   # fetch 40 pages of pork images
    fetch_img("data/train/meat/", dataList)
```
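Some of the downloaded files turn out not to be usable images (failed requests, HTML error pages saved as `.jpg`). As a quick sanity check before the manual filtering step, one can scan the download folder for files that lack the JPEG magic bytes. This helper is not part of the project; it is a minimal sketch using only the standard library:

```python
import os

JPEG_SOI = b'\xff\xd8'  # every JPEG file starts with the Start-Of-Image marker

def find_non_jpeg(path):
    """Return the filenames under `path` that do not look like JPEG files."""
    bad = []
    for name in sorted(os.listdir(path)):
        with open(os.path.join(path, name), 'rb') as f:
            if f.read(2) != JPEG_SOI:
                bad.append(name)
    return bad
```

Files reported by `find_non_jpeg` can then be deleted or re-downloaded before building the dataset.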
#### 2-2. Image preprocessing

- Image filtering

  The images crawled from Baidu Images vary wildly, and some of them have little to do with the keyword, so they must be filtered by hand to keep the dataset close to the dishes a user would actually photograph. Since this is a demo, only vegetables and fruit are used as the dataset; more categories can be added later.

- Image resizing and conversion (how features and labels are represented)

  Because the CNN for the CIFAR-10 dataset is used, every image must be resized to 32×32 and converted to RGB mode:

```python
import numpy as np
import matplotlib.image as pltimg
from PIL import Image

# read a file, then return a (1, 3072) array
def read_file(self, filename):
    img = Image.open(filename)
    img = img.convert('RGB')   # convert to RGB (convert() returns a new image)
    # print(filename)
    img = img.resize((32, 32))  # resize to 32x32
    try:
        red, green, blue = img.split()
        red_arr = pltimg.pil_to_array(red)
        green_arr = pltimg.pil_to_array(green)
        blue_arr = pltimg.pil_to_array(blue)
        r_arr = red_arr.reshape(1024)
        g_arr = green_arr.reshape(1024)
        b_arr = blue_arr.reshape(1024)
        result = np.concatenate((r_arr, g_arr, b_arr))
        return result
    except ValueError:
        print(filename)
        # img.show()
```

After resizing and conversion, the data is combined with its labels into a dictionary and written to a file:

```python
import pickle as p

def save_pickle(self, result, label, label_name):
    print("=====saving picture, please wait=====")
    dic = {'label': label, 'data': result, 'label_name': label_name}
    file_path = "data/train_file/" + "data_batch_test"
    with open(file_path, 'wb') as f:
        p.dump(dic, f)
    print("=====save mode end=====")
```

The labels are assigned by folder name while walking the directory tree:

```python
import os

def get_file_name(local_path):
    label = []
    label_name = []
    file = []
    for i, dirs in enumerate(os.listdir(local_path)):
        label_name.append(dirs)
        for f in os.listdir(os.path.join(local_path, dirs)):
            label.append(i)
            img_path = os.path.join(os.path.join(local_path, dirs), f)
            file.append(img_path)
    return file, label, label_name
```

This completes the image preprocessing step.
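The (1, 3072) layout produced by `read_file` can be stated precisely: the first 1024 values are the row-major red plane, the next 1024 the green plane, and the last 1024 the blue plane. A pure-Python sketch of the round trip (these two helpers are illustrative, not part of the project):

```python
SIZE = 32
AREA = SIZE * SIZE  # 1024 values per channel

def planes_to_vector(red, green, blue):
    # red/green/blue: flat row-major lists of 1024 channel values
    return red + green + blue

def vector_to_pixels(vec):
    # rebuild row-major (r, g, b) pixels from a 3072-value feature vector
    r, g, b = vec[:AREA], vec[AREA:2 * AREA], vec[2 * AREA:]
    return list(zip(r, g, b))
```

Keeping this layout in mind explains the `reshape`/`transpose` pair in the data-loading code further down: the vector is first viewed as (channels, height, width) and then reordered to (height, width, channels).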
#### 2-3. Preparing the training and test data

- Using the code above, the 2692 processed vegetable and fruit images are packed into the training set `data_batch_train`, and another 299 images into the test set `data_batch_test`.
- The data and labels are then read back out of the data_batch files:

```python
import pickle as p

import numpy as np

# read all the data [label, image_data] from a file
def unpickle(filename):
    with open(filename, 'rb') as f:
        dict = p.load(f, encoding='bytes')
    return dict

# separate a file's labels from its data
def load_data_once(filename):
    batch = unpickle(filename)
    data = batch['data']
    labels = batch['label']
    print("reading data and labels from %s" % filename)
    return data, labels

def load_data(filequeue, data_dir, labels_count):
    global image_size, image_channels
    data, labels = load_data_once(data_dir + '/' + filequeue[0])
    for f in filequeue[1:]:
        data_f, label_f = load_data_once(data_dir + '/' + f)
        data = np.append(data, data_f, axis=0)
        labels = np.append(labels, label_f, axis=0)
    labels = np.array([[float(i == label) for i in range(labels_count)]
                       for label in labels])
    data = data.reshape([-1, image_channels, image_size, image_size])
    data = data.transpose([0, 2, 3, 1])
    return data, labels
```

Because the training set is fairly small, the images are further augmented with random cropping, flipping, and similar transforms:

```python
import random

import numpy as np

# random crop with optional zero padding
def random_crop(batch, crop_shape, padding=None):
    img_shape = np.shape(batch[0])
    if padding:
        img_shape = (img_shape[0] + 2 * padding,
                     img_shape[1] + 2 * padding,
                     img_shape[2])
    new_batch = []
    newPad = ((padding, padding), (padding, padding), (0, 0))
    for i in range(len(batch)):
        new_batch.append(batch[i])
        if padding:
            new_batch[i] = np.lib.pad(batch[i], pad_width=newPad,
                                      mode='constant', constant_values=0)
        new_height = random.randint(0, img_shape[0] - crop_shape[0])
        new_width = random.randint(0, img_shape[1] - crop_shape[1])
        new_batch[i] = new_batch[i][new_height:new_height + crop_shape[0],
                                    new_width:new_width + crop_shape[1]]
    return new_batch

# randomly flip images horizontally
def random_flip_leftRight(batch):
    for i in range(len(batch)):
        if bool(random.getrandbits(1)):
            batch[i] = np.fliplr(batch[i])
    return batch
```
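The one-line list comprehension inside `load_data` that turns integer labels into one-hot rows is easy to miss. The same transform, as a standalone pure-Python helper for clarity (not project code):

```python
def one_hot(labels, labels_count):
    # each label becomes a row with a 1.0 at the label's index and 0.0 elsewhere
    return [[float(i == label) for i in range(labels_count)] for label in labels]
```

With `labels_count = 2` (the two classes used in this demo), label `0` becomes `[1.0, 0.0]` and label `1` becomes `[0.0, 1.0]`, matching the two-column softmax output defined later.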
```python
# per-channel normalization of the RGB arrays (zero mean, unit variance)
def color_preProcess(x_train, x_test):
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    x_train[:, :, :, 0] = (x_train[:, :, :, 0] - np.mean(x_train[:, :, :, 0])) / np.std(x_train[:, :, :, 0])
    x_train[:, :, :, 1] = (x_train[:, :, :, 1] - np.mean(x_train[:, :, :, 1])) / np.std(x_train[:, :, :, 1])
    x_train[:, :, :, 2] = (x_train[:, :, :, 2] - np.mean(x_train[:, :, :, 2])) / np.std(x_train[:, :, :, 2])
    x_test[:, :, :, 0] = (x_test[:, :, :, 0] - np.mean(x_test[:, :, :, 0])) / np.std(x_test[:, :, :, 0])
    x_test[:, :, :, 1] = (x_test[:, :, :, 1] - np.mean(x_test[:, :, :, 1])) / np.std(x_test[:, :, :, 1])
    x_test[:, :, :, 2] = (x_test[:, :, :, 2] - np.mean(x_test[:, :, :, 2])) / np.std(x_test[:, :, :, 2])
    return x_train, x_test

# return a freshly augmented batch of images
def data_augmentation(batch):
    batch = random_flip_leftRight(batch)
    batch = random_crop(batch, [32, 32], 4)
    return batch
```

### 3. Choosing the algorithm and training the model

#### 3-1. Choosing the algorithm

The widely used CNN for the CIFAR-10 dataset was chosen, for two reasons. First, the CNN is a fairly mature model: it reaches about 91% accuracy on CIFAR-10 and over 97% on MNIST. Second, a CNN is one of the more basic models and relatively easy to understand and build, which makes later tuning and improvement easier.

#### 3-2. Building the model

First, define a `conv` helper that creates a convolutional layer (with batch normalization):

```python
import tensorflow as tf

def conv(x, is_train, shape):
    he_initializer = tf.contrib.keras.initializers.he_normal()
    W = tf.get_variable('weights', shape=shape, initializer=he_initializer)
    b = tf.get_variable('bias', shape=[shape[3]], initializer=tf.zeros_initializer)
    x = tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.contrib.layers.batch_norm(x, decay=0.9, center=True, scale=True,
                                        epsilon=1e-3, is_training=is_train,
                                        updates_collections=None)
```

The model uses three convolutional blocks, named `conv1`, `conv2`, and `conv3`. Each convolution is followed by two 1×1 convolution layers, named `mlp1-1`, `mlp1-2`, and so on (despite the names in the original description, these are 1×1 convolutions in the network-in-network style, not pooling layers), then max pooling and dropout to prevent overfitting, and each block's output feeds the next block. The output after the third block is passed through global average pooling into the softmax layer, which produces the class predictions against the label array.

```python
with tf.variable_scope('conv1'):
    output = conv(x, use_bn, [5, 5, 3, 192])
    output = activation(output)
with tf.variable_scope('mlp1-1'):
    output = conv(output, use_bn, [1, 1, 192, 160])
    output = activation(output)
with tf.variable_scope('mlp1-2'):
    output = conv(output, use_bn, [1, 1, 160, 96])
    output = activation(output)
with tf.name_scope('max_pool-1'):
    output = max_pool(output, 3, 2)
with tf.name_scope('dropout-1'):
    output = tf.nn.dropout(output, keep_prob)
with tf.variable_scope('conv2'):
    output = conv(output, use_bn, [5, 5, 96, 192])
    output = activation(output)
```
```python
with tf.variable_scope('mlp2-1'):
    output = conv(output, use_bn, [1, 1, 192, 192])
    output = activation(output)
with tf.variable_scope('mlp2-2'):
    output = conv(output, use_bn, [1, 1, 192, 192])
    output = activation(output)
with tf.name_scope('max_pool-2'):
    output = max_pool(output, 3, 2)
with tf.name_scope('dropout-2'):
    output = tf.nn.dropout(output, keep_prob)
with tf.variable_scope('conv3'):
    output = conv(output, use_bn, [3, 3, 192, 192])
    output = activation(output)
with tf.variable_scope('mlp3-1'):
    output = conv(output, use_bn, [1, 1, 192, 192])
    output = activation(output)
with tf.variable_scope('mlp3-2'):
    output = conv(output, use_bn, [1, 1, 192, 2])
    output = activation(output)
with tf.name_scope('global_avg_pool'):
    output = global_avg_pool(output, 8, 1)
with tf.name_scope('softmax'):
    output = tf.reshape(output, [-1, 2])  # with only two classes for now, softmax outputs a two-column array
```

Once the model is built, the cross-entropy, L2 loss, training step, and prediction/accuracy variables are added to the graph:

```python
with tf.name_scope('cross_entropy'):
    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=output))
with tf.name_scope('l2_loss'):
    l2 = tf.add_n([tf.nn.l2_loss(var) for var in tf.trainable_variables()])
# use Momentum as the optimizer
with tf.name_scope('train_step'):
    train_step = tf.train.MomentumOptimizer(
        learning_rate, FLAGS.momentum, use_nesterov=True).minimize(
            cross_entropy + l2 * FLAGS.weight_decay)
with tf.name_scope('prediction'):
    correct_prediction = tf.equal(tf.argmax(output, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
```

Then define a saver with `tf.train.Saver()` to save the model:

```python
saver = tf.train.Saver()
```
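The softmax cross-entropy computed by `tf.nn.softmax_cross_entropy_with_logits` can be written out by hand, which makes the loss easier to reason about. A minimal pure-Python sketch for a single sample with two classes, matching the two-column output above (illustrative only, not project code):

```python
import math

def softmax(logits):
    # subtract the max logit for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_single(one_hot_label, logits):
    # -sum(y * log(p)) over the classes, with p from the softmax
    probs = softmax(logits)
    return -sum(y * math.log(p) for y, p in zip(one_hot_label, probs))
```

For equal logits `[0.0, 0.0]` the softmax gives `[0.5, 0.5]`, so the loss for either class is `ln 2`, which is the expected starting loss of an untrained two-class model.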
#### 3-3. Training and evaluating the model

After the model is built, a session is created to train and test it, with FLAGS variables as hyperparameters. After a set number of iterations the model can be saved and its current prediction accuracy is printed:

```python
sess.run(tf.global_variables_initializer())
# saver.restore(sess, "./check/model.ckpt")
summary_writer = tf.summary.FileWriter(FLAGS.log_save_path, sess.graph)

for ep in range(1, FLAGS.epochs + 1):
    lr = learning_rate_schedule(ep)
    pre_index = 0
    train_acc = 0.0
    train_loss = 0.0
    start_time = time.time()
    print("\nepoch %d/%d:" % (ep, FLAGS.epochs))
    for it in range(1, FLAGS.iteration + 1):
        if pre_index + FLAGS.batch_size < 50000:  # take the training data in batches
            # (the 50000 bound is inherited from the CIFAR-10 reference code)
            batch_x = train_x[pre_index:pre_index + FLAGS.batch_size]
            batch_y = train_y[pre_index:pre_index + FLAGS.batch_size]
        else:
            batch_x = train_x[pre_index:]
            batch_y = train_y[pre_index:]
        batch_x = data_augmentation(batch_x)
        _, batch_loss = sess.run([train_step, cross_entropy],
                                 feed_dict={x: batch_x, y_: batch_y, use_bn: True,
                                            keep_prob: FLAGS.dropout,
                                            learning_rate: lr})
        batch_acc = accuracy.eval(feed_dict={x: batch_x, y_: batch_y,
                                             use_bn: True, keep_prob: 1.0})  # batch accuracy
        train_loss += batch_loss
        train_acc += batch_acc
        pre_index += FLAGS.batch_size
        if it == FLAGS.iteration:  # evaluation on the test set
            train_loss /= FLAGS.iteration
            train_acc /= FLAGS.iteration
            train_summary = tf.Summary(value=[
                tf.Summary.Value(tag="train_loss", simple_value=train_loss),
                tf.Summary.Value(tag="train_accuracy", simple_value=train_acc)])
            val_acc, val_loss, test_summary = run_testing(sess)
            summary_writer.add_summary(train_summary, ep)
            summary_writer.add_summary(test_summary, ep)
            summary_writer.flush()
            print("iteration: %d/%d, cost_time: %ds, train_loss: %.4f, "
                  "train_acc: %.4f, test_loss: %.4f, test_acc: %.4f"
                  % (it, FLAGS.iteration, int(time.time() - start_time),
                     train_loss, train_acc, val_loss, val_acc))
            # checkpt_path = "./check/model.ckpt"
            # saver.save(sess, checkpt_path)
            # print("Model saved in file: %s" % save_path)
        else:  # running evaluation on the training set
            print("iteration: %d/%d, train_loss: %.4f, train_acc: %.4f"
                  % (it, FLAGS.iteration, train_loss / it, train_acc / it), end='\r')
    # checkpt_path = "./check/model.ckpt"
    # saver.save(sess, checkpt_path)
```
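The loop above calls `learning_rate_schedule(ep)`, which is not shown in this README. A typical shape for such a function is a step decay from the base rate of 0.01; the epoch boundaries below are illustrative assumptions, not the project's actual values:

```python
def learning_rate_schedule(epoch, base_lr=0.01):
    # hypothetical step decay: the real schedule is not included in this README
    if epoch < 40:
        return base_lr          # early epochs: full learning rate
    if epoch < 70:
        return base_lr / 10     # mid training: one order of magnitude lower
    return base_lr / 100        # late training: fine-tuning rate
```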
After running the model for 82 epochs with a learning rate of 0.01 and 43 iterations per epoch, the model reaches a prediction accuracy of about 97%.

### 4. Predicting new samples

- Image conversion and preprocessing

  When the model predicts a new sample, the image must likewise be resized to 32×32 and converted to RGB mode so that recognition does not fail. The explicit RGB conversion was added because a `ValueError` occurred during preprocessing: splitting into the R, G, and B channels returned the wrong number of values, probably because the image was not in RGB mode to begin with.

- Feature formatting

  After preprocessing, the image is converted to a 1×3072 array: the first 1024 values are the red channel, the next 1024 the green channel, and the last 1024 the blue channel. This completes the feature formatting.

### 5. Future improvements

Since this model is only a demo, several improvements remain:

- Image filtering: when building the training set, images should be curated further, picking dishes closer to what users would actually photograph
- More dish categories are needed to better match real life, e.g. adding a meat category

Workflow diagram:

### 6. Model flow diagram