Convolutional neural networks (CNNs) are now widely used in image recognition and can achieve very high accuracy. However, the lack of interpretability of neural networks remains unsolved: we still cannot explain how the weights and biases of each CNN layer extract the so-called image features. We therefore start from the essence of the image and extract its geometric features first, aiming to reduce this uninterpretability before classifying the image.
Inspired by 'Image super-resolution using gradient profile prior', CVPR 2008. Within a sufficiently small local area, an edge is translationally invariant; that is, the contour inside that area can be approximated by a straight line or a curve.
To extract geometric features, we first need to find the orientation of each local image window. The local template initially has four candidate directions, and the orientation is defined as the direction along which the absolute difference is smallest among the four. We then extract the coefficients of the style function, position information, min and max information, and so on. Finally, the information inside the local window is represented by a vector.
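The orientation rule above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the names `DIRECTIONS`, `shifted_abs_diff`, and `local_orientation`, the choice of 0°, 45°, 90°, and 135° as the four directions, and the use of the mean absolute difference between a window and its one-pixel shift are all assumptions made for the example.

```python
import numpy as np

# Assumed four candidate directions as (row step, column step).
# Rows grow downward, so (-1, 1) points up-right (45 degrees).
DIRECTIONS = {0: (0, 1), 45: (-1, 1), 90: (1, 0), 135: (1, 1)}

def shifted_abs_diff(patch, dr, dc):
    """Mean absolute difference between the patch and itself shifted by (dr, dc)."""
    h, w = patch.shape
    # Keep only the region where both the patch and its shifted copy are defined.
    r0, r1 = max(0, -dr), min(h, h - dr)
    c0, c1 = max(0, -dc), min(w, w - dc)
    a = patch[r0:r1, c0:c1]
    b = patch[r0 + dr:r1 + dr, c0 + dc:c1 + dc]
    return float(np.mean(np.abs(a - b)))

def local_orientation(patch):
    """Return (angle, scores): the direction with the smallest absolute difference."""
    patch = np.asarray(patch, dtype=float)
    scores = {angle: shifted_abs_diff(patch, dr, dc)
              for angle, (dr, dc) in DIRECTIONS.items()}
    return min(scores, key=scores.get), scores

# A 5x5 window containing a vertical step edge: intensities do not change
# when moving along the vertical direction, so 90 is selected.
window = np.zeros((5, 5))
window[:, 2:] = 1.0
angle, scores = local_orientation(window)
print(angle)  # 90
```

Intensities are nearly constant along an edge and change across it, so the direction with the smallest difference is taken as the edge direction. Adding more template directions would only require extending `DIRECTIONS`.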
The method is backed by some mathematical arguments, the first being the soundness of the local parameterization. Here the edge is described by a quadratic function, H(x,y) = 0. By the implicit function theorem, y can locally be written as a function of x (and x as a function of y) that is approximately linear, so a linear implicit function can fit the edge when the local window is small. In addition, the second partial derivative of the edge in the orthogonal direction is bounded, so we use a Taylor expansion to estimate the error. As long as the filter size is well controlled, the error of the local parameterization can be made very small.
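The error estimate can be written out as a short sketch. The symbols here are assumptions introduced for illustration: the edge is locally the graph y = f(x) around a point (x_0, y_0) with H_y(x_0, y_0) ≠ 0, M is a bound on |f''| inside the window, and r is the half-width of the window (the filter size).

```latex
% Slope of the local linear fit, from the implicit function theorem:
f'(x_0) = -\frac{H_x(x_0, y_0)}{H_y(x_0, y_0)}

% Taylor expansion with Lagrange remainder, for some \xi between x_0 and x:
f(x) = f(x_0) + f'(x_0)(x - x_0) + \tfrac{1}{2} f''(\xi)(x - x_0)^2

% Error of the linear fit inside the window |x - x_0| \le r, given |f''| \le M:
\left| f(x) - \bigl( f(x_0) + f'(x_0)(x - x_0) \bigr) \right| \le \frac{M}{2} r^2
```

Under these assumptions the error shrinks quadratically with the window size, which is why a small filter keeps the linear approximation accurate.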
Finally, the whole image is represented as a matrix (of shape M×N), which is then classified with a CNN.
Model version 1.0 (with four templates) converges more slowly than a traditional CNN and is 1%-3% less accurate. A t-SNE embedding made at this stage confirmed that its results were indeed inferior to those of a traditional CNN. In version 2.0 the number of template directions was increased to 12; accuracy became higher than that of the CNN and convergence became faster. To test robustness, we added a gradient attack and found that our model was more robust, indicating that the correct geometric features of the image were indeed extracted. Note, however, that our model performs well on MNIST and EMNIST but poorly on complex color images such as the CIFAR10 dataset. We conclude that edge information alone is not enough to recognize an image; texture information is also needed. However, there is no reasonable definition of texture features, so they were not extracted in the experiments.
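For readers unfamiliar with gradient attacks, here is a minimal sketch of one. The text does not say which attack was used in the experiments; this example assumes a one-step sign-gradient attack (FGSM style) on a toy logistic-regression model, and the names `sigmoid` and `fgsm_perturb` and all parameter values are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, eps):
    """One-step sign-gradient perturbation of x for a logistic model (assumed attack).

    With binary cross-entropy loss and prediction p = sigmoid(x @ w + b),
    the gradient of the loss with respect to x is (p - y) * w.
    """
    p = sigmoid(x @ w + b)
    grad_x = (p - y) * w
    # Move every input coordinate by eps in the direction that increases the loss.
    return x + eps * np.sign(grad_x)

# Toy model and input (hypothetical values).
w = np.array([2.0, -1.0])
b = 0.0
x = np.array([1.0, 1.0])
y = 1.0

x_adv = fgsm_perturb(x, y, w, b, eps=0.5)
print(sigmoid(x @ w + b) > 0.5)      # True: clean input is classified as class 1
print(sigmoid(x_adv @ w + b) > 0.5)  # False: perturbed input is misclassified
```

A model counts as more robust under such a test when its accuracy drops less than the baseline's for the same `eps`.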
So far we have completed only one step: a simple local parameterization.
We would like to express our sincere appreciation to Maxwell Liu from ShingingMidas Private Fund and Xingyu Fu from Sun Yat-sen University for their generous guidance throughout the project. We are also grateful to Tanli Zuo from Sun Yat-sen University for his assistance all the way. Without their support, we could not have accomplished such a challenging task.