# zynqnet **Repository Path**: QuizMy/zynqnet ## Basic Information - **Project Name**: zynqnet - **Description**: Master Thesis "ZynqNet: An FPGA-Accelerated Embedded Convolutional Neural Network" - **Primary Language**: Unknown - **License**: GPL-3.0 - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2020-04-23 - **Last Updated**: 2021-10-26 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # ZynqNet: An FPGA-Accelerated Embedded Convolutional Neural Network This repository contains the results from my Master Thesis. * [Master Thesis Project Report (PDF)](https://github.com/dgschwend/zynqnet/blob/master/zynqnet_report.pdf) * [ZynqNet CNN (prototxt)](https://github.com/dgschwend/zynqnet/tree/master/_TRAINED_MODEL) * [CNN Topology Exploration](https://github.com/dgschwend/zynqnet/tree/master/zynqnet%20cnn) * [ZynqNet FPGA Accelerator (HLS C++)](https://github.com/dgschwend/zynqnet/tree/master/_HLS_CODE) * [ZynqNet Low-Level Firmware](https://github.com/dgschwend/zynqnet/tree/master/_FIRMWARE) * [Netscope CNN Analyzer](http://dgschwend.github.io/netscope/#/preset/zynqnet)

The project has been enabled and supported by [Supercomputing Systems AG](http://www.scs.ch). ## Abstract Image Understanding is becoming a vital feature in ever more applications ranging from medical diagnostics to autonomous vehicles. Many applications demand for embedded solutions that integrate into existing systems with tight real-time and power constraints. Convolutional Neural Networks (CNNs) presently achieve record-breaking accuracies in all image understanding benchmarks, but have a very high computational complexity. Embedded CNNs thus call for small and efficient, yet very powerful computing platforms. This master thesis explores the potential of FPGA-based CNN acceleration and demonstrates a fully functional proof-of-concept CNN implementation on a Zynq System-on-Chip. The _ZynqNet Embedded CNN_ is designed for image classification on ImageNet and consists of _ZynqNet CNN_, an optimized and customized CNN topology, and the _ZynqNet FPGA Accelerator_, an FPGA-based architecture for its evaluation. _ZynqNet CNN_ is a highly efficient CNN topology. Detailed analysis and optimization of prior topologies using the custom-designed _Netscope CNN Analyzer_ have enabled a CNN with 84.5% top-5 accuracy at a computational complexity of only 530 million multiplyaccumulate operations. The topology is highly regular and consists exclusively of convolutional layers, ReLU nonlinearities and one global pooling layer. The CNN fits ideally onto the FPGA accelerator. The _ZynqNet FPGA Accelerator_ allows an efficient evaluation of ZynqNet CNN. It accelerates the full network based on a nested-loop algorithm which minimizes the number of arithmetic operations and memory accesses. The FPGA accelerator has been synthesized using High- Level Synthesis for the Xilinx Zynq XC-7Z045, and reaches a clock frequency of 200MHz with a device utilization of 80% to 90 %. ## Contribution Initially, this master aimed to explore, benchmark and optimize one or more commercial approaches to the acceleration of convolutional neural networks on FPGAs, with a focus on embedded systems. Multiple FPGA and intellectual property vendors have announced frameworks and libraries that target the acceleration of deep learning systems.However, none of these solutions turned out to be ready and available for testing. Nevertheless, we decided to further pursue this promising approach by building our own proof-of-concept FPGA-based CNN implementation from scratch, with a special focus on the optimized co-operation between the underlying hardware architecture and the convolutional neural network. The result is the ZynqNet Embedded CNN, an FPGA-based convolutional neural network for image classification. The solution consists of two main components: 1. The _ZynqNet CNN_, a customized convolutional neural network topology, specifically shaped to fit ideally onto the FPGA. The CNN is exceptionally regular, and reaches a satisfying classification accuracy with minimal computational effort. 2. The _ZynqNet FPGA Accelerator_, a specialized FPGA architecture for the efficient acceleration of ZynqNet CNN and similar convolutional neural networks. ZynqNet CNN is trained offline on GPUs using the Caffe framework, while the ZynqNet FPGA Accelerator employs the CNN for image classification, or _inference_, on a Xilinx Zynq XC- 7Z045 System-on-Chip (SoC). Both components have been developed and optimized within the six month time frame of this master thesis, and together constitute a fully functional convolutional neural network implementation on the small and low-power Zynq platform. This report documents the ZynqNet CNN and the ZynqNet FPGA Accelerator and gives insight into their development. In addition, the _Netscope CNN Analyzer_ is introduced, a custom tool for visualizing, analyzing and editing convolutional neural network topologies. Netscope has been used to analyze a number of different CNN architectures, and the findings are presented in the form of a _Design Space Exploration_ (DSE) of CNN topologies from prior work. Finally, the performance of the ZynqNet Embedded CNN is evaluated and its performance is compared to other platforms. ## Report The report includes - overview + detailed analysis of many popular CNN architectures for image classification (AlexNet, VGG, NiN, GoogLeNet, Inception v.X, ResNet, SqueezeNet) - detailed description of the [*Netscope CNN Analyzer* tool]([https://github.com/dgschwend/netscope) - overview of *CNN analysis and optimization techniques* - detailed report on the design and implementation of the FPGA-based accelerator The final report can be found in [zynqnet_report.pdf](https://github.com/dgschwend/zynqnet/tree/master/zynqnet_report.pdf). ## ZynqNet CNN The fully trained CNN with .prototxt network description and pretrained weights can be found under [_TRAINED_MODEL](https://github.com/dgschwend/zynqnet/tree/master/_TRAINED_MODEL) ## ZynqNet FPGA Accelerator The C/C++ source code for building the FPGA accelerator using High-Level Synthesis (Vivado HLS) can be found under [_HLS_CODE](https://github.com/dgschwend/zynqnet/tree/master/_HLS_CODE). The compiled accelerator bitstream can be found under [_BITSTREAM](https://github.com/dgschwend/zynqnet/tree/master/_BITSTREAM). The firmware for the Zynq XC-7Z045 ARM processors is stored under [_FIRMWARE](https://github.com/dgschwend/zynqnet/tree/master/_FIRMWARE). ## Netscope CNN Analyzer The CNN analysis tool can be found in a separate repository here: [dgschwend/netscope](https://github.com/dgschwend/netscope) ## Copyright and License ZynqNet is Copyright 2016 by David Gschwend. All files in this repository are released under the GNU General Public License as found in the LICENSE file.