Volume 30, Issue 1, Jan. 2021
PENG Xiyuan, YU Jinxiang, YAO Bowen, LIU Liansheng, PENG Yu. A Review of FPGA-Based Custom Computing Architecture for Convolutional Neural Network Inference[J]. Chinese Journal of Electronics, 2021, 30(1): 1-17. doi: 10.1049/cje.2020.11.002

# A Review of FPGA-Based Custom Computing Architecture for Convolutional Neural Network Inference

##### doi: 10.1049/cje.2020.11.002
Funds:

• the National Natural Science Foundation of China (61803121)

• the China Postdoctoral Science Foundation (2019M651277)

• Author Bios:

PENG Xiyuan   received the B.S., M.S., and Ph.D. degrees from Harbin Institute of Technology, China, in 1984, 1987, and 1992, respectively. He is currently a Full Professor with the School of Electronics and Information Engineering, Harbin Institute of Technology. His current research interests include automatic test technology and high-performance computing. (Email: pxy@hit.edu.cn)

YU Jinxiang   received the B.S. degree in measurement technology and instrumentation from Harbin Institute of Technology at Weihai, China, in 2015. He is currently pursuing the Ph.D. degree in the School of Electronics and Information Engineering, Harbin Institute of Technology, China. His current research interests include computer vision and high performance computing. (Email: yujinxiang@hit.edu.cn)

YAO Bowen   received the B.S. degree in microelectronics and the M.S. degree in electronic science and technology from Harbin University of Science and Technology, China, in 2015 and 2018, respectively. He is currently pursuing the Ph.D. degree in the School of Electronics and Information Engineering, Harbin Institute of Technology, China. His current research interests include domain-specific computing, customizable computing, reconfigurable computing, and co-designing efficient algorithms and hardware systems for machine learning. (Email: bwyao@hit.edu.cn)

LIU Liansheng   received the B.S. degree in measurement technology and instrumentation from Harbin Institute of Technology, China, in 2006, and the M.S. and Ph.D. degrees in instrumentation science and technology, both from Harbin Institute of Technology, in 2008 and 2017, respectively. From Nov. 2012 to Nov. 2014, he studied at McGill University as a visiting Ph.D. student supported by the China Scholarship Council. He is currently an Associate Professor at Harbin Institute of Technology. His research interests include prognostics and health management, sensor data anomaly detection, and cyber-physical systems. (Email: lianshengliu@hit.edu.cn)

• Corresponding author: PENG Yu   received the B.S. degree in measurement technology and instrumentation and the M.S. and Ph.D. degrees in instrumentation science and technology from Harbin Institute of Technology, China, in 1996, 1998, and 2004, respectively. He is currently a Full Professor with the School of Electronics and Information Engineering, Harbin Institute of Technology. His current research interests include automatic test technologies, virtual instruments, system health management, and reconfigurable computing. (Email: pengyu@hit.edu.cn)
• Accepted Date: 2020-09-26
• Publish Date: 2021-01-01
Convolutional neural networks (CNNs) have been widely adopted in many tasks. Their inference is usually performed on edge devices, where computing resources and power consumption are limited. At present, the performance of general-purpose processors cannot meet the requirements of CNN models with high computational complexity and large numbers of parameters. Field-programmable gate array (FPGA)-based custom computing architecture is a promising solution for further enhancing CNN inference performance. Software/hardware co-design can effectively reduce computing overhead and improve inference performance while preserving accuracy. In this paper, the mainstream methods of CNN structure design, hardware-oriented model compression, and FPGA-based custom architecture design are summarized, and the improvement in CNN inference performance is demonstrated through an example. Challenges and possible future research directions are outlined to foster research efforts in this domain.
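To make the "hardware-oriented model compression" mentioned above concrete, the sketch below shows symmetric per-tensor int8 post-training quantization, a common low-precision technique for FPGA-friendly integer arithmetic. This is an illustrative example, not the paper's own method; the kernel shape and helper names are assumptions.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from the int8 codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 3, 3, 3)).astype(np.float32)  # a hypothetical conv kernel
q, s = quantize_int8(w)
err = np.abs(dequantize(q, s) - w).max()
# Reconstruction error is bounded by half a quantization step (s / 2),
# which is why accuracy can be largely preserved after compression.
print(err <= s / 2 + 1e-6)
```

On hardware, only the int8 tensor `q` and the single scale `s` are stored and multiplied, cutting weight memory by roughly 4x versus float32 and allowing the multiply-accumulate array to use narrow integer DSP operations.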
