PANG Yeyong, WANG Shaojun, PENG Yu, PENG Xiyuan. Fully Pipelined Soft Vector Processor as a CPU Accelerator[J]. Chinese Journal of Electronics, 2017, 26(6): 1198-1205. doi: 10.1049/cje.2017.09.014

Fully Pipelined Soft Vector Processor as a CPU Accelerator

doi: 10.1049/cje.2017.09.014
Funds: This work is supported in part by the National Natural Science Foundation of China (No.61301205, No.61571160), the Fundamental Research Funds for the Central Universities (No.HIT.NSRIF.201615), the New Direction of Subject Development in Harbin Institute of Technology (No.01509421), and the Twelfth Government Advanced Research Fund (No.9140A17050114HT01054).
  • Received Date: 2015-09-23
  • Rev Recd Date: 2016-03-07
  • Publish Date: 2017-11-10
  • FPGA-based soft vector processing accelerators are frequently used to perform highly parallel data processing tasks. Since they cannot implement complex control operations in software, most FPGA systems now incorporate either a soft processor or a hard processor. An FPGA-based, AXI-bus-compatible vector accelerator architecture is proposed, which uses a fully pipelined, heterogeneous ALU for performance and employs microcoding for reusability. The design is tested with several design examples in four different lane configurations. Compared with Central processing unit (CPU), Digital signal processor (DSP), Altera C2H tool and OpenCL SDK implementations, the vector processor improves execution time and energy consumption by factors of up to 6.6 and 6.4, respectively.