Volume 32 Issue 5
Sep.  2023
Turn off MathJax
Article Contents
LIU Sheng, YUAN Bo, GUO Yang, et al., “Vector Memory-Access Shuffle Fused Instructions for FFT-Like Algorithms,” Chinese Journal of Electronics, vol. 32, no. 5, pp. 1077-1088, 2023, doi: 10.23919/cje.2021.00.401
Citation: LIU Sheng, YUAN Bo, GUO Yang, et al., “Vector Memory-Access Shuffle Fused Instructions for FFT-Like Algorithms,” Chinese Journal of Electronics, vol. 32, no. 5, pp. 1077-1088, 2023, doi: 10.23919/cje.2021.00.401

Vector Memory-Access Shuffle Fused Instructions for FFT-Like Algorithms

doi: 10.23919/cje.2021.00.401
Funds:  This work was supported by the National Natural Science Foundation of China (61602493, 61502508).
More Information
  • Author Bio:

    Sheng LIU was born in 1984. He is a Ph.D. and Assistant Research Fellow in National University of Defense Technology. His main research interests include microprocessor architecture and VLSI design. (Email: liusheng83@nudt.edu.cn)

    Bo YUAN was born in 1996. He is a Ph.D. candidate of electronic science and technology. His main research interests include microprocessor architecture and VLSI design. (Email: yuanbo18@nudt.edu.cn)

    Yang GUO (corresponding author) was born in 1971. He is a Ph.D. and Research Fellow. His main research interests include low power VLSI circuit, microprocessor design & verification, and electronic design automation (EDA) for VLSI circuit. (Email: guoyang@nudt.edu.cn)

    Haiyan SUN was born in 1976. She is a Ph.D. and Associate Research Fellow in National University of Defense Technology. Her main research interests include compilation, programming model, and embedded application. (Email: helen@nudt.edu.cn)

    Zekun JIANG was born in 1996. He received the M.E. degree from Chongqing University of Posts and Telecommunications. His main research interests include microprocessor architecture. (Email: jzk61010@hotmail.com)

  • Received Date: 2021-11-17
  • Accepted Date: 2021-12-24
  • Available Online: 2022-03-05
  • Publish Date: 2023-09-05
  • The shuffle operations are the bottleneck when mapping the FFT-like algorithms to the vector single instruction multiple data (SIMD) architectures. We propose six (three pairs) innovative vector memory-access shuffle fused instructions, which have been proved mathematically. Combined with the proposed modified binary-exchange method, the innovative instructions can efficiently address the bottleneck problem for decimation-in-frequency or decimation-in-time (DIF/DIT) radix-2/4 FFT-like algorithms, reach a performance improvement by 17.9%–111.2% and reduce the code size by 5.4%–39.8%. In addition, the proposed instructions fit some hybrid-radix FFTs and are suitable for the terms of the initial or result data placement for general algorithms. The software and hardware costs of the proposed instructions are moderate.
  • loading
  • [1]
    J. W. Cooley and J. W. Tukey, “An algorithm for the machine calculation of complex Fourier series,” Mathematics of Computation, vol.19, no.90, pp.297–301, 1965. doi: 10.1090/S0025-5718-1965-0178586-1
    [2]
    W. Hussain, F. Garzia, T. Ahonen, et al., “Designing fast Fourier transform accelerators for orthogonal frequency-division multiplexing systems,” Journal of Signal Processing Systems, vol.69, no.2, pp.161–171, 2012. doi: 10.1007/s11265-011-0642-6
    [3]
    A. Gupta and V. Kumar, “The scalability of FFT on parallel computers,” IEEE Transactions on Parallel and Distributed Systems, vol.4, no.8, pp.922–932, 1993. doi: 10.1109/71.238626
    [4]
    J. W. Van De Waerdt, S. Vassiliadis, S. Das, et al., “The TM3270 media-processor,” in Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture, Barcelona, Spain, pp.331–342, 2005.
    [5]
    Texas Instruments, “Tms320c64x/C64x+ DSP CPU and instruction set reference guide, Texas Instruments User manual SPRU732C,”
    Available at: https://www.ti.com/lit/ug/spru732j/spru732j.pdf, 2015
    [6]
    J. Fridman, “Data alignment for sub-word parallelism in DSP,” in Proceedings of IEEE Workshop on Signal Processing Systems. SiPS 99. Design and Implementation (Cat. No. 99TH8461), Taipei, China, pp.251–260, 1999.
    [7]
    R. Thomas, An Architectural Performance Study of the Fast Fourier Transform on Vector IRAM. University of California, Berkeley, Berkeley, CA, USA, pp.16–35, 2000.
    [8]
    K. Zhang, S. M. Chen, S. Liu, et al., “Accelerating the data shuffle operations for FFT algorithms on SIMD DSPs,” in Proceedings of the 9th IEEE International Conference on ASIC, Xiamen, China, pp.740–743, 2011.
    [9]
    B. S. He, N. K. Govindaraju, Q. Luo, et al., “Efficient gather and scatter operations on graphics processors,” in Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, Reno, NV, USA, 2007.
    [10]
    G. Efland, S. Parikh, H. Sanghavi, et al., “High performance DSP for vision, imaging and neural networks,” in 2016 IEEE Hot Chips 28 Symposium (HCS), Cupertino, CA, USA, pp.1–30, 2016.
    [11]
    M. Woh, S. Seo, S. Mahlke, et al., “AnySP: Anytime anywhere anyway signal processing,” in Proceedings of the 36th Annual International Symposium on Computer Architecture, Austin, TX, USA, pp.128–139, 2009.
    [12]
    S. M. Chen, Y. H. Wang, S. Liu, et al., “FT-Matrix: A coordination-aware architecture for signal processing,” IEEE Micro, vol.34, no.6, pp.64–73, 2014. doi: 10.1109/MM.2013.129
    [13]
    J. H. Huang, “Design and implementation of vectorized FFTs on the YHFT-Matrix,” Master Thesis, Nat. Univ. Defense Technol., Changsha, China, 2012. (in Chinese)
  • 加载中

Catalog

    通讯作者: 陈斌, bchen63@163.com
    • 1. 

      沈阳化工大学材料科学与工程学院 沈阳 110142

    1. 本站搜索
    2. 百度学术搜索
    3. 万方数据库搜索
    4. CNKI搜索

    Figures(8)  / Tables(3)

    Article Metrics

    Article views (634) PDF downloads(23) Cited by()
    Proportional views
    Related

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return