Citation: | LIU Sheng, YUAN Bo, GUO Yang, et al., “Vector Memory-Access Shuffle Fused Instructions for FFT-Like Algorithms,” Chinese Journal of Electronics, vol. 32, no. 5, pp. 1077-1088, 2023, doi: 10.23919/cje.2021.00.401 |
[1] |
J. W. Cooley and J. W. Tukey, “An algorithm for the machine calculation of complex Fourier series,” Mathematics of Computation, vol.19, no.90, pp.297–301, 1965. doi: 10.1090/S0025-5718-1965-0178586-1
|
[2] |
W. Hussain, F. Garzia, T. Ahonen, et al., “Designing fast Fourier transform accelerators for orthogonal frequency-division multiplexing systems,” Journal of Signal Processing Systems, vol.69, no.2, pp.161–171, 2012. doi: 10.1007/s11265-011-0642-6
|
[3] |
A. Gupta and V. Kumar, “The scalability of FFT on parallel computers,” IEEE Transactions on Parallel and Distributed Systems, vol.4, no.8, pp.922–932, 1993. doi: 10.1109/71.238626
|
[4] |
J. W. Van De Waerdt, S. Vassiliadis, S. Das, et al., “The TM3270 media-processor,” in Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture, Barcelona, Spain, pp.331–342, 2005.
|
[5] |
Texas Instruments, “Tms320c64x/C64x+ DSP CPU and instruction set reference guide, Texas Instruments User manual SPRU732C,”
Available at: https://www.ti.com/lit/ug/spru732j/spru732j.pdf, 2015 |
[6] |
J. Fridman, “Data alignment for sub-word parallelism in DSP,” in Proceedings of IEEE Workshop on Signal Processing Systems. SiPS 99. Design and Implementation (Cat. No. 99TH8461), Taipei, China, pp.251–260, 1999.
|
[7] |
R. Thomas, An Architectural Performance Study of the Fast Fourier Transform on Vector IRAM. University of California, Berkeley, Berkeley, CA, USA, pp.16–35, 2000.
|
[8] |
K. Zhang, S. M. Chen, S. Liu, et al., “Accelerating the data shuffle operations for FFT algorithms on SIMD DSPs,” in Proceedings of the 9th IEEE International Conference on ASIC, Xiamen, China, pp.740–743, 2011.
|
[9] |
B. S. He, N. K. Govindaraju, Q. Luo, et al., “Efficient gather and scatter operations on graphics processors,” in Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, Reno, NV, USA, 2007.
|
[10] |
G. Efland, S. Parikh, H. Sanghavi, et al., “High performance DSP for vision, imaging and neural networks,” in 2016 IEEE Hot Chips 28 Symposium (HCS), Cupertino, CA, USA, pp.1–30, 2016.
|
[11] |
M. Woh, S. Seo, S. Mahlke, et al., “AnySP: Anytime anywhere anyway signal processing,” in Proceedings of the 36th Annual International Symposium on Computer Architecture, Austin, TX, USA, pp.128–139, 2009.
|
[12] |
S. M. Chen, Y. H. Wang, S. Liu, et al., “FT-Matrix: A coordination-aware architecture for signal processing,” IEEE Micro, vol.34, no.6, pp.64–73, 2014. doi: 10.1109/MM.2013.129
|
[13] |
J. H. Huang, “Design and implementation of vectorized FFTs on the YHFT-Matrix,” Master Thesis, Nat. Univ. Defense Technol., Changsha, China, 2012. (in Chinese)
|