Vector Memory-Access Shuffle Fused Instructions for FFT-Like Algorithms
Abstract
Shuffle operations are the bottleneck when mapping FFT-like algorithms onto vector single-instruction, multiple-data (SIMD) architectures. We propose six (three pairs of) vector memory-access shuffle fused instructions, whose correctness is proved mathematically. Combined with the proposed modified binary-exchange method, these instructions efficiently address the shuffle bottleneck for decimation-in-frequency and decimation-in-time (DIF/DIT) radix-2/4 FFT-like algorithms, improving performance by 17.9%–111.2% and reducing code size by 5.4%–39.8%. In addition, the proposed instructions also apply to some hybrid-radix FFTs and can be used to arrange the initial or result data of general algorithms. The software and hardware costs of the proposed instructions are moderate.
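To make the shuffle bottleneck concrete, the following minimal Python sketch shows a textbook in-place radix-2 DIF FFT. It is an illustration only and does not model the proposed instructions or the modified binary-exchange method; the function names `dif_fft_radix2`, `bit_reverse_indices`, and `naive_dft` are hypothetical. The sketch shows that the butterflies pair elements at a stride that halves at every stage, and that the result emerges in bit-reversed order, so a final permutation is needed to restore natural order.

```python
import cmath


def dif_fft_radix2(x):
    """Textbook radix-2 DIF FFT; the result is left in bit-reversed order."""
    a = list(x)
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    half = n // 2
    while half >= 1:
        # Each stage pairs elements that are `half` positions apart.
        for start in range(0, n, 2 * half):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / (2 * half))  # twiddle factor
                u = a[start + k]
                v = a[start + k + half]
                a[start + k] = u + v
                a[start + k + half] = (u - v) * w
        half //= 2
    return a


def bit_reverse_indices(n):
    """Bit-reversal permutation for a power-of-two length n >= 2."""
    bits = n.bit_length() - 1
    return [int(format(i, f"0{bits}b")[::-1], 2) for i in range(n)]


def naive_dft(x):
    """O(n^2) reference DFT, used only to check the sketch."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


if __name__ == "__main__":
    data = [1, 2, 3, 4, 0, -1, -2, -3]
    scrambled = dif_fft_radix2(data)
    # The final reordering is the data shuffle the FFT cannot avoid.
    natural = [scrambled[r] for r in bit_reverse_indices(len(data))]
    reference = naive_dft(data)
    print(max(abs(p - q) for p, q in zip(natural, reference)) < 1e-9)
```

On a SIMD machine, the later stages pair elements that sit inside the same vector register, and the final reordering permutes elements across registers; both typically require explicit shuffle instructions in addition to the loads and stores.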