Citation: PENG Yuanxi, ZHOU Feng, HAI Yue, et al., “A Multi-instruction Streams Extension Mechanism for SIMD Processor,” Chinese Journal of Electronics, vol. 26, no. 6, pp. 1154-1160, 2017, doi: 10.1049/cje.2017.09.013
R. Krashinsky, C. Batten, M. Hampton, et al., "The vector-thread architecture", Proceedings of the 31st Annual International Symposium on Computer Architecture, IEEE Computer Society, Washington, DC, USA, pp.52-63, 2004.
Y. Lee, et al., "Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators", ACM SIGARCH Computer Architecture News, Vol.39, No.3, pp.129-140, 2011.
B.C. Li, J.Z. Wei, W. Guo and J.Z. Sun, "Improving SIMD utilization with thread-lane shuffled compaction in GPGPU", Chinese Journal of Electronics, Vol.24, No.2, pp.684-688, 2015.
W.W.L. Fung, I. Sham and G. Yuan, "Dynamic warp formation: Efficient MIMD control flow on SIMD graphics hardware", ACM Transactions on Architecture and Code Optimization (TACO), Vol.6, No.2, pp.407-420, 2009.
A.S. Vaidya, A. Shayesteh, D.H. Woo, et al., "SIMD divergence optimization through intra-warp compaction", ACM SIGARCH Computer Architecture News, pp.368-379, 2013.
M. Rhu and M. Erez, "CAPRI: Prediction of compaction-adequacy for handling control-divergence in GPGPU architectures", ACM SIGARCH Computer Architecture News, Vol.40, No.3, pp.61-71, 2012.
M. Rhu and M. Erez, "Maximizing SIMD resource utilization in GPGPUs with SIMD lane permutation", International Symposium on Computer Architecture, pp.356-367, 2013.
A. El Tantawy, et al., "A scalable multi-path microarchitecture for efficient GPU control flow", IEEE 20th International Symposium on High Performance Computer Architecture (HPCA), 2014.
Y. Wang, S. Chen, et al., "Instruction shuffle: Achieving MIMD-like performance on SIMD architectures", IEEE Computer Architecture Letters, Vol.11, No.2, pp.37-40, 2012.