Citation: LI Bingchao, WEI Jizeng, GUO Wei, et al., “Improving SIMD Utilization with Thread-Lane Shuffled Compaction in GPGPU,” Chinese Journal of Electronics, vol. 24, no. 4, pp. 684-688, 2015, doi: 10.1049/cje.2015.10.004
|
Chang Yisong, Wei Jizeng, Zhao Guoyu, et al., "A novel architecture of special arithmetic function unit for area-efficient programmable vertex shader", Chinese Journal of Electronics, Vol.22, No.3, pp.483-488, 2013.
|
Liu Li, Liu Li and Yang Guangwen, "A highly efficient GPU-CPU hybrid parallel implementation of sparse LU factorization", Chinese Journal of Electronics, Vol.21, No.1, pp.7-12, 2012.
|
E. Lindholm, J. Nickolls, S. Oberman, et al., "NVIDIA Tesla: A unified graphics and computing architecture", IEEE Micro, Vol.28, No.2, pp.39-55, 2008.
|
Jing Naifeng, Shen Yao, Lu Yao, et al., "An energy-efficient and scalable eDRAM-based register file architecture for GPGPU", Proc. of International Symposium on Computer Architecture, Tel-Aviv, Israel, pp.344-355, 2013.
|
Mark Gebhart, Daniel R. Johnson, David Tarjan, et al., "A hierarchical thread scheduler and register file for energy-efficient throughput processors", ACM Transactions on Computer Systems, Vol.30, No.2, pp.8:1-8:38, 2012.
|
Wing-kei S. Yu, Ruirui Huang, et al., "SRAM-DRAM hybrid memory with applications to efficient register files in fine-grained multi-threading", Proc. of International Symposium on Computer Architecture, San Jose, USA, pp.247-258, 2011.
|
S. Muchnick, Advanced Compiler Design and Implementation, Morgan Kaufmann, Burlington, USA, 1997.
|
W.W.L. Fung, I. Sham, G. Yuan and T.M. Aamodt, "Dynamic warp formation and scheduling for efficient GPU control flow", Proc. of International Symposium on Microarchitecture, Chicago, USA, pp.407-420, 2007.
|
W.W.L. Fung, I. Sham, G. Yuan and T.M. Aamodt, "Dynamic warp formation: Efficient MIMD control flow on SIMD graphics hardware", ACM Trans. Archit. Code Optim., Vol.6, No.2, pp.7:1-7:37, 2009.
|
N. Brunie, S. Collange and G. Diamos, "Simultaneous branch and warp interweaving for sustained GPU performance", Proc. of International Symposium on Computer Architecture, Portland, USA, pp.49-60, 2012.
|
G. Diamos, B. Ashbaugh, et al., "SIMD re-convergence at thread frontiers", Proc. of International Symposium on Microarchitecture, Porto Alegre, Brazil, pp.477-488, 2011.
|
V. Narasiman, M. Shebanow, C.J. Lee, et al., "Improving GPU performance via large warps and two-level warp scheduling", Proc. of International Symposium on Microarchitecture, Porto Alegre, Brazil, pp.308-317, 2011.
|
M. Rhu and M. Erez, "CAPRI: Prediction of compaction-adequacy for handling control-divergence in GPGPU architectures", Proc. of International Symposium on Computer Architecture, Portland, USA, pp.61-71, 2012.
|
M. Rhu and M. Erez, "The dual-path execution model for efficient GPU control flow", Proc. of International Symposium on High Performance Computer Architecture, Shenzhen, China, pp.591-602, 2013.
|
J. Meng, D. Tarjan and K. Skadron, "Dynamic warp subdivision for integrated branch and memory divergence tolerance", Proc. of International Symposium on Computer Architecture, Saint-Malo, France, pp.235-246, 2010.
|
Aniruddha S. Vaidya, Anahita Shayesteh, Dong Hyuk Woo, et al., "SIMD divergence optimization through intra-warp compaction", Proc. of International Symposium on Computer Architecture, Tel-Aviv, Israel, pp.368-379, 2013.
|
W.W.L. Fung and T.M. Aamodt, "Thread block compaction for efficient SIMT control flow", Proc. of International Symposium on High Performance Computer Architecture, San Antonio, USA, pp.25-36, 2011.
|
M. Rhu and M. Erez, "Maximizing SIMD resource utilization in GPGPUs with SIMD lane permutation", Proc. of International Symposium on Computer Architecture, Tel-Aviv, Israel, pp.356-367, 2013.
|
Yaohua Wang, Shuming Chen, et al., "Instruction Shuffle: Achieving MIMD-like performance on SIMD architectures", Computer Architecture Letters, Vol.11, No.2, pp.37-40, 2012.
|
A. Bakhoda, G.L. Yuan, W.W.L. Fung, et al., "Analyzing CUDA workloads using a detailed GPU simulator", Proc. of International Symposium on Performance Analysis of Systems and Software, Boston, USA, pp.163-174, 2009.
|
A. Bakhoda, G.L. Yuan, W.W.L. Fung, et al., GPGPU-Sim, http://www.gpgpu-sim.org, 2013.
|
A. Bakhoda, G.L. Yuan, W.W.L. Fung, et al., GPGPU-Sim Manual, http://www.gpgpu-sim.org/manual, 2013.
|
S. Che, M. Boyer, J. Meng, et al., "Rodinia: A benchmark suite for heterogeneous computing", International Symposium on Workload Characterization, Austin, USA, pp.44-54, 2009.
|
NVIDIA Corporation, GPU Computing SDK, Version 2.3, 2009.
|
John A. Stratton, et al., The Parboil Technical Report, University of Illinois at Urbana-Champaign, 2012.
|