SHEN Fanfan, HE Yanxiang, ZHANG Jun, et al., “Feedback Learning Based Dead Write Termination for Energy Efficient STT-RAM Caches,” Chinese Journal of Electronics, vol. 26, no. 3, pp. 460-467, 2017, doi: 10.1049/cje.2017.03.014

Feedback Learning Based Dead Write Termination for Energy Efficient STT-RAM Caches

doi: 10.1049/cje.2017.03.014
Funds:  This work is supported by the National Natural Science Foundation of China (No.91118003, No.61170022, No.61373039, No.61402145, No.61502346), the Natural Science Foundation of Hubei Province (No.2015CFB338), the Natural Science Foundation of Anhui Province (No.1508085QF138), and the Science and Technology Project of Jiangxi Province Education Department (No.GJJ150605).
  • Corresponding author: HE Yanxiang was born in 1952. He is a professor and Ph.D. supervisor at Wuhan University. His research interests include compiler theory, trusted software and software engineering. (Email: yxhe@whu.edu.cn)
  • Received Date: 2015-11-17
  • Rev Recd Date: 2016-03-17
  • Publish Date: 2017-05-10
  • Abstract: Spin-torque transfer RAM (STT-RAM) is a promising candidate to replace SRAM in large last-level caches (LLCs). However, its long write latency and high write energy diminish the benefit of adopting STT-RAM caches. A common observation for LLCs is that many cache blocks are never referenced again before they are evicted. The write operations for these blocks, which we call dead writes, can be eliminated without incurring additional cache misses. Based on this observation, a quantitative scheme called Feedback learning based dead write termination (FLDWT) is proposed to improve the energy efficiency and performance of STT-RAM based LLCs. FLDWT dynamically learns block access behavior from data reuse distance and data access frequency, and classifies blocks as dead or live. It terminates write requests for dead blocks and improves its estimation accuracy using feedback information. Compared with an STT-RAM baseline LLC, experimental results show that our scheme reduces energy by 44.6% and improves performance by 12% on average, with negligible overhead.
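The abstract does not give implementation details, so the following Python sketch is only one way to picture the idea it describes: classify a write as dead or live from reuse distance and access frequency, terminate dead writes, and use later references as feedback. Everything here is an assumption made for illustration, not the FLDWT design: the class name `DeadWritePredictor`, the parameters `distance_threshold` and `min_frequency`, the choice to measure reuse distance as the number of accesses since the block was last touched, and the feedback rule that raises the threshold after a wrong termination.

```python
from collections import defaultdict


class DeadWritePredictor:
    """Toy dead-write classifier with feedback (illustrative only)."""

    def __init__(self, distance_threshold=4, min_frequency=4):
        self.distance_threshold = distance_threshold  # hypothetical tunable
        self.min_frequency = min_frequency            # hypothetical tunable
        self.clock = 0                       # number of accesses seen so far
        self.last_seen = {}                  # block -> clock at its last access
        self.last_distance = {}              # block -> most recent reuse distance
        self.frequency = defaultdict(int)    # block -> access count
        self.bypassed = set()                # blocks whose last write was terminated

    def _touch(self, addr):
        # Update reuse distance (accesses since last touch) and frequency.
        self.clock += 1
        if addr in self.last_seen:
            self.last_distance[addr] = self.clock - self.last_seen[addr]
        self.last_seen[addr] = self.clock
        self.frequency[addr] += 1

    def predicts_dead(self, addr):
        # Blocks without reuse history are conservatively treated as live.
        distance = self.last_distance.get(addr)
        if distance is None:
            return False
        return (distance > self.distance_threshold
                and self.frequency[addr] < self.min_frequency)

    def write(self, addr):
        """Return True if the write is performed, False if it is terminated."""
        dead = self.predicts_dead(addr)  # decide from history before this access
        self._touch(addr)
        if dead:
            self.bypassed.add(addr)
            return False
        self.bypassed.discard(addr)
        return True

    def read(self, addr):
        """Return True if this read shows an earlier termination was wrong."""
        self._touch(addr)
        if addr in self.bypassed:
            self.bypassed.discard(addr)
            self.distance_threshold += 1  # feedback: predict dead less eagerly
            return True
        return False
```

In this sketch a read that hits a previously terminated block counts as a misprediction and makes the classifier more conservative. A real design would also need a way to become more aggressive again, and would track state per cache set in hardware rather than in unbounded dictionaries.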
