CHEN Hu, CHEN Shuming, CHEN Xiaowen, et al., “HiSCA: Overcoming the Limitation of Clustered Unicore Processors Through Hardware/Software Codesign,” Chinese Journal of Electronics, vol. 21, no. 3, pp. 439-444, 2012.

HiSCA: Overcoming the Limitation of Clustered Unicore Processors Through Hardware/Software Codesign

  • Received Date: 2011-04-01
  • Revision Received Date: 2011-11-01
  • Publish Date: 2012-07-25
  • Partitioning resources such as pipelines and register files among clusters has been proven to be an effective way to improve performance and scalability. However, the improvements are limited by traditional binary instruction encoding schemes and centralized instruction execution control mechanisms. Meanwhile, clustered processors may suffer performance degradation due to limited data locality, which results from a lack of available registers and functional units. This paper introduces a Highly Scalable Clustered Architecture (HiSCA) to improve the scalability and performance of clustered processors. The hardware/software instruction encoding scheme of HiSCA splits the instruction stream into chains of instructions (packs) and encodes the information common to a pack in dedicated instruction words, thus reducing the amount of information encoded in each instruction word. The pipeline of HiSCA, which features in-order issue, out-of-order execution, and parallel but in-order commitment, relieves instruction issue of the heavy burden of dynamic scheduling and allows functional units to fetch data and manage their own execution. HiSCA scales efficiently to 32 clusters with 1024 general-purpose registers. Experimental results also show that, for a 4-cluster/8-issue configuration, HiSCA achieves an average performance speedup of 13.3% and a 4.6% improvement in frequency with minimal hardware overhead, compared with a traditional clustered processor of nearly the same hardware complexity.
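The abstract does not give HiSCA's actual instruction field layout, so the sketch below is only a hypothetical illustration of why moving information shared by a pack into a dedicated word can shrink the remaining instruction words. It assumes that a pack is a run of consecutive instructions executing on the same cluster, that the shared information is the cluster ID, and that each register field then only has to name one of the 32 registers local to that cluster (1024 registers / 32 clusters). The names `Instr`, `split_into_packs`, `flat_register_bits`, and `packed_register_bits` are invented for this example and are not taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical parameters based on the largest configuration in the abstract:
# 32 clusters sharing 1024 general-purpose registers, i.e. 32 per cluster.
NUM_CLUSTERS = 32
TOTAL_REGS = 1024
REGS_PER_CLUSTER = TOTAL_REGS // NUM_CLUSTERS


@dataclass
class Instr:
    op: str
    cluster: int   # cluster the instruction is assumed to execute on
    regs: tuple    # register operands, as cluster-local indices


def field_bits(n: int) -> int:
    """Bits needed for a field that selects one of n items."""
    return (n - 1).bit_length()


def split_into_packs(stream):
    """Group consecutive instructions bound for the same cluster into packs."""
    packs = []
    for ins in stream:
        if packs and packs[-1][0].cluster == ins.cluster:
            packs[-1].append(ins)
        else:
            packs.append([ins])
    return packs


def flat_register_bits(stream):
    """Baseline: every register field names one of all TOTAL_REGS registers."""
    return sum(len(ins.regs) * field_bits(TOTAL_REGS) for ins in stream)


def packed_register_bits(stream):
    """Assumed pack scheme: the cluster ID is stored once per pack, so each
    register field only needs to name a cluster-local register."""
    total = 0
    for pack in split_into_packs(stream):
        total += field_bits(NUM_CLUSTERS)  # shared cluster ID, once per pack
        total += sum(len(ins.regs) * field_bits(REGS_PER_CLUSTER) for ins in pack)
    return total


stream = [
    Instr("add", 0, (1, 2, 3)),
    Instr("mul", 0, (3, 4, 5)),
    Instr("sub", 0, (5, 1, 6)),
    Instr("load", 1, (7, 8)),
]
print(flat_register_bits(stream), packed_register_bits(stream))  # 110 65
```

In this toy stream, naming registers globally costs 10 bits per field (110 bits in total), while the assumed pack scheme needs 5 bits per field plus one 5-bit cluster ID per pack (65 bits). The saving grows with pack length because the shared field is amortized over more instructions; opcodes and the other fields a real instruction word would carry are ignored here.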

    Corresponding author: CHEN Bin, bchen63@163.com
    1. School of Materials Science and Engineering, Shenyang University of Chemical Technology, Shenyang 110142
