LIU Li, LIU Li, YANG Guangwen. A Highly Efficient GPU-CPU Hybrid Parallel Implementation of Sparse LU Factorization[J]. Chinese Journal of Electronics, 2012, 21(1): 7-12.

A Highly Efficient GPU-CPU Hybrid Parallel Implementation of Sparse LU Factorization

  • Received Date: 2011-02-01
  • Revision Received Date: 2011-03-01
  • Publish Date: 2012-01-05
  • Abstract: In this paper, we accelerate sparse LU factorization on the GPU. We present a tiled storage format and a parallel algorithm that improve the memory access pattern, and a register blocking method that compresses the on-chip working set. The OpenMP implementation of our algorithm gives more stable performance across different matrices and outperforms SuperLU and KLU by a factor of 1.88 to 6 on an Intel 8-core CPU (central processing unit) for matrices from the Florida matrix collection. Based on this algorithm, we further propose a GPU-CPU hybrid pipelined scheme that overlaps computation on the CPU with computation on the GPU. Compared with the better of SuperLU and KLU on an Intel 8-core CPU, our algorithm achieves a 1.1- to 19.7-fold speedup on the GPU in double precision. Compared with the OpenMP implementation of our algorithm on an Intel 8-core CPU, our GPU implementation achieves a 2-fold speedup in the best cases.
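
The abstract does not spell out the tiled storage format, so the following is only a minimal sketch of the general idea of tiling a sparse matrix, not the authors' layout. It assumes the matrix is given as coordinate triples and stores only the fixed-size tiles that contain at least one nonzero; the function name `to_tiles` and the tile size are hypothetical choices made for illustration.

```python
def to_tiles(entries, tile_size):
    """Group (row, col, value) nonzeros into dense tile_size x tile_size tiles.

    Returns a dict mapping (tile_row, tile_col) to a row-major list of lists.
    Tiles that contain no nonzero are not stored at all.
    This is an illustrative layout, not the format used in the paper.
    """
    tiles = {}
    for row, col, value in entries:
        key = (row // tile_size, col // tile_size)
        if key not in tiles:
            # Allocate a zero-filled dense tile the first time it is touched.
            tiles[key] = [[0.0] * tile_size for _ in range(tile_size)]
        tiles[key][row % tile_size][col % tile_size] = value
    return tiles


# Hypothetical 4x4 sparse matrix given as coordinate triples.
entries = [(0, 0, 4.0), (1, 1, 3.0), (3, 0, 1.0), (3, 3, 2.0)]
tiles = to_tiles(entries, tile_size=2)
```

Storing each nonempty tile contiguously is one way to make accesses within a tile regular, which is the kind of memory access improvement the abstract refers to; the cost is explicit zeros inside partially filled tiles.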

