DENG Liang, ZHAO Dan, BAI Hanli, WANG Fang. Performance Optimization and Comparison of the Alternating Direction Implicit CFD Solver on Multi-core and Many-Core Architectures[J]. Chinese Journal of Electronics, 2018, 27(3): 540-548. doi: 10.1049/cje.2018.03.011

Performance Optimization and Comparison of the Alternating Direction Implicit CFD Solver on Multi-core and Many-Core Architectures

doi: 10.1049/cje.2018.03.011
Funds: This work is supported by the National Key Research and Development Program of China (No. 2016YFB0200703) and the National Natural Science Foundation of China (No. 61379056).
  • Received Date: 2015-12-21
  • Revision Received Date: 2016-04-09
  • Publish Date: 2018-05-10
  • We accelerate a double-precision Alternating direction implicit (ADI) solver for the three-dimensional compressible Navier-Stokes equations, taken from our in-house Computational fluid dynamics (CFD) software, on the latest multi-core and many-core architectures (Intel Sandy Bridge CPUs, Intel Many integrated core (MIC) coprocessors and NVIDIA Kepler K20c GPUs). Several performance optimization techniques are discussed in detail, and we provide an in-depth analysis of the performance difference between Sandy Bridge and MIC. Experimental results show that the proposed GPU-enabled ADI solver achieves a speedup of 5.5 on one Kepler GPU relative to two Sandy Bridge CPUs, and that our optimization techniques improve the performance of the ADI solver by 2.5-fold on two Sandy Bridge CPUs and by 1.7-fold on one Intel MIC coprocessor. We also perform a cross-platform performance analysis (GPU versus MIC), which serves as a case study for developers selecting the right accelerator for their target applications.