CAO Qian, LI Huiyong, ZUO Min, DENG Yun, YU Yuan. An Adaptive Task Creation Pruning Strategy for Optimizing Irregular Applications[J]. Chinese Journal of Electronics, 2015, 24(3): 535-541. doi: 10.1049/cje.2015.07.017
Citation: CAO Qian, LI Huiyong, ZUO Min, DENG Yun, YU Yuan. An Adaptive Task Creation Pruning Strategy for Optimizing Irregular Applications[J]. Chinese Journal of Electronics, 2015, 24(3): 535-541. doi: 10.1049/cje.2015.07.017

An Adaptive Task Creation Pruning Strategy for Optimizing Irregular Applications

doi: 10.1049/cje.2015.07.017
Funds:  This work is supported by the Scientific Research Common Program of Beijing Municipal Commission of Education (No.KM201410011005), the National Natural Science Foundation of China (No.61170113), the National Basic Research Program (973) of China (No.2012CB821200, No.2012CB821206), and Project Foundation of Guangxi Experiment Center of Information Science, Guilin University of Electronic Technology (No.20130206).
More Information
  • Corresponding author: LI Huiyong (corresponding author) was born in Hebei province in 1983. He received the Ph.D degree in computer application from Beihang University. He is now a postdoctor at Peking University. His research interests include high-performance video process and embedded system design. (Email: lihyit@126.com)
  • Received Date: 2014-12-26
  • Rev Recd Date: 2015-02-03
  • Publish Date: 2015-07-10
  • The OpenMP task directive makes it possible to efficiently parallelize irregular applications, with task granularity as one of the most critical issues. To implement OpenMP specification on multi-core architecture, a model is presented specializing in the execution of irregular applications. The model captures computation and communication within a node with host cores and accelerator cores. Based on this model, we propose an adaptive task creation pruning strategy including two stages to adjust dynamically task granularity. The first stage is task creation in breadth-first manner until getting to a threshold, which utilizes potential parallelism of multi-core processor. The second stage is starvation-triggered task regeneration once some worker thread becomes starved, which ensures work-stealing and thus achieves load balance. The evaluation is conducted with a series of typical irregular benchmarks, and the results indicate that our approach offers more effective performance in parallel execution of irregular benchmarks.
  • loading
  • U. Chao, Y.X. HE, Y. Chen, J.B. Liu, W. Wu and Q.A. Li, "A task scheduling method to increase the reliability of the multicore system", Chinese Journal of Electronics, Vol.41, No.5, pp.1019-1024, 2013.
    M. Pricopi and T. Mitra, "Bahurupi: A polymorphic heterogeneous multi-core architecture", ACM Trans. on Architecture and Code Optimization, Vol.8, No.4, pp.1-21, 2012.
    L. Liu, L. Liu and G.W. Yang, "A highly efficient GPU-CPU hybrid parallel implementation of sparse LU factorization", Chinese Journal of Electronics, Vol.21, No.1, pp.7-12, 2012.
    "OpenMP application program interface", Version 3.0. OpenMP Architecture Review Board, 2008.
    A.J.N. Yzelman and D. Roose, "High-level strategies for parallel shared-memory sparse matrix-vector multiplication", IEEE Trans. on Parallel and Distributed Systems, Vol.25, No.1, pp.116-125, 2014.
    H. LU, X. CHEN and J. LIU, "Parallel test task scheduling with constraints based on hybrid particle swarm optimization and taboo search", Chinese Journal of Electronics, Vol.21, No.4, pp.615-618, 2012.
    B. Goldberg, "Multiprocessor execution of functional programs", International Journal of Parallel Programming, Vol.17, No.5, pp.425-473, 1988.
    P. Hudak and B. Goldberg, "Serial combinators: Optimal grains of parallelism", Proc. of Functional Programming Languages and Computer Architecture, Nancy, France, pp.382-399, 2005.
    C. Addison, J. LaGrone, L. Huang and B. Chapman, "OpenMP 3.0 tasking implementation in OpenUH", Open64 Workshop at Code Generation and Optimization, Seattle, USA, pp.78-87, 2009.
    A. Duran, J. Corbalán and E. Ayguadé, "Evaluation of OpenMP task scheduling strategies", Proc. of International Workshop on OpenMP, West Lafayette, IN, USA, pp.101-110, 2008.
    S.L. Olivier, A.K. Porterfield, K.B.Wheeler, M. Spiegel and J.F. Prins, "OpenMP task scheduling strategies for multicore numa systems", International Journal of High Performance Computing Applications, Vol.26, No.2, pp.110-124, 2012.
    Y. Guo, R. Barik, R. Raman and V. Sarka, "Work-first and help-first scheduling policies for async-finish task parallelism", International Parallel and Distributed Processing Symposium (IPDPS), Rome, Italy, pp.1-12, 2009.
    A. Duran, J. Corbalán and E. Ayguadé, "An adaptive cut-off for task parallelism", Proc. of the ACM/IEEE Conf. on Supercomputing, Austin, USA, pp.1-11, 2008.
    T. Hiraishi, M. Yasugi, S. Umatani and T. Yuasa, "Backtracking-based load balancing", Proc. of ACMSIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Raleigh, North Carolina, pp.55-64, 2009.
    L. Wang, H.M. Cui, Y.L. Duan, F. Lu, X.B. Feng and P. Yew, "An adaptive task creation strategy for work-stealing scheduling", Proc. of the 8th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO), Toronto, Canada, pp.266-277, 2010.
    H. Jordan, P. Thoman, J.J. Durillo, S. Pellegrini, P. Gschwandtner, T. Fahringer and H. Moritsch, "A multi-objective autotuning framework for parallel codes", Proc. of 2012 ACM/IEEE International Conference on Supercomputing, Salt Lake City, UT, USA, pp.1-12, 2012.
    N.A. Deshpande and S.A. Edwards, "Statically unrolling recursion to improve opportunities for parallelism", Technical Report, Department of Computer Science, Columbia University, 2012.
    B. Bao and Y. Yang, "Task granularity analysis for task decomposition in collaborative customized product development", International Journal of Applied Mathematics and Statistics, Vol.52, No.4, pp.487-491, 2014.
    P. Thoman, H. Jordan and T. Fahringer, "Adaptive granularity control in task parallel programs using multiversioning", Proc. of 19th International Conference of Parallel Processing, Phoenix, USA, pp.164-177, 2013.
    P. Thoman, H. Jordan and T. Fahringer, "Compiler multiversioning for automatic task granularity control", Concurrency and Computation: Practice and Experience, Vol.26, No.14, pp.2367-2385, 2014.
    N. Muthuvelu, C. Vecchiola, I. Chai, E. Chikkannan and R. Buyya, "Task granularity policies for deploying bag-of-task applications on global grids", Journal of Future Generation Computer Systems, Vol.29, No.1, pp.170-181, 2012.
    A. Duran, X. Teruel, R. Ferrer, X. Martorell, E. Ayguade, "Barcelona OpenMP tasks suite: A set of benchmarks targeting the exploitation of task parallelism in OpenMP", International Conference on Parallel Processing (ICPP), CA, USA, pp.124- 131, 2009.
    A. Tzannes, G.C. Caragea, R. Barua and U. Vishkin, "Lazy binary-splitting: A run-time adaptive work-stealing scheduler", In 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), New Orleans, LA, pp.179- 190, 2012.
    C.H. Liao, J.Q. Daniel, P. Thomas and R. Bronis, "A ROSEbased OpenMP 3.0 research compiler supporting multiple runtime libraries", Proc. of International Workshop on OpenMP, Tsukuba, Japan, pp.15-28, 2010.
  • 加载中

Catalog

    通讯作者: 陈斌, bchen63@163.com
    • 1. 

      沈阳化工大学材料科学与工程学院 沈阳 110142

    1. 本站搜索
    2. 百度学术搜索
    3. 万方数据库搜索
    4. CNKI搜索

    Article Metrics

    Article views (220) PDF downloads(621) Cited by()
    Proportional views
    Related

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return