XU Guangping, LIN Sheng, SHI Kai, ZHANG Hua. NewBalance: Efficient Data Space Management and Algorithmic Optimization for Large-Scale Storage Systems[J]. Chinese Journal of Electronics, 2017, 26(3): 493-501. doi: 10.1049/cje.2017.03.012
Citation: XU Guangping, LIN Sheng, SHI Kai, ZHANG Hua. NewBalance: Efficient Data Space Management and Algorithmic Optimization for Large-Scale Storage Systems[J]. Chinese Journal of Electronics, 2017, 26(3): 493-501. doi: 10.1049/cje.2017.03.012

NewBalance: Efficient Data Space Management and Algorithmic Optimization for Large-Scale Storage Systems

doi: 10.1049/cje.2017.03.012
Funds:  This work is supported by the National Natural Science Foundation of China (No.61201234, No.61202381, No.61202168, No.61572357, No.61170301, No.11426163), Tianjin Natural Science Foundation (No.13JCQNJC00400, No.14JCZDJC31700), and National Engineering Degree Postgraduate Education Project (No.2016ZX076).
More Information
  • Corresponding author: ZHANG Hua (corresponding author) was born in 1962. She received the Ph.D. degree in computer science from Tianjin University. She is now a professor of Tianjin University of Technology. Her research interests include distributed system and computer supported cooperative work. (Email:hzhang62@163.com)
  • Received Date: 2015-09-21
  • Rev Recd Date: 2015-11-15
  • Publish Date: 2017-05-10
  • Fragmentation usually occurs when data space of original storage nodes has to be reallocated to new added storage nodes during the scale-out evolution of the large-scale storage system. It greatly influences its performance and becomes a challenge to manage the whole space. We present an efficient space management framework, called NewBalance, to reduce fragmentation with the minimum data movement while keeping the storage system load balance. The space management framework has two phases including the collection phase and the allocation phase. For the collection phase, we propose a novel algorithm, called the greedy bi-direction collector, which collects enough space for the new storage nodes. For the allocation phase, we formally represent it as a variant of the bin packing problem and then utilize some bin packing heuristics including the first fitting and the best fitting to allocate collected intervals to new added storage nodes. The experimental results show that the amount of intervals can be reduced by 20%~55% and our algorithmic optimization improves the data lookup performance by at least 10% and the scale-out performance by 2X~3X.
  • loading
  • E. Nightingale, J. Elson, J. Fan, et al., "Flat datacenter storage", Proc. of USENIX Symposium on Operating System Design and Implementation (OSDI), Hollywood, CA, USA, pp.1-15, 2012.
    M. Mesnier, G. Ganger and E. Riedel, "Object-based storage", IEEE Communications Magazine, Vol.41, No.8, pp.84-90, 2003.
    D. Karger, E. Lehman, T. Leighton, et al., "Consistent hashing and random trees:Distributed caching protocols for relieving hot spots on the World Wide Web", Proc. of ACM Symposium on Theory of computing (STOC), El Paso, Texas, USA, pp.654-663, 1997.
    C. Plaxton, R. Rajaraman and A. Richa, "Accessing nearby copies of replicated objects in a distributed environment", Proc. of ACM Symposium on Parallel Algorithms and Architectures (SPAA), Newport, Rhode Island, USA, pp.311-320, 1997.
    S. Surana, B. Godfrey, K. Lakshminarayanan, et al., "Load balancing in dynamic structured peer-to-peer systems", Performance Evaluation, Vol.63, No.3, pp.217-240, 2006.
    R. Kleinberg and T. Leighton, "Consistent load balancing via spread minimization", Proc. of ACM Symposium on Theory of Computing (STOC), San Diego, CA, USA, pp.565-574, 2003.
    A. Miranda, S. Effert, Y. Kang, et al., "Reliable and randomized data distribution strategies for large scale storage systems", Proc. of the 18th International Conference on High Performance Computing (HiPC), Bangalore, India, pp.1-10, 2011.
    A. Miranda, S. Effert, Y. Kang, et al., "Random slicing:Efficient and scalable data placement for large-scale storage systems", ACM Transactions on Storage (TOS), Vol.10, No.3, ID 9, pp.1-35, 2014.
    S. Ghemawat, H. Gobioff and S. Leung, "The Google file system", Proc of ACM Symposium on Operating Systems Principles (SOSP), Bolton Landing, NY, USA, pp.29-43, 2003.
    K. Shvachko, "HDFS scalability:The limits to growth", Login, Vol.35, No.2, pp.6-16, 2010.
    S. Weil, S. Brandt, E. Miller, et al., "CRUSH:Controlled, scalable, decentralized placement of replicated data", Proc. of ACM/IEEE Conference on Supercomputing (SC), Tampa, Florida, ID 122, pp.1-12, 2006.
    A. Brinkmann, S. Effert, F. Heide, et al., "Dynamic and redundant data placement", Proc. of IEEE International Conference on Distributed Computing Systems (ICDCS), Toronto, ON, pp.1-10, 2007.
    U. Wieder, "Balanced allocations with heterogenous bins", Proc. of ACM Symposium on Parallel Algorithms and Architectures (SPAA), San Diego, California, USA, pp.188-193, 2007.
    M. Mense and C. Scheideler, "SPREAD:An adaptive scheme for redundant and fair storage in dynamic heterogeneous storage systems", Proc. of the ACM-SIAM Symposium on Discrete Algorithms (SODA), San Francisco, California, USA, pp.1135-1144, 2008.
    G. Decandia, D. Hastorun, M. Jampani, et al., "Dynamo:Amazon's highly available key-value store", Proc. of the ACM SIGOPS Symposium on Operating Systems Principles (SOSP), Stevenson, Washington, USA, pp.205-220, 2007.
    P. Lensing, T. Cortes, and A. Brinkmann, "Direct lookup and hash-based metadata placement for local file systems", Proc. of International Systems and Storage Conference (SYSTOR), Haifa, Israel, ID 5, pp.1-15, 2013.
    A. Miranda and T. Cortes, "CRAID:Online RAID upgrades using dynamic hot data reorganization", Proc. of USENIX Conference on File and Storage Technologies (FAST), Santa Clara, CA, USA, pp.133-146, 2014.
    S. Weil, S. Brandt, E. Miller, et al., "Ceph:A scalable, high performance distributed file system", Proc. of the 7th Symposium on Operating Systems Design and Implementation (OSDI), Seattle, Washington, USA, pp.307-320, 2006.
    B. LeCun, T. Mautor, F. Quessette, et al., "Bin packing with fragmentable items:Presentation and approximations", Theoretical Computer Science, Vol.602, No.18, pp.50-59, 2015.
    ZHANG Rui, LIN Chuang, MENG Kun, et al., "REST:A reliable and efficient storage technology for content cloud service", Acta Electronica Sinica, Vol.42, No.4, pp.633-639, 2014. (in Chinese)
  • 加载中

Catalog

    通讯作者: 陈斌, bchen63@163.com
    • 1. 

      沈阳化工大学材料科学与工程学院 沈阳 110142

    1. 本站搜索
    2. 百度学术搜索
    3. 万方数据库搜索
    4. CNKI搜索

    Article Metrics

    Article views (131) PDF downloads(439) Cited by()
    Proportional views
    Related

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return